DaanVanVugt / h5pickle

Wrapper for h5py with pickle capabilities
MIT License
25 stars 8 forks source link

Add support for reopening files closed by cache eviction #8

Open estebanag opened 5 years ago

estebanag commented 5 years ago

Hi @Exteris,

Thanks for the work on this project. I've forked your repo (see here) and I'm in the process of experimenting with a way to automatically reopen htpy.Files (closed by cache eviction) when a h5py.Dataset is requested. Also note that I've added a lock to synchronize access to the cache, as it may cause problems when using dask. The current code is just a quick test (that works) but would you be interested in having this functionality in a PR? Any ideas or preferences are welcome.

@mrocklin, any feedback from you is also more than welcome!

DaanVanVugt commented 5 years ago

Hello @estebanag,

Yes, definitely. The lock is a good addition too, indeed things could go wrong with many threads.

Regarding the automatic re-opening I'm not sure if the h5py.DataSet stays valid after a file has been closed and re-opened. I think the safe way is to always create a new h5py.File, which may then be taken from the cache if already open.

You could of course wrap the DataSet to do this in the background.

estebanag commented 5 years ago

Great. Regarding recreating the h5py.File and wrapping the Dataset, that's exactly the way I got it to work in the current version (you can see it here).

avh commented 3 years ago

There is a bug on line 149 of init.py. The value stored in the cache is not self.hsh but self. The comparison will always fail and therefore a closed file is never evicted from the cache. Could just do "del cache[self.hsh]" instead?

avh commented 3 years ago

Actually, closing a file that is shared is a bad idea. Maybe this bug was a good thing. Perhaps the cache should use weak references and only close the file when the last reference is removed.