Open estebanag opened 5 years ago
Hello @estebanag,
Yes, definitely. The lock is a good addition too, indeed things could go wrong with many threads.
Regarding the automatic re-opening I'm not sure if the h5py.DataSet
stays valid after a file has been closed and re-opened. I think the safe way is to always create a new h5py.File
, which may then be taken from the cache if already open.
You could of course wrap the DataSet
to do this in the background.
Great. Regarding recreating the h5py.File
and wrapping the Dataset
, that's exactly the way I got it to work in the current version (you can see it here).
There is a bug on line 149 of init.py. The value stored in the cache is not self.hsh but self. The comparison will always fail and therefore a closed file is never evicted from the cache. Could just do "del cache[self.hsh]" instead?
Actually, closing a file that is shared is a bad idea. Maybe this bug was a good thing. Perhaps the cache should use weak references and only close the file when the last reference is removed.
Hi @Exteris,
Thanks for the work on this project. I've forked your repo (see here) and I'm in the process of experimenting with a way to automatically reopen
htpy.File
s (closed by cache eviction) when ah5py.Dataset
is requested. Also note that I've added a lock to synchronize access to the cache, as it may cause problems when using dask. The current code is just a quick test (that works) but would you be interested in having this functionality in a PR? Any ideas or preferences are welcome.@mrocklin, any feedback from you is also more than welcome!