Closed sezelt closed 3 years ago
Thanks for bringing this up! I can see how this would be a problem.
I think your approach is a good solution, and I'll adopt it in an upcoming ncempy release. I don't like that the file object is lost, but the h5py documentation explicitly recommends this solution at this link. So, going out of scope with so-called 'weak' closing is a suggested way to deal with HDF5 memmaps.
This is a tricky problem, actually. NumPy just totally ignores the issue! See the numpy documentation here:

> Currently there is no API to close the underlying mmap. It is tricky to ensure the resource is actually closed, since it may be shared between different memmap instances.
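To see why this is hard for numpy to solve: slicing a `numpy.memmap` returns another memmap that shares the same underlying mapping, so deleting one instance cannot safely unmap the file. A small self-contained sketch (throwaway demo file, hypothetical path):

```python
import os
import tempfile

import numpy as np

# Write a small binary file to map (throwaway demo data)
path = os.path.join(tempfile.mkdtemp(), "mm.dat")
np.arange(8, dtype=np.float64).tofile(path)

m = np.memmap(path, dtype=np.float64, mode="r")
view = m[2:5]          # slicing yields another memmap sharing the same mapping
del m                  # deleting the parent does not unmap the file...
print(float(view[0]))  # ...the slice still reads from the mapping: 2.0
```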
Alternatively, a user could keep the `fileEMD` object in scope by returning the file object with the memmap. Then, when you are done with the data, you can `del` the file object and the dataset object, or just let all of them go out of scope so that everything is garbage collected at the same time, hopefully closing the file properly. I don't think this is the best option in py4DSTEM, since you just want to return the data by itself.
This is an example of what I suggested above in case anyone runs into a similar issue.
```python
from ncempy.io.emd import fileEMD

fpath = "Sample_LFP_v0,12.h5"

def load_memmap(fpath, N):
    # Return the fileEMD object along with the memmap so the
    # file handle stays alive while the data is in use
    f = fileEMD(fpath)
    return f, *f.get_memmap(N)

f, data, dims = load_memmap(fpath, 0)
print(data)
# <HDF5 dataset "data": shape (10, 10), type "<f8">

# When finished, delete both the file object and the dataset
del f, data, dims
```
An issue we've encountered in py4DSTEM is that if you load a memory map to an EMD dataset using `ncempy.io.emd.fileEMD.get_memmap(N)`, the handle to the HDF5 file is forcibly closed when the `fileEMD` object gets garbage collected. If a `fileEMD` object gets created inside a function (and is not retained anywhere else), then when that function exits the `fileEMD` will be deleted, calling its `__del__` method and closing the file handle: https://github.com/ercius/openNCEM/blob/ab0a40a1354cf7e258efd016de675d2844e7a721/ncempy/io/emd.py#L162-L168

A minimal reproduction of the problem is like so:
In py4DSTEM we use a context manager for most reading, so my approach when a memory map is requested is to create a new `h5py.File` outside of the context manager. While this `File`
will get garbage collected, this does not actually close the file handle, because there is still a reference to the Dataset. A good question with my approach is whether the file handle would ever get closed; it's possible that when the Dataset is finally garbage collected the file gets closed as well, but I haven't tested this.

I am using ncempy version 1.8.1.
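For reference, the approach described above can be sketched directly with h5py (a minimal sketch; with h5py's default 'weak' close strength, the file stays open as long as the returned Dataset is alive):

```python
import os
import tempfile

import h5py
import numpy as np

def get_dataset(fpath, name):
    # Open the file outside any context manager and return only the
    # Dataset; with the default ('weak') close degree, the HDF5 file
    # stays open as long as the Dataset holds a reference to it.
    f = h5py.File(fpath, "r")
    return f[name]

# Throwaway demo file (hypothetical path/contents)
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f["data"] = np.arange(10.0)

dset = get_dataset(path, "data")
print(dset[3])  # still readable even though the File object is out of scope
```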