ercius / openNCEM

A collection of packages and tools for electron microscopy data analysis supported by the National Center for Electron Microscopy facility of the Molecular Foundry
GNU General Public License v3.0

`BufferError` caused by eager closing of DM file handles #41

Closed. sezelt closed this issue 2 years ago.

sezelt commented 2 years ago

In py4DSTEM, we use fileDM as a context manager and then use it to get a memory map to the 4D-STEM data.

Recently, we have been seeing this error (full trace pasted at the bottom): `BufferError: cannot close exported pointers exist`. It seems to be caused by `fileDM.__exit__` forcing the file to close. Python tracks how many exported references to a buffer exist, and raises this error when we try to close the file while the `py4DSTEM.io.DataCube` object still holds a reference to it via its memory map.

I'm not sure why this has become a problem just now, since this part of ncempy hasn't changed in a while, so I suspect that it is restricted to certain versions of Python or numpy. But it is definitely occurring for some people running fully up-to-date installs of py4DSTEM and ncempy.

Perhaps the solution is just to wrap the `self.fid.close()` in a try-except: when no references remain and it is safe to close the file, it still gets closed, and when exported pointers remain the `BufferError` is swallowed: https://github.com/ercius/openNCEM/blob/0dcbead63fb0e529967ee4c6b48280a20fcf7a1b/ncempy/io/dm.py#L227-L234 Though I'm not 100% sure what state the file handle is left in if it hits this error.
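The failure mode and the proposed try/except can be reproduced outside ncempy. The sketch below is a standalone illustration (not the actual ncempy code): closing a Python `mmap` while a `memoryview` still points into it raises `BufferError`, so the except clause skips the close and leaves the buffer valid; once no views remain, `close()` succeeds as usual.

```python
import mmap
import tempfile

# Create a small scratch file to map (stands in for a DM file).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096)
    path = f.name

fh = open(path, "rb")
fid = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
view = memoryview(fid)       # stands in for the exported memory map

try:
    fid.close()              # raises: cannot close exported pointers exist
    blocked = False
except BufferError:
    blocked = True           # references remain; leave the map open for now

view.release()
fid.close()                  # safe once no exported pointers remain
fh.close()
```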


Full error:

---------------------------------------------------------------------------
BufferError                               Traceback (most recent call last)
Input In [15], in <cell line: 1>()
----> 1 dc = scan.get_data()

Input In [6], in Scan_Info_Storage.get_data(self, shift_correct)
     27 def get_data(self, shift_correct = False): 
---> 28     dc = py4DSTEM.io.read(self.data_filepath)
     29     dc.set_scan_shape(self.R_Nx,self.R_Ny)
     30     if shift_correct:

File ~/miniconda3/envs/cuda-multcorr-sez/lib/python3.8/site-packages/py4DSTEM/io/read.py:99, in read(fp, mem, binfactor, ft, metadata, **kwargs)
     95     data = read_py4DSTEM(
     96         fp, mem=mem, binfactor=binfactor, metadata=metadata, **kwargs
     97     )
     98 elif ft == "dm":
---> 99     data = read_dm(fp, mem, binfactor, metadata=metadata, **kwargs)
    100 elif ft == "empad":
    101     data = read_empad(fp, mem, binfactor, metadata=metadata, **kwargs)

File ~/miniconda3/envs/cuda-multcorr-sez/lib/python3.8/site-packages/py4DSTEM/io/nonnative/read_dm.py:53, in read_dm(fp, mem, binfactor, metadata, **kwargs)
     51                 dataSet = dmFile.getDataset(i)
     52             i += 1
---> 53         dc = DataCube(data=dataSet["data"])
     54 elif (mem, binfactor) == ("MEMMAP", 1):
     55     with dm.fileDM(fp, on_memory=False) as dmFile:
     56         # loop through the datasets until a >2D one is found:

File ~/miniconda3/envs/cuda-multcorr-sez/lib/python3.8/site-packages/ncempy/io/dm.py:246, in fileDM.__exit__(self, exception_type, exception_value, traceback)
    242 def __exit__(self, exception_type, exception_value, traceback):
    243     """Implement python's with statement
    244     and close the file via __del__()
    245     """
--> 246     self.__del__()
    247     return None

File ~/miniconda3/envs/cuda-multcorr-sez/lib/python3.8/site-packages/ncempy/io/dm.py:234, in fileDM.__del__(self)
    232 if self._v:
    233     print('Closing input file: {}'.format(self.file_path))
--> 234 self.fid.close()

BufferError: cannot close exported pointers exist
sezelt commented 2 years ago

This is the same error as #39, but in our use case we want to be able to hold on to the memory map even outside of the context, so I think the solution will be different from how #40 handles it.

ercius commented 2 years ago

I think we ran into a similar issue previously and updated ncempy to work with py4DSTEM. I think merging #40 is a good idea, and we should come up with a better way to interface with py4DSTEM, since what you are doing is non-standard (I think). Maybe you should sub-class `fileDM` in py4DSTEM and override the file-closing code; then you can handle the final file closing on your end. I'm open to other suggestions as well.
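The subclass idea can be sketched with a toy stand-in. `ToyFileDM` below is a hypothetical, greatly simplified mock of `fileDM`'s close-on-exit behavior, not the real class; the subclass (also a hypothetical name) overrides `__exit__` so the handle outlives the `with` block and the caller closes it explicitly once any derived memory maps have been released.

```python
import tempfile

# Hypothetical, simplified stand-in for ncempy's fileDM, which closes
# its file handle in __exit__ (via __del__ in the real code).
class ToyFileDM:
    def __init__(self, path):
        self.fid = open(path, "rb")
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, tb):
        self.fid.close()     # the eager close that invalidates live maps

# py4DSTEM-side subclass (sketch): skip the automatic close.
class LazyCloseFileDM(ToyFileDM):
    def __exit__(self, exc_type, exc_value, tb):
        return None          # keep self.fid open past the with-block
    def close(self):
        """Close the handle once all derived memory maps are released."""
        self.fid.close()

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

dm_file = LazyCloseFileDM(path)
with dm_file:
    pass                     # handle stays open after this block
still_open = not dm_file.fid.closed
dm_file.close()              # the caller decides when to close
```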

sezelt commented 2 years ago

OK, after thinking about this some more and investigating more deeply, it seems I was wrong about which memory map was causing the issue. The memory map returned by `getMemmap` is created with something like `np.memmap(self.file_path)`, which can safely escape the context manager. The error I was seeing was in fact exactly the same as #39: it relates to `self.fid`, which is a Python `mmap`. I've been able to reproduce the bug on my machines and confirmed that #40 fixes the problem.
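The distinction can be seen in a standalone sketch (not ncempy code): `np.memmap` opened from a *path* opens and owns its own file descriptor, independent of any other handle on the same file, which is why the array returned by `getMemmap` can safely outlive the context manager.

```python
import tempfile
import numpy as np

# Scratch file with known contents (stands in for a DM file).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(np.arange(16, dtype=np.uint8).tobytes())
    path = f.name

fh = open(path, "rb")        # stands in for fileDM's self.fid
mm = np.memmap(path, dtype=np.uint8, mode="r")

fh.close()                   # closing the unrelated handle is harmless:
values = mm[:4].tolist()     # the path-based memmap still reads fine
```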