Closed ethanbb closed 7 months ago
So I realized that closing a file object passed to np.memmap doesn't actually help close the mmap stored by the np.memmap, as these objects have independent lifetimes. I think it just seemed like this was working because I was doing this for a read-only memmap and not trying to delete the file afterwards.
The only way to actually close the mmap without accessing a private property, as long as it isn't shared between multiple np.memmaps, seems to be just a del
statement. I switched to this and it seems to work.
I hope we're reliably getting copies from what was in the memmapped file rather than views into it; if there's any code path that has it remain as a view then the del won't remove the last reference to the memmap and the file will remain open.
Otherwise seems a sensible approach. In theory the Windows-style file-locking is doing is probably more correct/safe, but it's a pain in the butt to deal with. It can avoid some pretty subtle bugs that could come up.
Things are probably in a good place now; if you're happy with it and it passes tests, I can merge; let me know when you're ready
I'll probably follow this up with a diff that will replace use of the "N" variable with something else; single letter caps variables are probably not good names (although if you want to slip that adjustment into this diff before you give me the green light to merge, that'd be fine)
I hope we're reliably getting copies from what was in the memmapped file rather than views into it; if there's any code path that has it remain as a view then the del won't remove the last reference to the memmap and the file will remain open.
This is a good point. Actually, it's just as important if we stuck with the _mmap.close()
strategy, as doing so if _mmap
has more than one reference can crash Python.
Looking how the memmaps are used in _sbxread_helper
- out
should not present a problem, since it's only written to, not read. sbx_mmap
is currently definitely only copied, not viewed, since it's only every indexed using advanced indices produced by np.ix_
. But I added a comment to be careful not to change that in the future.
Also got rid of those N
s! Ready to merge now I believe.
Once it passes CI I'll merge; on the off chance we run into hiccups the next time conda-forge does builds for a package release I may need to either reach out to you or to do last-minute changes (if things are broken then); if I go with the latter I'll poke you.
Description
This PR addresses the potential problem even when loading just a portion of an .sbx file (except for a subset of frames), such as a single plane, the whole file would be loaded into memory first and then indexed. This could cause an out-of-memory error if the file is large enough, or at least slow things down. This is less of an issue when loading or converting in chunks, but again could slow down the operation.
The solution here is to use np.memmap instead of np.fromfile and index that, so only the desired parts of the file are loaded. Also, I carefully checked the operations on the loaded data to eliminate any extraneous copies (particularly in
np.invert
). Doing this also means that there's no disadvantage to using the same strategy when loading an arbitrary set of frames rather than a slice, so it simplifies the code. I confirmed usingtracemalloc
that the peak memory usage is only marginally higher than that needed to store the loaded part of the data and added a test for this.I also had as part of this branch bug fixes for Windows and older tifffile versions, but I realized when preparing the PR that similar changes had already been merged to dev. I kept the tifffile code from dev the same after checking that it works fine for version 2023.4.12. For the other issue, I saw that the fix was basically a workaround that avoids closing .tif files or reusing the same filename in the test suite. I had determined that the actual issue was not closing the .tif file handle within the memmap after finishing writing to it, and after fixing this, the original tests work fine on Windows, so I kept my version. However, it might make sense to have a little extra leniency for file locking even if this version should theoretically work.
Type of change
Has your PR been tested?
caimanmanager test
passes on Windows (Python 3.9) and Linux (Python 3.11).