catalystneuro / roiextractors

Python-based module for extracting from, converting between, and handling optical imaging data from several file formats. Inspired by SpikeInterface.
https://roiextractors.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License
12 stars 7 forks source link

using contexts #28

Open bendichter opened 4 years ago

bendichter commented 4 years ago

There are now at least four file API types (HDF5, NWB and Bio-Formats, CaImAn) that require the file to be open to read. This would be best handled by contexts. In fact, in each of these cases, the reader we use supports reading with a context. We could also provide this capability by creating __enter__ and __exit__ methods that call the __enter__ and __exit__ methods of those readers.

h-mayorquin commented 2 years ago

I would like to take over this to get practice with contex managers after I am done with the current conversion if that's fine.

bendichter commented 2 years ago

great! The idea would be to allow a syntax like:

with Hdf5ImagingExtractor("hdf5_file.h5") as extractor:
    first_frame = extractor.get_frames([0])

and then have the HDF5 file be closed after the context ends

CodyCBakerPhD commented 2 years ago

Maybe we should discuss briefly what the proper approach ought to be inside the design of each extractors retrieval/access functions.

My approach to the recent ScanImageTiff as seen here (https://github.com/catalystneuro/roiextractors/blob/master/src/roiextractors/extractors/tiffimagingextractors/scanimagetiffimagingextractor.py#L83-L84) is to

(i) use a context inside the __init__ to open -> read metadata -> close the file initially to get shape, rate, and other info (ii) then for data access in the get_frames and related functions, again in a context, re-open -> read requested data as array -> re-close and return array

Now, I do see how the current design of the NWB (https://github.com/catalystneuro/roiextractors/blob/master/src/roiextractors/extractors/nwbextractors/nwbextractors.py#L64) and HDF5 (https://github.com/catalystneuro/roiextractors/blob/master/src/roiextractors/extractors/hdf5imagingextractor/hdf5imagingextractor.py#L48) differ here, by leaving the io open from the __init__, which really implies something more like the usage @bendichter indicates in (https://github.com/catalystneuro/roiextractors/issues/28#issuecomment-1143689378).

I guess the questions is, which design approach should be preferred? Internally safe access at cost of repeated io open/close, or single open/close but with better context design?

bendichter commented 2 years ago

I'd rather enable contexts across the board.Maybe as a compromise,

def __enter__(cls):
    ...
    cls.__init__(..., leave_open=True)
h-mayorquin commented 2 years ago

I vote for single open and close with better context design. I think it is simpler.

CodyCBakerPhD commented 2 years ago

Just be aware that for the ScanImageTiff, data retrieval must be done the way it's constructed right now or else it starts to act screwy - apparently when you call .data(...) multiple times in the same open IO there, it gets confused about frame access and will quickly terminate in an error.

bendichter commented 2 years ago

@CodyCBakerPhD good to know!

pauladkisson commented 6 months ago

Checking back in on this issue, @h-mayorquin @bendichter @CodyCBakerPhD, do you think this is still a priority?

My preference is to keep the API as standard as possible, so since ScanImage can't really support this kind of context-based usage, maybe we should just keep to the standard of opening/closing in each get_video call. But, if using a context-based approach could give significantly better performance then it might be worth implementing.

CodyCBakerPhD commented 6 months ago

I don't think it was ever a priority, but still an open question since I don't think we reached consensus about the exact approach to take

Aso note that contextlib.contextmanager has vastly simplified the creation and management of contexts. See https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/tools/nwb_helpers/_metadata_and_file_helpers.py#L130 and https://github.com/NeurodataWithoutBorders/nwb_benchmarks/blob/main/src/nwb_benchmarks/core/_network_tracker.py#L14

for examples

h-mayorquin commented 6 months ago

Yep, we could make everything a context. No need to have that available in the extraction libraries. But yes, the problem would be to standarize and all the work that it implies.