Open bendichter opened 4 years ago
I would like to take over this to get practice with contex managers after I am done with the current conversion if that's fine.
great! The idea would be to allow a syntax like:
with Hdf5ImagingExtractor("hdf5_file.h5") as extractor:
first_frame = extractor.get_frames([0])
and then have the HDF5 file be closed after the context ends
Maybe we should discuss briefly what the proper approach ought to be inside the design of each extractors retrieval/access functions.
My approach to the recent ScanImageTiff
as seen here (https://github.com/catalystneuro/roiextractors/blob/master/src/roiextractors/extractors/tiffimagingextractors/scanimagetiffimagingextractor.py#L83-L84) is to
(i) use a context inside the __init__
to open -> read metadata -> close the file initially to get shape, rate, and other info
(ii) then for data access in the get_frames
and related functions, again in a context, re-open -> read requested data as array -> re-close and return array
Now, I do see how the current design of the NWB (https://github.com/catalystneuro/roiextractors/blob/master/src/roiextractors/extractors/nwbextractors/nwbextractors.py#L64) and HDF5 (https://github.com/catalystneuro/roiextractors/blob/master/src/roiextractors/extractors/hdf5imagingextractor/hdf5imagingextractor.py#L48) differ here, by leaving the io open from the __init__
, which really implies something more like the usage @bendichter indicates in (https://github.com/catalystneuro/roiextractors/issues/28#issuecomment-1143689378).
I guess the questions is, which design approach should be preferred? Internally safe access at cost of repeated io open/close, or single open/close but with better context design?
I'd rather enable contexts across the board.Maybe as a compromise,
def __enter__(cls):
...
cls.__init__(..., leave_open=True)
I vote for single open and close with better context design. I think it is simpler.
Just be aware that for the ScanImageTiff
, data retrieval must be done the way it's constructed right now or else it starts to act screwy - apparently when you call .data(...)
multiple times in the same open IO there, it gets confused about frame access and will quickly terminate in an error.
@CodyCBakerPhD good to know!
Checking back in on this issue, @h-mayorquin @bendichter @CodyCBakerPhD, do you think this is still a priority?
My preference is to keep the API as standard as possible, so since ScanImage can't really support this kind of context-based usage, maybe we should just keep to the standard of opening/closing in each get_video
call. But, if using a context-based approach could give significantly better performance then it might be worth implementing.
I don't think it was ever a priority, but still an open question since I don't think we reached consensus about the exact approach to take
Aso note that contextlib.contextmanager
has vastly simplified the creation and management of contexts. See https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/tools/nwb_helpers/_metadata_and_file_helpers.py#L130 and https://github.com/NeurodataWithoutBorders/nwb_benchmarks/blob/main/src/nwb_benchmarks/core/_network_tracker.py#L14
for examples
Yep, we could make everything a context. No need to have that available in the extraction libraries. But yes, the problem would be to standarize and all the work that it implies.
There are now at least four file API types (HDF5, NWB and Bio-Formats, CaImAn) that require the file to be open to read. This would be best handled by contexts. In fact, in each of these cases, the reader we use supports reading with a context. We could also provide this capability by creating
__enter__
and__exit__
methods that call the__enter__
and__exit__
methods of those readers.