catalystneuro / nwb-conversion-tools

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://nwb-conversion-tools.readthedocs.io/en/main/
BSD 3-Clause "New" or "Revised" License

Add ImagingExtractorDataChunkIterator #575

Closed · weiglszonja closed this 2 years ago

weiglszonja commented 2 years ago

Motivation

ImagingExtractorDataChunkIterator is a new data chunk iterator for ImagingExtractor objects that uses the GenericDataChunkIterator class to enable buffering over multiple frames. It is primarily intended for writing imaging data to an NWB file.

Notes / questions:

1. ImagingExtractorDataChunkIterator buffers over multiple frames, but cross-frame chunking for smaller image sizes is not possible until ImagingExtractors support retrieving frames with a selected pixel range (i.e. not only frame indices but also per-pixel indices). Until then, even if ImagingExtractorDataChunkIterator enabled selection over all dimensions (not only the first), ImagingExtractor.get_video() would still return the full image size.

2. Evaluate the performance improvement over DataChunkIterator (TODO: use timeit to compare against the previous implementation).

3. GenericDataChunkIterator asserts that chunk_shape evenly divides buffer_shape, but ImagingExtractorDataChunkIterator uses the default chunk_shape retrieved from the parent class, which might not evenly divide the buffer_shape calculated here.
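The frame-aligned buffering described above can be sketched with a numpy-only toy (names and shapes here are illustrative stand-ins, not the actual class API):

```python
import numpy as np

# Illustrative stand-in for an ImagingExtractor's video: 10 frames of 4x4 pixels.
video = np.arange(10 * 4 * 4).reshape(10, 4, 4)

chunk_frames = 2                    # frames per on-disk chunk
buffer_frames = chunk_frames * 3    # buffer spans a whole number of chunks

# Buffering over whole frames only, mirroring get_video(start_frame, end_frame):
# pixel sub-selection is not available on the extractor side, so each buffer
# always spans the full image size.
buffers = [
    video[start:start + buffer_frames]
    for start in range(0, video.shape[0], buffer_frames)
]
```

With 10 frames and 6-frame buffers this yields two buffers (6 and 4 frames), the last one clamped to the remaining frames.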

Resolve #518

How to test the behavior?

TODO: add more unittests

test_imaging_extractor_data_chunk_iterator()


CodyCBakerPhD commented 2 years ago

> ImagingExtractorDataChunkIterator buffers over multiple frames, but cross-frame chunking for smaller image sizes is not possible until ImagingExtractors enable retrieving frames with selected pixel range (i.e. not only frame indices but also pixel by pixel indices) -- until then, even if ImagingExtractorDataChunkIterator enables selection for all dimensions (not only the first dimension) ImagingExtractor.get_video() would anyhow return the full image size.

You should be able to chunk over any sub-pixel region as long as the number of frames in that chunk evenly divides the number of frames in the buffer (it's typical to figure out the chunk shape first and then choose the buffer shape as a multiple of it).

If you meant buffering over sub-pixel regions, yeah that's not supported until ROIExtractors supports it, but this iterator is primarily meant for writing the data to a TwoPhotonSeries in an NWBFile, so we do want to extract all the data.
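The "chunk shape first, buffer as a multiple" idea can be sketched as follows (the shapes here are illustrative numbers, not taken from the actual code):

```python
# Illustrative shapes, ordered (frames, rows, columns).
chunk_shape = (2, 64, 64)        # on-disk HDF5 chunk
maxshape = (1000, 512, 512)      # full dataset shape

# Choose the buffer's frame count as the nearest multiple of the chunk's
# frame count, then let the pixel axes span the full frame.
target_buffer_frames = 103
buffer_frames = (target_buffer_frames // chunk_shape[0]) * chunk_shape[0]
buffer_shape = (buffer_frames, maxshape[1], maxshape[2])

# The frames axis of the buffer is now guaranteed to be a chunk multiple.
assert buffer_shape[0] % chunk_shape[0] == 0
```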

> GenericDataChunkIterator asserts that chunk_shape evenly divides the buffer_shape. ImagingExtractorDataChunkIterator uses default chunk_shape retrieved from the parent class, which might not evenly divide the buffer_shape that was calculated here.

Let me know what exact error you get there for what exact combination of shapes and we can try to debug this.

CodyCBakerPhD commented 2 years ago

@weiglszonja It's also worth taking a look at how the MovieDataChunkIterator resolved these issues, since the API is quite similar between the two: https://github.com/catalystneuro/nwb-conversion-tools/blob/main/src/nwb_conversion_tools/datainterfaces/behavior/movie/movie_utils.py#L145

CodyCBakerPhD commented 2 years ago

> GenericDataChunkIterator asserts that chunk_shape evenly divides the buffer_shape. ImagingExtractorDataChunkIterator uses default chunk_shape retrieved from the parent class, which might not evenly divide the buffer_shape that was calculated here.

I think I might know what you're referring to, but that's also why the logic here (https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/data_utils.py#L233-L236) is only evaluated over the subset of axes for which the buffer shape is not clamped to the maxshape, so we should be fine.

The only axis you'll have to ensure is a multiple is the frames axis, which should be simple enough to resolve.
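A simplified paraphrase of that check (not the exact hdmf code, which lives at the link above): divisibility is only enforced on axes where the buffer is not clamped to the dataset's maxshape.

```python
def shapes_compatible(buffer_shape, chunk_shape, maxshape):
    # Axes where the buffer equals the maxshape are clamped and skipped;
    # every other axis must be an exact multiple of the chunk.
    return all(
        buf % chk == 0
        for buf, chk, mx in zip(buffer_shape, chunk_shape, maxshape)
        if buf != mx
    )
```

For example, a buffer of (49, 512, 512) against chunks of (7, 100, 100) and maxshape (1000, 512, 512) passes, because only the frames axis (49 = 7 x 7) is checked.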

CodyCBakerPhD commented 2 years ago

As derived in our meeting earlier

[image attachment: derivation from the meeting]

weiglszonja commented 2 years ago

The tests are currently failing because requirements-full.txt pins hdmf==3.2.1, while the test dependencies are installed from requirements-minimal.txt, which permits newer versions of hdmf (hdmf>=3.2.1).

The assertion error message is different in the newer version (hdmf==3.3.2): https://github.com/hdmf-dev/hdmf/blob/0e50852145bbf9a4676b6edc936317f882f9a47e/src/hdmf/data_utils.py#L232

@CodyCBakerPhD, should I make the changes based on the newer version of hdmf and also update the package version in requirements-full.txt to match the minimal requirements, or would that be harmful?

CodyCBakerPhD commented 2 years ago

@weiglszonja hdmf version is bumped in https://github.com/catalystneuro/nwb-conversion-tools/pull/592 as well, you can mirror it here in requirements-full no problem.

For testing, what you can do is mark that particular assertion test with @pytest.mark.skipif(condition) whenever the detected version of hdmf is <3.3.2. Let me know if you have questions on how to do that. I'd also add a corresponding test, run only when hdmf>=3.3.2, that uses the new assertion message.
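A minimal sketch of that skip pattern (the helper and test names here are hypothetical; in the real suite the installed hdmf version would be detected at import time, e.g. from hdmf.__version__):

```python
import pytest
from packaging.version import Version

# Hypothetical helper, hard-coded for illustration only.
def installed_hdmf_version() -> Version:
    return Version("3.2.1")

# Skip the new-message test on older hdmf; a mirror-image mark (skip when
# >= 3.3.2) would guard the test for the old assertion message.
requires_new_hdmf = pytest.mark.skipif(
    installed_hdmf_version() < Version("3.3.2"),
    reason="the assertion message changed in hdmf 3.3.2",
)

@requires_new_hdmf
def test_chunk_shape_assertion_message_new():
    ...  # would assert on the hdmf>=3.3.2 error text
```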

weiglszonja commented 2 years ago

ah okay, I'll try it, thanks @CodyCBakerPhD!

CodyCBakerPhD commented 2 years ago

@weiglszonja Gonna need the alternative route for grabbing package versions with PyVersion <= 3.7 - https://github.com/NeurodataWithoutBorders/nwbinspector/blob/eb8a63b15f2b574de621bf4a7a932e3b00a6880a/nwbinspector/utils.py#L93-L115
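The alternative route is needed because importlib.metadata only entered the standard library in Python 3.8; older interpreters need the importlib_metadata backport. A sketch in the spirit of the linked nwbinspector helper (not a copy of it):

```python
import sys

def get_package_version(name: str) -> str:
    """Return the installed version of a package, working on Python <= 3.7."""
    if sys.version_info >= (3, 8):
        from importlib.metadata import version  # stdlib since Python 3.8
    else:
        from importlib_metadata import version  # backport package for <= 3.7
    return version(name)
```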

weiglszonja commented 2 years ago

@CodyCBakerPhD Yeah you're right, thank you. Can I import this method from nwbinspector or should I create one in CT?

CodyCBakerPhD commented 2 years ago

> @CodyCBakerPhD Yeah you're right, thank you. Can I import this method from nwbinspector or should I create one in CT?

I have no problem with adding nwbinspector, as it is an intended companion tool for producing any and all NWBFiles.

CodyCBakerPhD commented 2 years ago

@weiglszonja Oh, one more thing - can you move this class from tools/hdmf to tools/roiextractors/ to mirror the SpikeInterface iterator (https://github.com/catalystneuro/nwb-conversion-tools/tree/main/src/nwb_conversion_tools/tools/spikeinterface)?

weiglszonja commented 2 years ago

@CodyCBakerPhD I moved the class to tools/roiextractors, and also moved the tests to tests/test_internals/test_imagingextractordatachunkiterator.py

CodyCBakerPhD commented 2 years ago

> @CodyCBakerPhD I moved the class to tools/roiextractors, and also moved the tests to tests/test_internals/test_imagingextractordatachunkiterator.py

Great! Looks like it's really coming together - now we just need those performance comparisons when the pixel selection of the buffer shape is equal to the maximum pixel shape
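A toy shape for such a timing comparison, using in-memory numpy as a stand-in for the real iterators (the actual comparison would read from disk through DataChunkIterator vs. ImagingExtractorDataChunkIterator; everything here is illustrative):

```python
import timeit
import numpy as np

video = np.random.rand(200, 32, 32)

def per_frame():
    # old approach: fetch one frame per call
    return [video[i] for i in range(video.shape[0])]

def buffered(buffer_frames=50):
    # new approach: fetch a multi-frame buffer per call
    return [video[i:i + buffer_frames] for i in range(0, video.shape[0], buffer_frames)]

t_per_frame = timeit.timeit(per_frame, number=50)
t_buffered = timeit.timeit(buffered, number=50)
```

Reporting both timings for a few buffer sizes would cover the "buffer pixel selection equals the maximum pixel shape" case discussed above.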

codecov[bot] commented 2 years ago

Codecov Report

Merging #575 (a49dfc5) into main (c297c3f) will increase coverage by 0.14%. The diff coverage is 100.00%.


```diff
@@            Coverage Diff             @@
##             main     #575      +/-   ##
==========================================
+ Coverage   88.34%   88.48%   +0.14%
==========================================
  Files          59       60       +1
  Lines        3218     3258      +40
==========================================
+ Hits         2843     2883      +40
  Misses        375      375
```
| Flag | Coverage Δ |
|---|---|
| unittests | 88.48% <100.00%> (+0.14%) :arrow_up: |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...roiextractors/imagingextractordatachunkiterator.py | 100.00% <100.00%> (ø) |
h-mayorquin commented 2 years ago

Should we close this @weiglszonja ?

weiglszonja commented 2 years ago

Closed as it was merged in https://github.com/catalystneuro/neuroconv/pull/54