Multifile scanimage - Githubissues

pauladkisson commented 8 months ago

Fixes #296

weiglszonja commented 8 months ago

This looks great @pauladkisson, thank you! I can confirm it works with the data from Dombeck; I'm just wondering now whether it is possible to speed up the initialisation of the "parent" extractor. We're parsing the metadata for each file when initialising the individual extractors, but they're all time same except for these keys:

Different values for common keys: [('frameTimestamps_sec', '0.000000000', '133.152635375'), ('frameNumbers', '1', '4001'), ('frameNumberAcquisition', '1', '4001')]

I think frameTimestamps_sec is parsed also in extract_timestamps_from_file() so we don't lose this if we just do

extract_extra_metadata(file_path)
parse_metadata(self.metadata)

for the first TIF file, but let me know what you think about this!

CodyCBakerPhD commented 8 months ago

Can we add some tests, even if they're just manually formed out of symlink copies of the current single file ScanImageTiff files we currently have? To showcase and verify the naming pattern we expect from this multifile procedure generated by the ScanImage software (in particular, the naming pattern of the files providing the ordering of sequences is something we expect to follow a stringent pattern, presumably compatible with natsort)

CodyCBakerPhD commented 8 months ago

And yes, consideration of expedited metadata parsing should be considered for speed. For a very long imaging session using this strategy, parsing metadata for each subfile can be expensive

pauladkisson commented 8 months ago

Added tests and option to skip metadata i/o for multifile extractors. @weiglszonja lmk what you think!

weiglszonja commented 7 months ago

Sorry for the delay @pauladkisson, is this example data dual or single channel? Can we add test for dual channel multifile as well?

I checked with the data I have from the Dombeck lab (see this note):

from roiextractors.extractors.tiffimagingextractors.scanimagetiffimagingextractor import \
    ScanImageTiffSinglePlaneMultiFileImagingExtractor

extractor1 = ScanImageTiffSinglePlaneMultiFileImagingExtractor(folder_path="/Volumes/LaCie/CN_GCP/Dombeck/2620749R2_231211", extract_all_metadata=False, file_pattern="2620749R2_231211_00001*.tif", channel_name="Channel 1", plane_name="0")
video1 = extractor1.get_video(start_frame=0, end_frame=100)

extractor2 = ScanImageTiffSinglePlaneMultiFileImagingExtractor(folder_path="/Volumes/LaCie/CN_GCP/Dombeck/2620749R2_231211", extract_all_metadata=False, file_pattern="2620749R2_231211_00001*.tif", channel_name="Channel 2", plane_name="0")
video2 = extractor1.get_video(start_frame=0, end_frame=100)

from numpy.testing import assert_array_equal

# This should not be True 
assert_array_equal(video1, video2)

However looks like the same video is returned for both channels, also looking at these plots:

However when I'm using ScanImageTiffSinglePlaneImagingExtractor instead, the frames look as expected:

pauladkisson commented 7 months ago

Hey @weiglszonja, thanks for taking a look. It shouldn't matter whether the example data is single channel or multi channel, since ScanImageTiffSinglePlaneImagingExtractor takes care of all of the indexing logic. What would matter is if these files are split midcycle, which is typical of the file name pattern prefix_00001_00001.tif. This is an issue I just discovered from Lawrence Niu -- see #299. Do you know if the Dombeck files are split midcycle? Better yet, can you provide me two or three of these files via Google Drive, so that I can take a look at them myself?

pauladkisson commented 7 months ago

Hey @weiglszonja, looks like there is a typo in your example code. You have

...
video2 = extractor1.get_video(start_frame=0, end_frame=100)
...

But it should be

...
video2 = extractor2.get_video(start_frame=0, end_frame=100)
...

I'm not seeing any problems with the multifile imaging extractors on my end:

file_path = '/Volumes/T7/CatalystNeuro/NWB/Dombeck/raw_data/2620749R2_231211_00001_00001.tif'
imaging_extractor = ScanImageTiffSinglePlaneImagingExtractor(
    file_path=file_path,
    channel_name='Channel 1',
    plane_name='0',
)
video_ch1_single = imaging_extractor.get_video(end_frame=10)
imaging_extractor = ScanImageTiffSinglePlaneImagingExtractor(
    file_path=file_path,
    channel_name='Channel 2',
    plane_name='0',
)
video_ch2_single = imaging_extractor.get_video(end_frame=10)

imaging_extractor = ScanImageTiffSinglePlaneMultiFileImagingExtractor(
    folder_path='/Volumes/T7/CatalystNeuro/NWB/Dombeck/raw_data',
    file_pattern='2620749R2_231211_00001_*.tif',
    channel_name='Channel 1',
    plane_name='0',
)
video_ch1_multi = imaging_extractor.get_video(end_frame=10)

imaging_extractor = ScanImageTiffSinglePlaneMultiFileImagingExtractor(
    folder_path='/Volumes/T7/CatalystNeuro/NWB/Dombeck/raw_data',
    file_pattern='2620749R2_231211_00001_*.tif',
    channel_name='Channel 2',
    plane_name='0',
)
video_ch2_multi = imaging_extractor.get_video(end_frame=10)

print(f'ch1 multi == ch1 single: {np.array_equal(video_ch1_multi, video_ch1_single)}') # True
print(f'ch2 multi == ch2 single: {np.array_equal(video_ch2_multi, video_ch2_single)}') # True
print(f'ch1 single == ch2 single: {np.array_equal(video_ch1_single, video_ch2_single)}') # False
print(f'ch1 multi == ch2 multi: {np.array_equal(video_ch1_multi, video_ch2_multi)}') # False

weiglszonja commented 7 months ago

Hey @weiglszonja, looks like there is a typo in your example code. You have

Ah, my bad then. Sorry @pauladkisson, and thank you for checking.

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 79.59%. Comparing base (86126dd) to head (39779da).

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/catalystneuro/roiextractors/pull/297/graphs/tree.svg?width=650&height=150&src=pr&token=UA958XVSZP&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalystneuro)](https://app.codecov.io/gh/catalystneuro/roiextractors/pull/297?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalystneuro) ```diff @@ Coverage Diff @@ ## main #297 +/- ## ========================================== + Coverage 79.25% 79.59% +0.34% ========================================== Files 39 39 Lines 3100 3147 +47 ========================================== + Hits 2457 2505 +48 + Misses 643 642 -1 ``` | [Flag](https://app.codecov.io/gh/catalystneuro/roiextractors/pull/297/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalystneuro) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/catalystneuro/roiextractors/pull/297/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalystneuro) | `79.59% <100.00%> (+0.34%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalystneuro#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/catalystneuro/roiextractors/pull/297?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalystneuro) | Coverage Δ | | |---|---|---| | [src/roiextractors/extractorlist.py](https://app.codecov.io/gh/catalystneuro/roiextractors/pull/297?src=pr&el=tree&filepath=src%2Froiextractors%2Fextractorlist.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalystneuro#diff-c3JjL3JvaWV4dHJhY3RvcnMvZXh0cmFjdG9ybGlzdC5weQ==) | `100.00% <ø> (ø)` | | | [...ctors/extractors/tiffimagingextractors/\_\_init\_\_.py](https://app.codecov.io/gh/catalystneuro/roiextractors/pull/297?src=pr&el=tree&filepath=src%2Froiextractors%2Fextractors%2Ftiffimagingextractors%2F__init__.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalystneuro#diff-c3JjL3JvaWV4dHJhY3RvcnMvZXh0cmFjdG9ycy90aWZmaW1hZ2luZ2V4dHJhY3RvcnMvX19pbml0X18ucHk=) | `100.00% <ø> (ø)` | | | [...imagingextractors/scanimagetiffimagingextractor.py](https://app.codecov.io/gh/catalystneuro/roiextractors/pull/297?src=pr&el=tree&filepath=src%2Froiextractors%2Fextractors%2Ftiffimagingextractors%2Fscanimagetiffimagingextractor.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=catalystneuro#diff-c3JjL3JvaWV4dHJhY3RvcnMvZXh0cmFjdG9ycy90aWZmaW1hZ2luZ2V4dHJhY3RvcnMvc2NhbmltYWdldGlmZmltYWdpbmdleHRyYWN0b3IucHk=) | `99.17% <100.00%> (+0.70%)` | :arrow_up: |

catalystneuro / roiextractors

Multifile scanimage #297

Codecov Report