Closed pauladkisson closed 7 months ago
This looks great @pauladkisson, thank you! I can confirm it works with the data from Dombeck; I'm just wondering now whether it is possible to speed up the initialisation of the "parent" extractor. We're parsing the metadata for each file when initialising the individual extractors, but they're all time same except for these keys:
Different values for common keys: [('frameTimestamps_sec', '0.000000000', '133.152635375'), ('frameNumbers', '1', '4001'), ('frameNumberAcquisition', '1', '4001')]
I think frameTimestamps_sec
is parsed also in extract_timestamps_from_file()
so we don't lose this if we just do
extract_extra_metadata(file_path)
parse_metadata(self.metadata)
for the first TIF file, but let me know what you think about this!
Can we add some tests, even if they're just manually formed out of symlink copies of the current single file ScanImageTiff files we currently have? To showcase and verify the naming pattern we expect from this multifile procedure generated by the ScanImage software (in particular, the naming pattern of the files providing the ordering of sequences is something we expect to follow a stringent pattern, presumably compatible with natsort)
And yes, consideration of expedited metadata parsing should be considered for speed. For a very long imaging session using this strategy, parsing metadata for each subfile can be expensive
Added tests and option to skip metadata i/o for multifile extractors. @weiglszonja lmk what you think!
Sorry for the delay @pauladkisson, is this example data dual or single channel? Can we add test for dual channel multifile as well?
I checked with the data I have from the Dombeck lab (see this note):
from roiextractors.extractors.tiffimagingextractors.scanimagetiffimagingextractor import \
ScanImageTiffSinglePlaneMultiFileImagingExtractor
extractor1 = ScanImageTiffSinglePlaneMultiFileImagingExtractor(folder_path="/Volumes/LaCie/CN_GCP/Dombeck/2620749R2_231211", extract_all_metadata=False, file_pattern="2620749R2_231211_00001*.tif", channel_name="Channel 1", plane_name="0")
video1 = extractor1.get_video(start_frame=0, end_frame=100)
extractor2 = ScanImageTiffSinglePlaneMultiFileImagingExtractor(folder_path="/Volumes/LaCie/CN_GCP/Dombeck/2620749R2_231211", extract_all_metadata=False, file_pattern="2620749R2_231211_00001*.tif", channel_name="Channel 2", plane_name="0")
video2 = extractor1.get_video(start_frame=0, end_frame=100)
from numpy.testing import assert_array_equal
# This should not be True
assert_array_equal(video1, video2)
However looks like the same video is returned for both channels, also looking at these plots:
However when I'm using ScanImageTiffSinglePlaneImagingExtractor
instead, the frames look as expected:
Hey @weiglszonja, thanks for taking a look. It shouldn't matter whether the example data is single channel or multi channel, since ScanImageTiffSinglePlaneImagingExtractor takes care of all of the indexing logic. What would matter is if these files are split midcycle, which is typical of the file name pattern prefix_00001_00001.tif
. This is an issue I just discovered from Lawrence Niu -- see #299. Do you know if the Dombeck files are split midcycle? Better yet, can you provide me two or three of these files via Google Drive, so that I can take a look at them myself?
Hey @weiglszonja, looks like there is a typo in your example code. You have
...
video2 = extractor1.get_video(start_frame=0, end_frame=100)
...
But it should be
...
video2 = extractor2.get_video(start_frame=0, end_frame=100)
...
I'm not seeing any problems with the multifile imaging extractors on my end:
file_path = '/Volumes/T7/CatalystNeuro/NWB/Dombeck/raw_data/2620749R2_231211_00001_00001.tif'
imaging_extractor = ScanImageTiffSinglePlaneImagingExtractor(
file_path=file_path,
channel_name='Channel 1',
plane_name='0',
)
video_ch1_single = imaging_extractor.get_video(end_frame=10)
imaging_extractor = ScanImageTiffSinglePlaneImagingExtractor(
file_path=file_path,
channel_name='Channel 2',
plane_name='0',
)
video_ch2_single = imaging_extractor.get_video(end_frame=10)
imaging_extractor = ScanImageTiffSinglePlaneMultiFileImagingExtractor(
folder_path='/Volumes/T7/CatalystNeuro/NWB/Dombeck/raw_data',
file_pattern='2620749R2_231211_00001_*.tif',
channel_name='Channel 1',
plane_name='0',
)
video_ch1_multi = imaging_extractor.get_video(end_frame=10)
imaging_extractor = ScanImageTiffSinglePlaneMultiFileImagingExtractor(
folder_path='/Volumes/T7/CatalystNeuro/NWB/Dombeck/raw_data',
file_pattern='2620749R2_231211_00001_*.tif',
channel_name='Channel 2',
plane_name='0',
)
video_ch2_multi = imaging_extractor.get_video(end_frame=10)
print(f'ch1 multi == ch1 single: {np.array_equal(video_ch1_multi, video_ch1_single)}') # True
print(f'ch2 multi == ch2 single: {np.array_equal(video_ch2_multi, video_ch2_single)}') # True
print(f'ch1 single == ch2 single: {np.array_equal(video_ch1_single, video_ch2_single)}') # False
print(f'ch1 multi == ch2 multi: {np.array_equal(video_ch1_multi, video_ch2_multi)}') # False
Hey @weiglszonja, looks like there is a typo in your example code. You have
Ah, my bad then. Sorry @pauladkisson, and thank you for checking.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 79.59%. Comparing base (
86126dd
) to head (39779da
).
Fixes #296