The new mesoscope TIFF splitting code merged in #464 (ticket #460) appears to be a factor of ~ 2 slower than the legacy code. This is probably not a show-stopper, but it would be nice to speed it up.
The likeliest way to speed it up would be to refactor the TimeSeriesSplitter class defined here
so that, instead of generating the HDF5 file for one (ROI, z) pair at a time, it can just loop through all of the (ROI, z) pairs associated with the ophys_session, writing all of the expected videos at once. The advantage is that, instead of having to loop through the time series TIFF file once for each video (i.e. for each call of write_output_file), it could loop through the TIFF file once, writing frames to the appropriate video files.
Tasks
[ ] Replace TimeSeriesSplitter.write_output_file with a method that accepts a dict mapping (i_roi, z) pairs to output file paths and then loops through the TIFF file once, writing frames from the TIFF to the appropriate output paths).
Validation
[ ] Make sure unit tests still pass (the unittests in tests/modules/mesoscope_splitting_cli simulate different real world use cases and check the actual output against expected output, so if those tests pass, everything should still work).
[ ] Verify that refactoring TimeSeriesSplitter.write_output_file to loop through the time series TIFF once for the entire ophys_session would, in fact, be faster than the current implementation. @danielsf has some real datasets that can be used for this test. If the code is not faster, it is probably not worth merging this ticket.
The new mesoscope TIFF splitting code merged in #464 (ticket #460) appears to be a factor of ~ 2 slower than the legacy code. This is probably not a show-stopper, but it would be nice to speed it up.
The likeliest way to speed it up would be to refactor the
TimeSeriesSplitter
class defined herehttps://github.com/AllenInstitute/ophys_etl_pipelines/blob/main/src/ophys_etl/modules/mesoscope_splitting/tiff_splitter.py#L372
so that, instead of generating the HDF5 file for one (ROI, z) pair at a time, it can just loop through all of the (ROI, z) pairs associated with the ophys_session, writing all of the expected videos at once. The advantage is that, instead of having to loop through the time series TIFF file once for each video (i.e. for each call of
write_output_file
), it could loop through the TIFF file once, writing frames to the appropriate video files.Tasks
TimeSeriesSplitter.write_output_file
with a method that accepts a dict mapping(i_roi, z)
pairs to output file paths and then loops through the TIFF file once, writing frames from the TIFF to the appropriate output paths).Validation
tests/modules/mesoscope_splitting_cli
simulate different real world use cases and check the actual output against expected output, so if those tests pass, everything should still work).TimeSeriesSplitter.write_output_file
to loop through the time series TIFF once for the entire ophys_session would, in fact, be faster than the current implementation. @danielsf has some real datasets that can be used for this test. If the code is not faster, it is probably not worth merging this ticket.