flatironinstitute / NoRMCorre

Matlab routines for online non-rigid motion correction of calcium imaging data
GNU General Public License v2.0

Unnecessary reading of TIFF IFDs? #23

Closed ifittakesallnight closed 5 years ago

ifittakesallnight commented 5 years ago

While registering each image to the template, normcorre_batch.m calls read_file.m to read in bin_width frames at a time; bin_width defaults to 10, and we frequently set it much larger. Every time read_file.m is called (at least for TIFFs) it must first read all the IFDs, which can take around 1 minute per call for large TIFFs. If we've read the code correctly, this means that for a 30,000-frame movie with bin_width set to 300, the IFDs are read 100 times, so normcorre would spend about 100 minutes on unnecessary file I/O. It might be good to revise the related code to prevent these re-reads.

Thanks.
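
The estimate in this comment can be reproduced with a few lines of arithmetic. This is only a sketch of the poster's reasoning: the one-minute IFD scan is their rough figure rather than a measurement, and the variable names are made up for illustration.

```python
# Figures quoted in the comment above; ifd_scan_minutes is the poster's
# rough estimate for one full IFD scan of a large TIFF, not a measurement.
n_frames = 30_000
bin_width = 300
ifd_scan_minutes = 1.0

# One read_file.m call per batch of bin_width frames.
n_reads = n_frames // bin_width

# If every call rescans all IFDs, the scan cost is paid once per call.
total_ifd_minutes = n_reads * ifd_scan_minutes

print(n_reads, total_ifd_minutes)
```

Under these assumptions the cost grows linearly with the number of batches, so doubling bin_width would halve it.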

epnev commented 5 years ago

@ifittakesallnight The default value for bin_width is actually 50 (even though the comments say 10). But I changed it to 200, since I agree with you that higher values can be used.

However, I'm not sure I agree with your timings there. When you're using the parallel version (and you're not loading the file in memory), each process has to read the file independently, and this happens every bin_width frames. However, some info about the file is read only once (check, e.g., line 24 in normcorre_batch) and then passed to all the processes. Each process has to open the TIFF and read from it using that info, and the majority of the time is actually spent reading the data with Matlab's internal TIFF library.

I ran a small example with a 3000x512x512 file, turning off parallelization and using bin_width=100. The read_file function was called 32 times for a total of 17 seconds, and of these 17 s, 15 s were spent reading the actual data (see attached screenshot).

[Screenshot, 2019-02-01 4:24 pm]
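
To put those timings in perspective, the time not spent reading pixel data can be worked out directly. This is a rough sketch using only the three numbers quoted above; the remainder covers everything read_file does besides the data read, so it is an upper bound on any IFD-related cost rather than a measurement of it.

```python
# Numbers quoted in the comment above (single process, bin_width=100).
total_s = 17.0   # total time spent in read_file
data_s = 15.0    # portion spent reading the actual data
n_calls = 32     # number of read_file calls

# Everything that is not the data read: file open, metadata handling, etc.
overhead_s = total_s - data_s
overhead_per_call_s = overhead_s / n_calls
overhead_fraction = overhead_s / total_s

print(f"{overhead_s:.1f} s overhead, "
      f"{overhead_per_call_s * 1000:.1f} ms per call, "
      f"{overhead_fraction:.1%} of total")
```

For this file the non-data overhead comes to a couple of seconds in total, far from the one minute per call described in the original report.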

If you think that a lot of time is spent in re-reading IFDs you can run the profiler (with parallelization turned off) and we could possibly get a better understanding.

ifittakesallnight commented 5 years ago

We'll check the timings and update you.

epnev commented 5 years ago

@ifittakesallnight One thing that might be happening is that your TIFFs are saved as single-page TIFFs rather than multi-page ones. If so, you cannot index individual frames, and every time you access a frame you more or less need to read the whole file. That could lead to the long reading times you reported. ImageJ/FIJI saves TIFFs larger than 4GB as a single-page TIFF, so if your files are big, you should make sure this is not happening.
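
One way to check whether a file is single-page or multi-page is to count its IFDs. The sketch below is not part of NoRMCorre; it is a minimal, standalone Python illustration of the classic TIFF layout, in which each IFD stores the offset of the next one, so finding the last frame means walking the whole chain. The names `count_ifds` and `fake_tiff` are hypothetical, and the code assumes classic TIFF (magic number 42, 4-byte offsets). BigTIFF files (magic number 43, 8-byte offsets) are rejected rather than parsed.

```python
import io
import struct


def count_ifds(f):
    """Count IFDs in a classic TIFF by following the next-IFD offsets."""
    header = f.read(8)
    if len(header) < 8 or header[:2] not in (b"II", b"MM"):
        raise ValueError("not a TIFF file")
    bo = "<" if header[:2] == b"II" else ">"  # little- or big-endian
    magic, offset = struct.unpack(bo + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic 43)")

    count = 0
    seen = set()
    while offset != 0:
        if offset in seen:  # guard against a corrupt, looping chain
            raise ValueError("IFD chain loops")
        seen.add(offset)
        f.seek(offset)
        (n_entries,) = struct.unpack(bo + "H", f.read(2))
        # Skip the 12-byte entries; the next-IFD offset follows them.
        f.seek(offset + 2 + 12 * n_entries)
        (offset,) = struct.unpack(bo + "I", f.read(4))
        count += 1
    return count


def fake_tiff(n_pages):
    """Build an in-memory chain of empty IFDs, for demonstration only."""
    buf = b"II" + struct.pack("<HI", 42, 8)  # first IFD at byte 8
    for i in range(n_pages):
        next_offset = 8 + 6 * (i + 1) if i < n_pages - 1 else 0
        buf += struct.pack("<HI", 0, next_offset)  # 0 entries + next offset
    return io.BytesIO(buf)


single_page = count_ifds(fake_tiff(1))
multi_page = count_ifds(fake_tiff(3))
print(single_page, multi_page)
```

On a real file, `count_ifds(open(path, "rb"))` returning 1 for a movie with thousands of frames would suggest the single-page layout described above.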