Closed ifittakesallnight closed 5 years ago
@ifittakesallnight The default value for bin_width is actually 50 (even though the comments say 10). But I changed it to 200 since I agree with you that higher values can be used.
However, I'm not sure I agree with your timings there. When you're using the parallel version (and you're not loading the file in memory), each process has to read the file independently, and this happens every bin_width frames. However, some info about the file is read only once (see e.g. line 24 in normcorre_batch) and then passed to all the processes. Each process has to open the TIFF and read from it using that info, and the majority of the time is actually spent reading the data with Matlab's internal TIFF library.
I did a small example with a 3000x512x512 file, turning off parallelization and using bin_width=100. The read_file function was called 32 times for a total of 17 seconds, and of those 17 s, 15 s were spent reading the actual data (see attached screenshot).
If you think that a lot of time is spent re-reading IFDs, you can run the profiler (with parallelization turned off) so we can get a better understanding of where the time goes.
We'll check the timings and update you.
@ifittakesallnight One thing that might be happening is that your TIFFs are saved as a single-page TIFF as opposed to a multi-page one. If so, you cannot index individual frames, and every time you access a frame you more or less need to read the whole file. That could lead to the long reading times that you reported. ImageJ/FIJI save TIFFs that are larger than 4GB as a single-page TIFF, so if your files are big, you should make sure that this is not happening.
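One rough way to check for the single-page case outside Matlab is to count the IFDs in the file. The sketch below is a standalone Python helper, not part of normcorre, and `count_tiff_ifds` is a hypothetical name. It only walks the IFD chain of a classic TIFF or BigTIFF and does not interpret ImageJ's own metadata, so treat a count of 1 on a long movie as a hint rather than proof.

```python
import os
import struct
import tempfile


def count_tiff_ifds(path):
    """Count IFDs (pages) by walking the IFD chain of a classic TIFF or BigTIFF."""
    with open(path, "rb") as f:
        header = f.read(16)
        order = {b"II": "<", b"MM": ">"}.get(header[:2])
        if order is None:
            raise ValueError("not a TIFF file")
        magic = struct.unpack(order + "H", header[2:4])[0]
        if magic == 42:    # classic TIFF: 4-byte offsets, 12-byte entries
            count_fmt, entry_size, off_fmt = "H", 12, "I"
            offset = struct.unpack(order + "I", header[4:8])[0]
        elif magic == 43:  # BigTIFF: 8-byte offsets, 20-byte entries
            count_fmt, entry_size, off_fmt = "Q", 20, "Q"
            offset = struct.unpack(order + "Q", header[8:16])[0]
        else:
            raise ValueError("unknown TIFF magic number")

        pages = 0
        seen = set()  # guards against a corrupt, circular IFD chain
        while offset != 0 and offset not in seen:
            seen.add(offset)
            f.seek(offset)
            raw = f.read(struct.calcsize(order + count_fmt))
            n_entries = struct.unpack(order + count_fmt, raw)[0]
            f.seek(n_entries * entry_size, 1)  # skip the tag entries
            raw = f.read(struct.calcsize(order + off_fmt))
            offset = struct.unpack(order + off_fmt, raw)[0]
            pages += 1
        return pages


# Demo on a synthetic two-page classic TIFF (headers only, no pixel data).
entry = struct.pack("<HHII", 256, 4, 1, 512)  # ImageWidth = 512, type LONG
ifd1 = struct.pack("<H", 1) + entry + struct.pack("<I", 26)  # next IFD at byte 26
ifd2 = struct.pack("<H", 1) + entry + struct.pack("<I", 0)   # 0 ends the chain
with tempfile.NamedTemporaryFile(suffix=".tif", delete=False) as tmp:
    tmp.write(struct.pack("<2sHI", b"II", 42, 8) + ifd1 + ifd2)
pages = count_tiff_ifds(tmp.name)
os.remove(tmp.name)
print(pages)  # 2
```

Because each IFD stores the offset of the next one, the chain has to be followed from the start. This is also why scanning all IFDs of a very long movie is not free.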
While registering each image to the template, normcorre_batch.m calls read_file.m to read in a number of frames equal to bin_width, which defaults to 10 and which we frequently set to be much larger. Every time read_file.m is called (at least for TIFFs) it must first read all the IFDs, which can take around 1 minute each time for large TIFFs. If we've read the code correctly, this means that for a 30,000-frame movie with bin_width set to 300, the IFDs are read 100 times, so normcorre would spend 100 minutes on unnecessary file I/O. It might be good to revise the related code to prevent these re-reads.
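The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope model, assuming one read_file call per bin_width chunk and a full IFD scan on every call, as described in this post. The function name `ifd_reread_overhead` is made up for illustration, and the 60 s per scan is the rough figure quoted above, not a measurement.

```python
import math


def ifd_reread_overhead(n_frames, bin_width, seconds_per_ifd_scan):
    """Return (number of batch reads, total seconds spent re-scanning IFDs)."""
    n_reads = math.ceil(n_frames / bin_width)  # one read per chunk of frames
    return n_reads, n_reads * seconds_per_ifd_scan


n_reads, overhead_s = ifd_reread_overhead(30_000, 300, 60.0)
print(n_reads, overhead_s / 60)  # 100 reads, 100.0 minutes
```

For the 3000-frame test with bin_width=100 mentioned in the reply, the same model gives 30 reads, while 32 read_file calls were reported there. The model does not account for the two extra calls.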
Thanks.