flatironinstitute / CaImAn-MATLAB

Complete Matlab pipeline for large scale calcium imaging data analysis

run_CNMF_patches, out of memory errors #50

Closed: marius10p closed this issue 7 years ago

marius10p commented 8 years ago

I have been trying to run Matlab CNMF on a dataset of 512 x 512 x 100,000 frames (~60GB, a 1-hour recording). With or without parfor, run_CNMF_patches always runs out of memory for me. I have tried it on machines with up to 256GB RAM, on Windows and Linux. Is there a known solution to this, or a different way to run on this type of data?

It continues running for hours before it runs out of memory, even though every patch is processed independently. Could there be a very slow memory leak somewhere?

The error occurs in cvx, but the memory buildup is gradual.

Error using linsysolve (line 205) Out of memory. Type HELP MEMORY for your options.

ucsfidl commented 7 years ago

Hello! Did you manage to fix this issue? We have the same problem. Any comments are much appreciated. Thanks, Anna

marius10p commented 7 years ago

I did not manage to fix the issue. You could downsample temporally to find the ROIs, but then you don't get the demixed timecourses.
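
(To be concrete, something along these lines; a sketch only, where tsub is the temporal downsampling option in CNMFSetParms and a factor of 5 is just an example:)

```matlab
% Sketch: downsample in time while finding the ROIs; the spatial components
% come out at full resolution, but the traces live on the downsampled time
% base, so they are not the demixed full-rate timecourses.
options = CNMFSetParms('d1',512,'d2',512,'tsub',5);   % temporal downsampling by 5
```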

epnev commented 7 years ago

@marius10p @ucsfidl

Thanks for pointing me to this, and sorry for not responding earlier. I'm working on this now, trying to avoid using cvx altogether, since it can be temperamental when run in parallel mode. The parallel mode refers to the update_temporal_components command that is run at the end of run_CNMF_patches; it can be turned off by setting options.temporal_parallel = false. Setting this option before calling run_CNMF_patches might provide a fix.
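
For example (a minimal sketch only, assuming the usual options struct from CNMFSetParms and the run_CNMF_patches call from the demo scripts; the remaining arguments are placeholders for your own setup):

```matlab
% Sketch: disable the parallel temporal update before running the patch pipeline.
% data, K, patches, tau, p are placeholders for your own inputs/settings.
options = CNMFSetParms('d1',512,'d2',512);   % usual options struct
options.temporal_parallel = false;           % serial update_temporal_components
[A,b,C,f,S,P,RESULTS,YrA] = run_CNMF_patches(data,K,patches,tau,p,options);
```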

Another possible quick fix would be to run the entire algorithm with the autoregressive order set to p = 0, and then, at the end, run deconvolution (e.g., constrained foopsi) on each row of C + YrA. I can explain how to do this in more detail, until I come up with a reliable solution.
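
Roughly (a sketch only, assuming C and YrA are the temporal components and residuals returned by the p = 0 run, and the standard constrained_foopsi interface; output handling may differ across versions):

```matlab
% Sketch: run the pipeline with p = 0, then deconvolve each trace afterwards.
traces = C + YrA;                    % non-deconvolved traces, one ROI per row
C_dec  = zeros(size(traces));        % deconvolved (denoised) traces
S_dec  = zeros(size(traces));        % inferred spiking activity
for i = 1:size(traces,1)
    [cb,~,~,~,~,sp] = constrained_foopsi(traces(i,:));   % per-ROI deconvolution
    C_dec(i,:) = cb(:)';
    S_dec(i,:) = sp(:)';
end
```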

ucsfidl commented 7 years ago

@epnev Hi, thanks for the reply. We had already set p = 0 when we ran into the problem. We have now tried turning off the parallel mode, but we still get the same error. We downsampled temporally by a factor of 16 and can try a larger factor.

Another approach I am considering is to find the cells (spatial components) in the first file and then extract the corresponding temporal components from a second file. That is to say, we would store the cell information from one recording and apply it to another recording file to extract signals from the same set of neurons. That way we could break the files into smaller ones, and it would also help when we come back to the same regions after a while.
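
Roughly what I have in mind (a sketch only; A, b, P, and options are assumed to be saved from the first run, Y2_movie is the second recording, there is a single background component, and the update_temporal_components call follows the usual toolbox interface):

```matlab
% Sketch: keep the spatial footprints A (and background b) from recording 1
% fixed, and re-estimate only the temporal components on recording 2.
[d1,d2,T2] = size(Y2_movie);
Y2 = reshape(Y2_movie, d1*d2, T2);                      % pixels x frames
f2_init = mean(Y2,1);                                   % crude background trace
C2_init = max((A'*A) \ (A'*Y2 - (A'*b)*f2_init), 0);    % projection as initializer
[C2,f2,P2,S2,YrA2] = update_temporal_components(Y2, A, b, C2_init, f2_init, P, options);
```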

Thanks, and looking forward to your reply. JSUN, Stryker lab

epnev commented 7 years ago

@ucsfidl @marius10p

I made some modifications to the code and things now seem to scale much better. I haven't tried a 100k-frame dataset, but after memory mapping I was able to run run_CNMF_patches on a 512 x 512, 50k-frame dataset in about 1 hour on a 10-core Linux box with 128GB of RAM. RAM usage did not seem to exceed ~50-55GB at any point during the run, so I expect that a 256GB RAM machine should be able to process a 100k-frame dataset in a reasonable amount of time.
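
For reference, memory mapping here means storing the movie in an HDF5-backed .mat file and passing a matfile handle to run_CNMF_patches, so each patch reads only its own pixels from disk. A rough sketch (the variable names Y, Yr, and sizY follow the demo scripts; adjust for your version of the toolbox):

```matlab
% Sketch: create a memory-mapped copy of the movie and work from that.
% Y is the movie already loaded in memory as a d1 x d2 x T array.
sizY = size(Y);
save('data.mat','Y','-v7.3');                        % v7.3 = HDF5, mappable
data = matfile('data.mat','Writable',true);
data.Yr   = reshape(Y, prod(sizY(1:2)), sizY(3));    % pixels x frames copy
data.sizY = sizY;
clear Y                                              % free RAM; patches read from disk
% then pass the handle instead of an in-memory array:
% [A,b,C,f,S,P,RESULTS,YrA] = run_CNMF_patches(data,K,patches,tau,p,options);
```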

I'll make more changes in the near future, mostly integrating faster deconvolution methods that bypass CVX, so I expect a significant improvement from that. Let me know if you run into any issues.

epnev commented 7 years ago

These issues have now been dealt with. Take a look at the run_pipeline.m file and its explanation on the wiki for more details.