denisecailab / minian

miniscope analysis pipeline with interactive visualizations
GNU General Public License v3.0

computation is almost frozen in "Save motion" section #202

Open zz-rezaei opened 2 years ago

zz-rezaei commented 2 years ago

Hello all,

I am processing a very large dataset (320 GB) using 4 workers with a memory limit of 60 GB each (the best combination I could find after several rounds of trial and error). Every step before the "Save Motion" section runs in a reasonable time. In the Save Motion section, however, the computation slows to a crawl after a couple of hours, and eventually there seems to be no progress at all, even though the notebook still shows as running. When I check the Status page at localhost:8787, there are seven progress bars. Five of them ('blocks', 'finalize', 'est_motion_c...', 'from_value', and 'est_motion_c...') are completely done, but the other two ('store' and 'rechunck_store') made some progress and then stalled. CPU and memory usage are around 0%, and the read and write speeds are around 1 KB. So it looks like nothing is being computed or stored.

My question is: why, in this step of the pipeline, do the workers not use the 60 GB of memory that I have allocated to each of them? What makes this step so slow that it is almost impossible to finish? (I have run the pipeline several times, and each time I waited more than 24 hours for this step.)

One other point I should mention: the results are being written to network-attached storage (NAS), not to the computer's local drive. But this shouldn't be a problem, because the videos were also read from the NAS without any issue.

I would appreciate any help or suggestions, as this is becoming very frustrating and time-consuming.

Thank you. Zahra

phildong commented 2 years ago

Hey, so it's really inefficient to write to a NAS, since minian writes lots of tiny chunks of data and each write incurs some overhead. My guess is that a network issue happened somewhere during these write requests, and some of them were lost and never finished. I'd recommend pointing minian_intermediate to a local SSD. In addition, if you find motion correction unnecessarily slow, you can probably decrease upsample to speed it up.
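
A minimal sketch of what the two suggestions above might look like in the notebook's setup cell. It assumes the notebook picks up the intermediate folder from the `MINIAN_INTERMEDIATE` environment variable and passes a `param_estimate_motion` dict to the motion estimation step, as the stock pipeline notebook appears to do; check this against your own copy. The helper `use_local_intermediate`, the scratch path, and the `upsample` value of 10 are hypothetical and only for illustration.

```python
import os
import tempfile


def use_local_intermediate(local_dir):
    """Create local_dir if needed and point MINIAN_INTERMEDIATE at it.

    Hypothetical helper, not part of minian. It assumes minian reads the
    intermediate folder from the MINIAN_INTERMEDIATE environment variable.
    """
    local_dir = os.path.abspath(os.path.expanduser(local_dir))
    os.makedirs(local_dir, exist_ok=True)
    os.environ["MINIAN_INTERMEDIATE"] = local_dir
    return local_dir


# Placeholder location: replace with a folder on a local SSD, not the NAS.
intpath = use_local_intermediate(
    os.path.join(tempfile.gettempdir(), "minian_intermediate")
)

# Assumed parameter dict for the motion estimation step. A smaller
# "upsample" trades sub-pixel precision for speed; 10 is an arbitrary example.
param_estimate_motion = {"dim": "frame", "upsample": 10}

print("intermediate folder:", intpath)
```

Only the intermediate folder needs to be local for this; the raw videos and the final output can stay on the NAS. Set the variable before the cluster is started so that the workers see the same path.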