restrepd opened this issue 1 year ago
how big is your movie and how much RAM does the computer have?
The miniscope .avi movie is 9.6 GB: 600x600 pixels x 53610 frames.
`grep MemTotal /proc/meminfo` reports `MemTotal: 263733696 kB` (about 264 GB).
By the way, using htop we found that part of the problem was residual memory use left over after CaImAn processing with demo_pipeline.py. We cleared it with `sudo pkill -KILL -u {username}`. Processing of 2P data with demo_pipeline.py now works well, and we always check for residual memory usage.
However, we still have the problem processing the 9.6 GB miniscope .avi file with demo_pipeline_cnmfE.py. When we run it we monitor memory usage with htop, and usage keeps increasing until it goes above 264 GB, at which point the program crashes.
If we crop the miniscope .avi file to 400x400x53610, processing with demo_pipeline_cnmfE.py works well.
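For context, here's a quick back-of-envelope estimate (my own arithmetic, not output from CaImAn) of what this movie occupies once decoded to float32, the dtype CaImAn typically works in internally:

```python
# Rough RAM estimate for the full movie as a float32 array
# (dimensions from this thread: 600x600 pixels, 53610 frames).
height, width, n_frames = 600, 600, 53610
bytes_per_pixel = 4  # float32

full_gb = height * width * n_frames * bytes_per_pixel / 1e9
print(f"full 600x600 movie as float32: {full_gb:.1f} GB")    # ~77 GB

# The cropped version that worked:
cropped_gb = 400 * 400 * n_frames * bytes_per_pixel / 1e9
print(f"cropped 400x400 movie:         {cropped_gb:.1f} GB")  # ~34 GB
```

So the 9.6 GB compressed .avi balloons to roughly 77 GB decoded, and any step that holds even three or four copies (raw, motion-corrected, filtered, ...) will go past 264 GB; the 400x400 crop is 2.25x smaller, which is plausibly why it squeaks through.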
Incidentally, during this troubleshooting we are also monitoring GPU usage with nvtop. There is no GPU usage; I wonder if I have a problem with tensorflow.
Best regards,
Diego
Diego Restrepo, PhD Professor of Cell and Developmental Biology http://www.restrepolab.org/ University of Colorado Anschutz Medical Campus Department of Cell and Developmental Biology MS 8108 Bldg RC1 South, Room L18-11119 12801 E 17th Ave Aurora, CO 80045
Where in the pipeline does this happen? In particular, I wonder if it happens before the conversion to memmap files or not.
It has happened both before and after the conversion, in different instances.
D
We ended up getting this to work on another workstation. The details below may give people a rough idea of what it takes to avoid breaking up files:
Tell us a bit about your setup: Operating system (Linux/macOS/Windows): Ubuntu 22.04.1 LTS
Python version (3.x): Python 3.10.6
Working environment (Python IDE/Jupyter Notebook/other): python IDE (same problem with Jupyter notebook)
Which of the demo scripts you're using for your analysis (if applicable): demo_pipeline_cnmfE.py
CaImAn version*: caiman 1.9.11
CaImAn installation process (pip install . / pip install -e . / conda): `pip install -e .`
Tensorflow recognized our GPU (A6000 48GB) outside of CaImAn, but inside CaImAn the GPU was not recognized. We ran the slightly modified demo_pipeline_cnmfE.py successfully by adding a second swap partition in Ubuntu (the first was a small 2 GB swap partition on the OS drive) on an NVMe on a completely separate drive. The purpose was to see how much swap this .avi file needed.
Once RAM usage reached 98%, the swap partition kicked in steadily and grew to 335 GB! The file took roughly an hour to finish. We did not monitor the code to see which part of the pipeline drove the memory use, although we will attempt to document this.
A 10 GB .avi video file will be represented by a much larger binary array in RAM. In any case, if the RAM usage is still high after memmap creation, it would be faster to reduce the number of threads you're using for CNMF. Swapping, even on very fast PCIe 4.0 NVMe SSDs in RAID 0, is much slower than RAM, so it's better to just use fewer threads.
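To make the fewer-threads advice concrete, here is a minimal sketch (the function name and the logic are mine, not CaImAn API) of choosing a worker count from a per-process peak you'd measure once with htop during a short trial run:

```python
def max_safe_processes(available_gb, per_process_peak_gb, hard_cap=None):
    """Largest worker count whose combined peak fits in available RAM.

    per_process_peak_gb is a rough per-worker peak memory figure,
    e.g. eyeballed from htop while one worker runs.
    """
    n = max(1, int(available_gb // per_process_peak_gb))
    if hard_cap is not None:
        n = min(n, hard_cap)
    return n

# e.g. 264 GB of RAM and ~40 GB peak per CNMF-E worker:
print(max_safe_processes(264, 40))              # -> 6
print(max_safe_processes(264, 40, hard_cap=5))  # -> 5
```

The resulting number would go into the `n_processes` argument when you start the cluster for the demo script.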
I'm having the exact same issue; my setup details are in the template below. I've tried different `n_processes` values, ranging from 23 to 1; 1 was taking way too long, so I landed on 5 instead. I can post the entire config sets if requested, as well. My particular silent OOM crash occurs during `cnmf.fit()` -> `compute_W()`:
https://github.com/flatironinstitute/CaImAn/blob/9b0b79ca61f20ce93259b9833e1fe18e26d4e086/caiman/source_extraction/cnmf/initialization.py#L2021
Here the script just dies in complete silence; I kept coming back to find the whole process had vanished. I performed a line-by-line pdb debug with htop open and watched memory overflow, the process get killed, and the memory purged.
Also, there seems to be an if-else block right above that attempts to do something about memory management:
https://github.com/flatironinstitute/CaImAn/blob/9b0b79ca61f20ce93259b9833e1fe18e26d4e086/caiman/source_extraction/cnmf/initialization.py#L1984
but the `data_fits_in_memory` variable never gets set anywhere else, and it defaults to `True` in the parameter declaration, so the `data_fits_in_memory == False` block is never reached. I'm not entirely sure what the history is here, but when I forced the parameter to `False` to run that block, the same silent OOM crash occurred on line 2021 again.
The main issue for me is that it raises absolutely no alarms at the process level unless you check dmesg. Maybe there could be some kind of pre-check for whether the system has sufficient memory for the upcoming step?
I'm happy to draft a PR if someone could point me to a relevant part of the code.
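For what it's worth, here is a rough sketch of the kind of pre-check I have in mind. All names here are hypothetical, not existing CaImAn API, and the 3x safety factor is a guess at how many intermediate copies get made; on a real machine, `available_bytes` could come from `psutil.virtual_memory().available`:

```python
import numpy as np

def estimate_required_bytes(shape, dtype=np.float32, safety_factor=3.0):
    """Estimated peak bytes for an array of `shape`, inflated by a
    guessed safety factor to account for intermediate copies."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize * safety_factor

def check_fits_in_memory(shape, available_bytes, dtype=np.float32,
                         safety_factor=3.0):
    """Raise a catchable MemoryError before the big allocation,
    instead of dying silently to the kernel's OOM killer mid-fit."""
    needed = estimate_required_bytes(shape, dtype, safety_factor)
    if needed > available_bytes:
        raise MemoryError(
            f"estimated need ~{needed / 1e9:.0f} GB, only "
            f"{available_bytes / 1e9:.0f} GB available; consider patches, "
            "downsampling, or the online (OnACID) algorithm")
```

A check like this placed before `compute_W()` would at least turn the silent kill into a Python exception with a suggestion attached.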
This is a really good idea: figure out something more principled than "give it a shot, and run the online algorithm when it fails". That is, we should have a principled way to calculate the RAM needs of CNMF and CNMFE from your file size and parameter settings (patches, subsampling, etc.), and report whether you will need to run the online algorithm. My guess is this isn't an entirely trivial calculation, otherwise we'd have already done it.
I'm really busy getting ready for the workshop at SfN this week, but it is something I'm interested in pursuing (or at least having a good answer for why this is too hard to provide across platforms, taking into account the dependency on the number of CPU cores and so on). I'm not sure if you have ideas about this @pgunn
I believe data_fits_in_memory was meant to be a way for the user to signal, as an initialisation option to CNMF, that they know it won't fit. It'd be interesting (but difficult) to make it automatic. It's generally challenging for software to notice beforehand if it's going to run out of RAM - very little software out there does so.
For better support, please use the template below to submit your issue. When your issue gets resolved please remember to close it.
Sometimes errors while running CNMF occur during parallel processing, which prevents the log from providing a meaningful error message. Please reproduce your error with `dview=None` set. If you need to upgrade CaImAn, follow the instructions given in the documentation.
Tell us a bit about your setup:
Operating system (Linux/macOS/Windows): Ubuntu 20.04.5 LTS
Python version (3.x): Python 3.10.6
Working environment (Python IDE/Jupyter Notebook/other): python IDE (same problem with Jupyter notebook)
Which of the demo scripts you're using for your analysis (if applicable): demo_pipeline_cnmfE.py
CaImAn version*: caiman 1.9.11
CaImAn installation process (pip install . / pip install -e . / conda): `pip install -e .`
*You can get the CaImAn version by creating a `params` object and then typing `params.data['caiman_version']`. If the field doesn't exist, type N/A and consider upgrading.
We have no problem running the demo data ("data_endoscope.tif").
We also have the same problem every once in a while running demo_pipeline.py
Thanks!
`TF_ENABLE_ONEDNN_OPTS=0`.
WARNING:root:Movie average is negative. Removing 1st percentile.
WARNING:root:Movie average is negative. Removing 1st percentile.
WARNING:root:Movie average is negative. Removing 1st percentile.
/home/restrepd/caiman_data/demos/general/drg_pipeline_cnmfE.py:118: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
bord_px = np.ceil(np.max(np.abs(mc.shifts_rig))).astype(np.int)
Killed

dmesg output:
[3286529.735885] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-1000.slice/session-c3.scope,task=python,pid=275594,uid=1000
[3286529.735937] Out of memory: Killed process 275594 (python) total-vm:154079620kB, anon-rss:57028920kB, file-rss:3092kB, shmem-rss:0kB, UID:1000 pgtables:222160kB oom_score_adj:0
[3286532.242727] oom_reaper: reaped process 275594 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:8kB
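Reading that oom-killer line: dmesg reports sizes in KiB (the /proc convention), so the conversion below (my arithmetic, not from the log) shows roughly what the process held when the kernel killed it:

```python
# Figures from the dmesg line above (reported in KiB).
total_vm_kib = 154_079_620   # total virtual memory of the python process
anon_rss_kib = 57_028_920    # anonymous resident set (actual RAM in use)

print(f"virtual size at kill: {total_vm_kib / 1024**2:.1f} GiB")  # ~146.9 GiB
print(f"resident RAM at kill: {anon_rss_kib / 1024**2:.1f} GiB")  # ~54.4 GiB
```

So the kernel killed python while it held about 54 GiB of RAM, with roughly 147 GiB of virtual address space mapped.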