Deep-MI / FastSurfer

PyTorch implementation of FastSurferCNN
Apache License 2.0

mris_sample_parc crash with out of memory (OOM) message #444

Closed karllandheer closed 8 months ago

karllandheer commented 8 months ago

Hello, I am running FastSurfer on an Ubuntu 20.04 instance with 2 cores, 16 GB RAM, and 75 GB disk space.

The command I am using is:

docker run -v ${SETUP_DIR}:/data deepmi/fastsurfer --t1 /data/T1_unbiased_brain.nii.gz --sd /data/ --sid Tutorial --py python3 --allow_root --3T --fs_license /data/FreeSurferLicense.txt --batch 1 #removed 4 threads, and parallelization

And the error message is:

=========== Creating surfaces lh - map input asegdkt_segfile to surf ===============
mris_sample_parc -ct /opt/freesurfer/average/colortable_desikan_killiany.txt -file /fastsurfer/recon_surf/lh.DKTatlaslookup.txt -projmm 0.6 -f 5 -surf white.preaparc Tutorial lh aparc.DKTatlas+aseg.orig.mgz aparc.DKTatlas.mapped.prefix.annot
mris_sample_par invoked oom-killer: gfp_mask=0x1100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
Out of memory: Killed process 26539 (mris_sample_par) total-vm:14980620kB, anon-rss:14963040kB, file-rss:4kB, shmem-rss:0kB, UID:100000 pgtables:29344kB oom_score_adj:0

Obviously it is requesting more RAM than is available; however, the documentation says that 8 GB should be enough, while my instance has 16 GB. The same command runs without issue on a 32 GB instance, but I don't believe I should have to move to that instance (which costs over double per hour). Is there anything I'm doing incorrectly? Any help would be greatly appreciated.

m-reuter commented 8 months ago

Hi, yes, it is a known issue (see e.g. #397) that FreeSurfer's mris_sample_parc sometimes uses a lot of memory, and we don't know why. That's why I implemented our own replacement routine in dev (see PR #430), which will be part of our next release. You can either try out the dev version or use more RAM for the cases that crash.

karllandheer commented 8 months ago

Ah, OK, great. Is there a Docker image with the dev version? If so, could you direct me how to pull it? Or do I have to build the image myself?

m-reuter commented 8 months ago

You need to build it yourself. There is a README and a build script in the docker directory.
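For anyone landing here later, building the dev image roughly follows the usual Docker workflow. A minimal sketch, assuming the repository's dev branch; the Dockerfile location, build invocation, and tag name below are illustrative, so check the README and build script in the docker directory for the exact supported command:

```shell
# Sketch only -- consult the docker/README in the repository for the
# officially supported build script and arguments.
git clone --branch dev https://github.com/Deep-MI/FastSurfer.git
cd FastSurfer

# The tag "fastsurfer:dev" is arbitrary; the Dockerfile path may differ
# from the repo root depending on the release:
docker build -t fastsurfer:dev .

# Then run it the same way as the release image, e.g.:
# docker run -v ${SETUP_DIR}:/data fastsurfer:dev --t1 /data/T1_unbiased_brain.nii.gz ...
```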

dkuegler commented 8 months ago

@karllandheer Just as a note: this memory issue does not happen with all cases, so it is a case-by-case problem.

karllandheer commented 8 months ago

Hello, 2/2 subjects crashed on my 16 GB instance. I believe you, but it seems to be quite common. I am building the Docker image from the dev version now and will report back on whether it solves the issue.

dkuegler commented 8 months ago

> Hello, 2/2 subjects crashed with my 16 GB instance. I believe you, but it seems to be quite common. I am working on building the docker image with the dev version now. I will report back whether this solves the issue.

Well, it was much more limited in my tests (2 cases in all of OASIS1 I believe)... But we did have to lift the memory requirements in the slurm script as that was the first time this really was an issue.

karllandheer commented 8 months ago

Hello, the dev branch worked. There was one small bug: I got an error in conform.py saying that NumPy arrays have no .sqrt() method, which I just changed to np.sqrt(). Probably some minor version thing. Just an FYI in case other people come across this issue. Thank you for your help.
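For reference, the error described above arises because NumPy exposes sqrt as a module-level ufunc, not as an ndarray method. A minimal standalone illustration of the reported fix (not the actual conform.py code):

```python
import numpy as np

arr = np.array([4.0, 9.0, 16.0])

# Calling .sqrt() as a method fails:
# AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'
try:
    arr.sqrt()
except AttributeError as err:
    print(err)

# The fix reported in this thread: use the module-level ufunc instead.
result = np.sqrt(arr)
print(result)  # [2. 3. 4.]
```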

dkuegler commented 8 months ago

Thanks for reporting this issue. I'll take a look.