MASILab / Synb0-DISCO

Distortion correction of diffusion weighted MRI without reverse phase-encoding scans or field-maps
https://my.vanderbilt.edu/masi

Docker - "Performing inference on FOLD: 1 /extra/pipeline.sh: line 38: 1206 Killed" #29

Closed mojomattv closed 1 year ago

mojomattv commented 2 years ago

Hi and thank you very much for sharing this program. I was hoping for some advice on an error that keeps being thrown when running the following command:

sudo docker run --rm -v $(pwd)/INPUTS/:/INPUTS/ -v $(pwd)/OUTPUTS:/OUTPUTS/ -v $(pwd)/INPUTS/license.txt:/extra/freesurfer/license.txt --user $(id -u):$(id -g) hansencb/synb0 --notopup

Everything seems to run smoothly until the "Performing inference on FOLD: *" stage, at which point there is apparently an issue with line 38 of pipeline.sh. Any clarification or possible solution would be greatly appreciated (see attached for the output of the above command).

synb0_output.txt

Kikubernetes commented 2 years ago

Hi mojomattv! I got the same error as you. My environment is as follows: macOS Monterey, Docker Desktop 4.8.2, Synb0-DISCO v2.0. I increased the resource allocation for Docker (Docker Desktop > Settings > Resources), and it worked perfectly! If you use a Mac, you can try this.
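For anyone hitting the same "Killed" message from the command line, the memory requirement can also be made explicit with standard docker run flags, as in the sketch below. On macOS the Docker Desktop VM (Settings > Resources) still caps what any container can use, so it has to be raised there first; the 16g figure follows the maintainers' suggestion later in this thread.

# Sketch only: the original command with explicit memory flags added.
# --memory caps the container at 16 GB (it can never exceed the Docker Desktop VM limit)
# and --shm-size enlarges /dev/shm, which PyTorch data loaders can depend on.
sudo docker run --rm \
    --memory=16g --shm-size=8g \
    -v $(pwd)/INPUTS/:/INPUTS/ \
    -v $(pwd)/OUTPUTS:/OUTPUTS/ \
    -v $(pwd)/INPUTS/license.txt:/extra/freesurfer/license.txt \
    --user $(id -u):$(id -g) \
    hansencb/synb0 --notopup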

anshuhim20 commented 2 years ago

Dear list, I am running the same pipeline without Docker and am getting stuck at the exact step where @mojomattv was struggling. I am running this on Red Hat 7 with a modified pipeline.sh file. The error is: RuntimeError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 1.94 GiB total capacity; 908.41 MiB already allocated; 17.38 MiB free; 916.00 MiB reserved in total by PyTorch)

Please find attached my pipeline and the output file.

Kindly suggest what is needed. Reducing the batch size has been suggested, but how do I do that? Or is there a way to increase the resource allocation for this program?

Thanks and regards, Himanshu Joshi

Synb0_output.txt pipeline.txt
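One possible workaround for the CUDA out-of-memory error above (this is generic PyTorch behaviour, not something from the Synb0-DISCO documentation): hide the 2 GiB GPU so that inference falls back to the CPU, which is slower but draws on system RAM rather than GPU memory. This only helps if the inference script picks its device via torch.cuda.is_available(); if it calls .cuda() unconditionally it will still fail.

export CUDA_VISIBLE_DEVICES=""   # make PyTorch see no GPU, forcing CPU inference
bash pipeline.sh                 # launch the modified pipeline (adjust to however you normally run it)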

Diffusion-MRI commented 2 years ago

Hi all,

Kikubernetes is correct: this is failing at the inference stage, and it is purely a memory issue. We suggest allocating 16 GB of RAM (or more) when running either the Docker or Singularity image.
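For anyone unsure whether the allocation actually took effect, one quick check with the standard Docker CLI is:

docker info --format '{{.MemTotal}}'   # total memory visible to the Docker daemon, in bytes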

Thank you, Kurt

fionaEyoung commented 1 year ago

Might it be worth updating the README to reflect this? Currently it states "we suggest giving Docker access to >8Gb of RAM". I also had this issue on my desktop machine (I allocated 15G out of the 16 available), so I ran it in Singularity on an HPC cluster instead. The job only reported a maxvmem of 13.110GB, though.
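(For anyone running this on a similar Grid Engine style cluster, which is what the maxvmem figure suggests, the memory request might look like the line below; the resource name, the value, and the submit script name are all site-specific placeholders, so treat this as a sketch rather than a recipe.)

qsub -l h_vmem=16G run_synb0_singularity.sh   # h_vmem is the per-slot memory limit; run_synb0_singularity.sh is a hypothetical wrapper script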

schillkg commented 1 year ago

Hi Fiona and all - great suggestion. We have now reflected this in the README. Also - I apologize for the delayed response!