FCP-INDI / C-PAC

Configurable Pipeline for the Analysis of Connectomes
https://fcp-indi.github.io/
GNU Lesser General Public License v3.0

Pipelines hanging on DVARS calculation #1301

Open sgiavasis opened 4 years ago

sgiavasis commented 4 years ago

Not sure if anyone else has seen this or can reproduce it: in some of my pipeline runs, specifically on AWS via the Docker container (both the latest and nightly images), the run hangs indefinitely at DVARS:

200611-20:25:53,3 nipype.workflow INFO:
     [MultiProc] Running 2 tasks, and 130 jobs ready. Free memory (GB): 11.60/12.00, Free processors: 1/3.
                     Currently running:
                       * resting_preproc_1019436_1.gen_motion_stats_afni_mean_3dvolreg_0.cal_DVARS
                       * resting_preproc_1019436_1.gen_motion_stats_afni_mean_3dvolreg_1.cal_DVARS

This is reliably happening for me with these two pipelines:

It does not happen in this one:

shnizzedy commented 3 years ago

https://github.com/FCP-INDI/C-PAC/blob/6eadaed317a9b256a82f15bccee57a5e8de7514e/CPAC/resources/configs/pipeline_config_regtest-1.yml#L302

https://github.com/FCP-INDI/C-PAC/blob/6eadaed317a9b256a82f15bccee57a5e8de7514e/CPAC/resources/configs/pipeline_config_regtest-3.yml#L302

https://github.com/FCP-INDI/C-PAC/blob/6eadaed317a9b256a82f15bccee57a5e8de7514e/CPAC/resources/configs/pipeline_config_regtest-2.yml#L302

anibalsolon commented 3 years ago

Hey @sgiavasis and @shnizzedy, I've been experiencing this as well:

         [MultiProc] Running 3 tasks, and 4 jobs ready. Free memory (GB): 3.00/16.00, Free processors: 5/8.
                     Currently running:
                       * cpac_sub-2842950_ses-1.gen_motion_stats_106.cal_DVARS
                       * cpac_sub-2842950_ses-1.ANTS_T1_to_template_symmetric_64.anat_mni_ants_register_symmetric.calc_ants_warp
                       * cpac_sub-2842950_ses-1.gen_motion_stats_106.cal_DVARS

I believe the function is dying from running out of memory, and it even seems to be running twice (note the duplicated `cal_DVARS` node above). Not sure if it is a Nipype misbehavior.

Do you have any clues about it?

This is my command line:

docker run -it -v `pwd`/output:/output fcpindi/c-pac s3://fcp-indi/data/Projects/CORR/RawDataBIDS/NKI_TRT /output participant --participant_label sub-2842950 --save_working_dir /output --n_cpus 8 --mem_gb 16 --pipeline_override 'pipeline_setup:
  output_directory:
    generate_quality_control_images: false'
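For context on why this node can be memory-hungry: DVARS is the root-mean-square, across in-mask voxels, of the frame-to-frame signal difference, and a straightforward NumPy implementation materializes the entire temporal difference array at once. A minimal sketch of that pattern (illustrative only; `dvars_sketch` is not C-PAC's actual `calculate_DVARS` signature):

```python
import numpy as np

def dvars_sketch(func_4d, mask):
    """Illustrative DVARS: RMS over in-mask voxels of the
    frame-to-frame signal difference.

    func_4d : ndarray, shape (x, y, z, t)
    mask    : boolean ndarray, shape (x, y, z)
    Returns an array of length t - 1, one value per frame pair.
    """
    data = func_4d[mask]           # (n_voxels, t) in-mask timeseries
    diffs = np.diff(data, axis=1)  # allocates a near-full-size temporary
    return np.sqrt(np.mean(diffs ** 2, axis=0))
```

The `np.diff` call allocates a temporary almost as large as the masked series itself, which is exactly the kind of memory spike that could push a node past the container's limit when several `cal_DVARS` nodes run in parallel.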
anibalsolon commented 3 years ago

Skipping bids-validator for S3 datasets...
#### Running C-PAC for sub-2842950
Number of participants to run in parallel: 1
Input directory: s3://fcp-indi/data/Projects/CORR/RawDataBIDS/NKI_TRT
Output directory: /output/output
Working directory: /output
Log directory: /output/log
Remove working directory: False
Available memory: 16.0 (GB)
Available threads: 8
Number of threads for ANTs: 1
Parsing s3://fcp-indi/data/Projects/CORR/RawDataBIDS/NKI_TRT..
Connecting to AWS: fcp-indi anonymously...
gathering files from S3 bucket (s3.Bucket(name='fcp-indi')) for data/Projects/CORR/RawDataBIDS/NKI_TRT
Did not receive any parameters for sub-2842950/ses-1/func/sub-2842950_ses-1_task-breathhold_acq-tr1400ms_run-1_bold.nii.gz, is this a problem?
 sub-2842950 ses-2 is missing an anat
Starting participant level processing
Run called with config file /output/cpac_pipeline_config_2021-03-31T17-49-17Z.yml
210331-17:49:20,23 nipype.workflow INFO:

    C-PAC version: 1.8.0

    Setting maximum number of cores per participant to 8
    Setting number of participants at once to 1
    Setting OMP_NUM_THREADS to 1
    Setting MKL_NUM_THREADS to 1
    Setting ANTS/ITK thread usage to 1
    Maximum potential number of cores that might be used during this run: 8
anibalsolon commented 3 years ago

I wonder if the segfault comes from a numpy compilation problem, or even a bug on numpy's side.

Fatal Python error: Segmentation fault

Thread 0x00007ff107f8d700 (most recent call first):
  File "/usr/local/miniconda/lib/python3.7/threading.py", line 300 in wait
  File "/usr/local/miniconda/lib/python3.7/threading.py", line 552 in wait
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/utils/profiler.py", line 107 in run
  File "/usr/local/miniconda/lib/python3.7/threading.py", line 926 in _bootstrap_inner
  File "/usr/local/miniconda/lib/python3.7/threading.py", line 890 in _bootstrap

Current thread 0x00007ff1228a3740 (most recent call first):
  File "/usr/local/miniconda/lib/python3.7/site-packages/numpy/lib/function_base.py", line 1273 in diff
  File "/code/CPAC/generate_motion_statistics/generate_motion_statistics.py", line 562 in calculate_DVARS
  File "/code/CPAC/utils/interfaces/function.py", line 152 in _run_interface
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/core.py", line 419 in run
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 741 in _run_command
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 635 in _run_interface
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 516 in run
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 67 in run_node
  File "/usr/local/miniconda/lib/python3.7/concurrent/futures/process.py", line 239 in _process_worker
  File "/usr/local/miniconda/lib/python3.7/multiprocessing/process.py", line 99 in run
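Since the traceback dies inside `numpy.diff`, one hypothetical way to test whether the large temporary is to blame (a sketch only, not an existing C-PAC option) is to compute the same per-frame RMS incrementally, so the only temporary is a single frame's worth of data rather than a full copy of the series:

```python
import numpy as np

def dvars_lowmem(func_4d, mask):
    """Same quantity as a diff-based DVARS, computed one frame pair
    at a time; peak extra memory is one in-mask frame, not a 4D copy."""
    n_t = func_4d.shape[-1]
    out = np.empty(n_t - 1)
    prev = func_4d[..., 0][mask]
    for t in range(1, n_t):
        cur = func_4d[..., t][mask]
        out[t - 1] = np.sqrt(np.mean((cur - prev) ** 2))
        prev = cur
    return out
```

If the node completes with a loop like this but hangs or segfaults with the vectorized version, that would point at memory pressure (or a miscompiled numpy) rather than a logic bug in the motion-statistics code.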
shnizzedy commented 2 years ago

@sgiavasis have you seen this lately?