PennLINC / qsiprep

Preprocessing and reconstruction of diffusion MRI
http://qsiprep.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Eddy error when using GPU #747

Open araikes opened 1 month ago

araikes commented 1 month ago

Summary

Eddy crashes in QSIPrep v0.21.4 when CUDA is enabled (use_cuda = true).

Additional details

What were you trying to do?

Preprocess data with current version of QSIPrep. Eddy crashed.

Reproducing the bug

Crash log:

Node: qsiprep_wf.single_subject_01_wf.dwi_preproc_dir_AP_run_001_wf.hmc_sdc_wf.eddy
Working directory: /tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/hmc_sdc_wf/eddy

Node inputs:

args = 
cnr_maps = True
dont_peas = False
dont_sep_offs_move = False
environ = {'FSLOUTPUTTYPE': 'NIFTI_GZ', 'OMP_NUM_THREADS': '8'}
estimate_move_by_susceptibility = True
fep = False
field = <undefined>
field_mat = <undefined>
flm = quadratic
fudge_factor = 10.0
fwhm = <undefined>
in_acqp = <undefined>
in_bval = <undefined>
in_bvec = <undefined>
in_file = <undefined>
in_index = <undefined>
in_mask = <undefined>
in_topup_fieldcoef = <undefined>
in_topup_movpar = <undefined>
initrand = <undefined>
interp = spline
is_shelled = True
json = <undefined>
mbs_ksp = <undefined>
mbs_lambda = <undefined>
mbs_niter = <undefined>
method = jac
mporder = 5
multiband_factor = <undefined>
multiband_offset = <undefined>
niter = 5
num_threads = 8
nvoxhp = 1000
out_base = eddy_corrected
outlier_nstd = <undefined>
outlier_nvox = <undefined>
outlier_pos = <undefined>
outlier_sqr = <undefined>
outlier_type = <undefined>
output_type = NIFTI_GZ
repol = True
residuals = True
session = <undefined>
slice2vol_interp = <undefined>
slice2vol_lambda = <undefined>
slice2vol_niter = <undefined>
slice_order = <undefined>
slm = linear
use_cuda = True

Traceback (most recent call last):
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node eddy.

Cmdline:
    eddy_cuda10.2  --cnr_maps --estimate_move_by_susceptibility --field=/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/hmc_sdc_wf/topup/fieldmap_HZ --field_mat=/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/hmc_sdc_wf/topup_to_eddy_reg/topup_reg_image_flirt.mat --flm=quadratic --ff=10.0 --acqp=/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/hmc_sdc_wf/gather_inputs/eddy_acqp.txt --bvals=/nifti/sub-01/dwi/sub-01_dir-AP_run-001_dwi.bval --bvecs=/nifti/sub-01/dwi/sub-01_dir-AP_run-001_dwi.bvec --imain=/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/pre_hmc_wf/merge_and_denoise_wf/dwi_denoise_dir_AP_run_001_dwi_wf/degibbser/sub-01_dir-AP_run-001_dwi_denoised_unrung.nii --index=/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/hmc_sdc_wf/gather_inputs/eddy_index.txt --mask=/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/hmc_sdc_wf/transform_mask_to_eddy/topup_imain_corrected_avg_trans_mask_trans_flirt.nii.gz --interp=spline --data_is_shelled --json=/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/pre_hmc_wf/merge_and_denoise_wf/merge_dwis/merged_metadata.json --resamp=jac --mporder=5 --niter=5 --nthr=8 --nvoxhp=1000 --out=/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/hmc_sdc_wf/eddy/eddy_corrected --repol --residuals --slm=linear
Stdout:

    Warning: In a future release the first argument will have to be "diffusion" when using eddy on diffusion data, i.e.
    eddy diffusion --imain='my_ima' --acqp='my_acqp' ...

    Warning: Writing of individual text files will be discontinued in favour of a single .json file in future versions

    EddyInputError:  The version compiled for GPU can only use 1 CPU thread (i.e. --nthr=1)
    Terminating program
Stderr:

Traceback:
    Traceback (most recent call last):
      File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 453, in aggregate_outputs
        setattr(outputs, key, val)
      File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
        value = super(File, self).validate(objekt, name, value, return_pathlike=True)
      File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
        self.error(objekt, name, str(value))
      File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/traits/base_trait_handler.py", line 74, in error
        raise TraitError(
    traits.trait_errors.TraitError: The 'out_corrected' trait of an ExtendedEddyOutputSpec instance must be a pathlike object or string representing an existing file, but a value of '/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/hmc_sdc_wf/eddy/eddy_corrected.nii.gz' <class 'str'> was specified.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 400, in run
        outputs = self.aggregate_outputs(runtime)
      File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 460, in aggregate_outputs
        raise FileNotFoundError(msg)
    FileNotFoundError: No such file or directory '/tmp/qsiprep_wf/single_subject_01_wf/dwi_preproc_dir_AP_run_001_wf/hmc_sdc_wf/eddy/eddy_corrected.nii.gz' for output 'out_corrected' of a ExtendedEddy interface

Command:

apptainer run --containall --writable-tmpfs --nv \
-B $PWD/nifti:/nifti:ro \
-B $PWD/derivatives/qsiprep_0.21.4:/output \
-B /groups/adamraikes/license.txt:/license.txt \
-B $PWD/extra:/extra \
-B /groups/adamraikes/templateflow:/opt/templateflow \
-B /tmp:/tmp /groups/adamraikes/singularity_images/qsiprep_0.21.4.sif \
/nifti /output participant --participant-label 01 \
--denoise-method dwidenoise --unringing-method rpg \
--nthreads 24 --omp-nthreads 8 \
--output-resolution 1 --eddy-config /extra/eddy_params_v2.json \
--fs-license-file /license.txt -w /tmp --resource-monitor --skip-bids-validation

Eddy config:

{
  "flm": "quadratic",
  "slm": "linear",
  "fep": false,
  "interp": "spline",
  "nvoxhp": 1000,
  "fudge_factor": 10,
  "dont_sep_offs_move": false,
  "dont_peas": false,
  "niter": 5,
  "method": "jac",
  "repol": true,
  "num_threads": 1,
  "is_shelled": true,
  "use_cuda": true,
  "cnr_maps": true,
  "residuals": true,
  "output_type": "NIFTI_GZ",
  "estimate_move_by_susceptibility": true,
  "mporder": 5,
  "args": ""
}

mattcieslak commented 1 month ago

oh wow, this is new. Should be an easy fix, luckily

mattcieslak commented 1 month ago

in the meantime you can set --omp-nthreads 1 as a workaround
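
For readers hitting the same error: the log above shows the eddy config requesting "num_threads": 1, while the node inputs report num_threads = 8 and the command line carries --nthr=8, which matches --omp-nthreads 8. That suggests the workflow-level thread count replaces the value from the config file. The sketch below illustrates one way a guard could clamp the thread count whenever the GPU build is selected. It is only an illustration under that assumption, not the actual QSIPrep fix; the function name `effective_eddy_threads` is hypothetical.

```python
import json


def effective_eddy_threads(eddy_config: dict, omp_nthreads: int) -> int:
    """Return the thread count to pass to eddy as --nthr.

    Assumption (inferred from the log in this issue, not from QSIPrep's
    source): omp_nthreads overrides the config's "num_threads".
    The GPU build of eddy accepts only --nthr=1, so clamp in that case.
    """
    if omp_nthreads < 1:
        raise ValueError("omp_nthreads must be >= 1")
    if eddy_config.get("use_cuda", False):
        return 1  # eddy_cuda refuses anything other than a single CPU thread
    return omp_nthreads


# Relevant subset of the eddy config posted in this issue
config = json.loads('{"use_cuda": true, "num_threads": 1}')
print(effective_eddy_threads(config, omp_nthreads=8))
```

With the config from this report and --omp-nthreads 8, the function returns 1, which is the value the GPU build requires. With "use_cuda": false it would pass the requested thread count through unchanged.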