PennLINC / qsiprep

Preprocessing and reconstruction of diffusion MRI
http://qsiprep.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Exception raised while executing Node run_afq #700

Open Magic-Ludo opened 4 months ago

Magic-Ludo commented 4 months ago

Summary

I ran a reconstruction using the mrtrix_multishell_msmt_pyafq_tractometry pipeline. It completed for most subjects, but for some of them, for reasons I can't identify, I get an error at the very end of the reconstruction:

-- NO ERROR BEFORE --
INFO:AFQ:Generating colorful lines from tractography...
INFO:AFQ:Preparing ROI...
INFO:AFQ:Preparing ROI...
INFO:AFQ:Preparing ROI...
INFO:AFQ:Preparing ROI...
INFO:nipype.workflow:[Node] Finished "run_afq", elapsed time 52692.52664s.
WARNING:nipype.workflow:Storing result file without outputs
WARNING:nipype.workflow:[Node] Error on "qsirecon_wf.sub-CTR16_mrtrix_multishell_msmt_pyafq_tractometry.sub_CTR16_ses_01_space_T1w_desc_preproc_recon_wf.pyafq_tractometry.run_afq" (/scratch/lcorcos/Temp_QSIPREP_TRACTO_V4/qsirecon_wf/sub-CTR16_mrtrix_multishell_msmt_pyafq_tractometry/sub_CTR16_ses_01_space_T1w_desc_preproc_recon_wf/pyafq_tractometry/run_afq)
ERROR:nipype.workflow:Node run_afq failed to run on host gpu011.cluster.
ERROR:nipype.workflow:Saving crash info to /scratch/lcorcos/EcriPark_QS_Tracto/qsirecon/sub-CTR16/log/20240221-165211_c6845cae-0072-45c0-a431-ae53d3260c4f/crash-20240222-103208-lcorcos-run_afq-6374e835-3483-4bd8-a408-f78cdd1b2ee7.txt
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node run_afq.

Traceback:
    Traceback (most recent call last):
      File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/interfaces/base/core.py", line 397, in run
        runtime = self._run_interface(runtime)
      File "/usr/local/miniconda/lib/python3.8/site-packages/qsiprep/interfaces/pyafq.py", line 106, in _run_interface
        myafq.export_all()
      File "/usr/local/miniconda/lib/python3.8/site-packages/AFQ/api/participant.py", line 182, in export_all
        export_all_helper(self, seg_algo, xforms, indiv, viz)
      File "/usr/local/miniconda/lib/python3.8/site-packages/AFQ/api/utils.py", line 142, in export_all_helper
        api_afq_object.export("indiv_bundles_figures")
      File "/usr/local/miniconda/lib/python3.8/site-packages/AFQ/api/participant.py", line 153, in export
        return self.wf_dict[attr_name]
      File "/usr/local/miniconda/lib/python3.8/site-packages/pimms/calculation.py", line 470, in __getitem__
        self._run_node(self.plan.efferents[k])
      File "/usr/local/miniconda/lib/python3.8/site-packages/pimms/calculation.py", line 534, in _run_node
        if not found: res = node(self)
      File "/usr/local/miniconda/lib/python3.8/site-packages/pimms/calculation.py", line 91, in __call__
        result = self.function(*args)
      File "/usr/local/miniconda/lib/python3.8/site-packages/AFQ/tasks/viz.py", line 276, in viz_indivBundle
        viz_backend.create_gif(figure, fname)
      File "/usr/local/miniconda/lib/python3.8/site-packages/AFQ/viz/plotly_backend.py", line 410, in create_gif
        figure.write_image(tdir + f"/tgif{i}.png")
      File "/usr/local/miniconda/lib/python3.8/site-packages/plotly/basedatatypes.py", line 3821, in write_image
        return pio.write_image(self, *args, **kwargs)
      File "/usr/local/miniconda/lib/python3.8/site-packages/plotly/io/_kaleido.py", line 268, in write_image
        img_data = to_image(
      File "/usr/local/miniconda/lib/python3.8/site-packages/plotly/io/_kaleido.py", line 145, in to_image
        img_bytes = scope.transform(
      File "/usr/local/miniconda/lib/python3.8/site-packages/kaleido/scopes/plotly.py", line 161, in transform
        raise ValueError(
    ValueError: Transform failed with error code 525: Array buffer allocation failed

INFO:nipype.workflow:[MultiProc] Running 0 tasks, and 0 jobs ready. Free memory (GB): 283.34/283.34, Free processors: 28/28.
INFO:nipype.workflow:***********************************
ERROR:nipype.workflow:could not run node: qsirecon_wf.sub-CTR16_mrtrix_multishell_msmt_pyafq_tractometry.sub_CTR16_ses_01_space_T1w_desc_preproc_recon_wf.pyafq_tractometry.run_afq
INFO:nipype.workflow:crashfile: /scratch/lcorcos/EcriPark_QS_Tracto/qsirecon/sub-CTR16/log/20240221-165211_c6845cae-0072-45c0-a431-ae53d3260c4f/crash-20240222-103208-lcorcos-run_afq-6374e835-3483-4bd8-a408-f78cdd1b2ee7.txt
INFO:nipype.workflow:***********************************
/usr/local/miniconda/lib/python3.8/site-packages/joblib/externals/loky/backend/resource_tracker.py:310: UserWarning: resource_tracker: There appear to be 22 leaked folder objects to clean up at shutdown
  warnings.warn(
CRITICAL:cli:QSIPrep failed: [same NodeExecutionError traceback as above, omitted]

Traceback (most recent call last):
  File "/usr/local/miniconda/bin/qsiprep", line 8, in <module>
    sys.exit(main())
  File "/usr/local/miniconda/lib/python3.8/site-packages/qsiprep/cli/run.py", line 677, in main
    qsiprep_wf.run(**plugin_settings)
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/engine/workflows.py", line 638, in run
    runner.run(execgraph, updatehash=updatehash, config=self.config)
  File "/usr/local/miniconda/lib/python3.8/site-packages/nipype/pipeline/plugins/base.py", line 224, in run
    raise error from cause
RuntimeError: [same NodeExecutionError traceback as above, omitted]
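To see which subjects are affected, the nipype crash files can be scanned. The following is a minimal Python sketch, assuming text crash files laid out like the path in the log (`<output>/qsirecon/sub-<label>/log/<run>/crash-...-run_afq-<uuid>.txt`). The helper name `find_node_crashes` is hypothetical, and it only reports the last exception-looking line in each file rather than parsing nipype's crash format.

```python
import re
from pathlib import Path

# Matches a traceback's final line, e.g. "ValueError: Transform failed ...".
_EXC_LINE = re.compile(r"^\s*([A-Za-z_][\w.]*(?:Error|Exception)):\s*(.*)$")


def find_node_crashes(qsirecon_dir, node_name="run_afq"):
    """Map subject -> [(crash file name, last exception line)] for one node.

    Assumes the layout seen in the log:
    <qsirecon_dir>/sub-<label>/log/<run>/crash-<stamp>-<user>-<node>-<uuid>.txt
    """
    base = Path(qsirecon_dir)
    results = {}
    pattern = f"sub-*/log/*/crash-*-{node_name}-*.txt"
    for crash in sorted(base.glob(pattern)):
        subject = crash.relative_to(base).parts[0]  # e.g. "sub-CTR16"
        last_error = None
        for line in crash.read_text(errors="replace").splitlines():
            match = _EXC_LINE.match(line)
            if match:
                # Keep overwriting so the last matching line wins.
                last_error = f"{match.group(1)}: {match.group(2)}"
        results.setdefault(subject, []).append((crash.name, last_error))
    return results
```

For this run it would be called as `find_node_crashes("/scratch/lcorcos/EcriPark_QS_Tracto/qsirecon")`; subjects missing from the result had no `run_afq` crash file.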

Additional details

Compute node: Dell PowerEdge C4130 with 28 cores (Intel Xeon CPU E5-2680 v4) and 320 GB of RAM.

Reproducing the bug

I ran the following SLURM script:

#!/bin/bash

#SBATCH -J QS_Tracto_Mis
#SBATCH -p pascal
#SBATCH -A b356
#SBATCH -N 1
#SBATCH -t 70:00:00
#SBATCH --cpus-per-task=28
#SBATCH --mem=300G
#SBATCH --array=1-5%2
#SBATCH --output=/home/lcorcos/logs/QSIPREP_tracto/%j-stdout.txt
#SBATCH --error=/home/lcorcos/logs/QSIPREP_tracto/%j-stderr.txt
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT
#SBATCH --mail-user=ludovic.corcos@gmail.com

set -e

date

EcriPark="/scratch/lcorcos/EcriPark_QSIPREP/qsiprep/"
cd /home/lcorcos
source .bashrc

SUB=$(sed -n "${SLURM_ARRAY_TASK_ID}p" /home/lcorcos/EcriPark_Code/sub_missing.txt)

singularity run --cleanenv \
    -B ${HOME}/EcriPark_Code:/code,/scratch/lcorcos/EcriPark/,/scratch/lcorcos/EcriPark_FreeSurfer/,/scratch/lcorcos/EcriPark_QSIPREP/,/home/lcorcos/freesurfer/license.txt,/scratch/lcorcos/EcriPark_QS_Tracto/,/scratch/lcorcos/Temp_QSIPREP_TRACTO_V4/ \
    ${HOME}/qsiprep-0.18.1.sif \
    /scratch/lcorcos/EcriPark/ \
    /scratch/lcorcos/EcriPark_QS_Tracto/ participant \
    --participant_label ${SUB} \
    --recon_input /scratch/lcorcos/EcriPark_QSIPREP/qsiprep/ \
    --skip_bids_validation \
    --nthreads 28 \
    --omp-nthreads 28 \
    --work_dir /scratch/lcorcos/Temp_QSIPREP_TRACTO_V4/ \
    --recon_spec /code/mrtrix_multishell_msmt_pyafq_tractometryV2.json \
    --freesurfer-input /scratch/lcorcos/EcriPark_FreeSurfer/ \
    --recon-only \
    --skip_odf_reports \
    --fs_license_file /home/lcorcos/freesurfer/license.txt \
    --verbose

date
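In this script, `--array=1-5%2` asks SLURM for array tasks 1 to 5 with at most two running at once, and `sed -n "${SLURM_ARRAY_TASK_ID}p"` takes the line of `sub_missing.txt` whose 1-based number equals the task ID; if the file has fewer lines than tasks, `sed` prints nothing and `SUB` is empty. The Python sketch below mirrors that mapping so each task's subject can be checked before submitting. `parse_array_spec` and `subject_for_task` are hypothetical helpers, and the parser only handles the simple `A-B%C` form, not the comma lists or step values SLURM also accepts.

```python
def parse_array_spec(spec):
    """Parse a simple SLURM --array value such as "1-5%2".

    Returns (task_ids, max_running); max_running is None without a "%" limit.
    Only the "A", "A-B" and "A-B%C" forms are handled here.
    """
    body, _, limit = spec.partition("%")
    start, _, end = body.partition("-")
    task_ids = list(range(int(start), int(end or start) + 1))
    return task_ids, (int(limit) if limit else None)


def subject_for_task(subject_list_text, task_id):
    """Mirror `sed -n "${task_id}p" file`: return 1-based line task_id, or "".

    Like sed, this keeps a trailing carriage return from CRLF files.
    """
    lines = subject_list_text.split("\n")
    if 1 <= task_id <= len(lines):
        return lines[task_id - 1]
    return ""
```

For example, `parse_array_spec("1-5%2")` returns `([1, 2, 3, 4, 5], 2)`, and each of those IDs can be passed to `subject_for_task` together with the contents of `sub_missing.txt`.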

The "Array buffer allocation failed" message suggests a memory shortage. I initially ran with 128 GB of RAM; I'm now requesting 300 GB, the most I can get, and the error persists. The other subjects completed fine with 128 GB.
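One way to test the memory hypothesis is to record how much memory the run actually used. The sketch below assumes a Linux or macOS host and uses Python's standard `resource` module to run a command and read `ru_maxrss` for child processes. This value is the peak resident set size of the largest single terminated descendant, not a total across processes, so it describes one large process rather than the whole job; under SLURM, the `MaxRSS` field reported by `sacct` is an alternative. The helper name `run_and_report_peak_rss` is hypothetical.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run `cmd` to completion; return (exit code, peak RSS in MiB).

    ru_maxrss for RUSAGE_CHILDREN is the peak resident set size of the largest
    single child or descendant waited for so far, not a sum over processes,
    so call this from a fresh Python process.
    """
    completed = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # Linux reports ru_maxrss in kilobytes, macOS in bytes.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    peak_mib = usage.ru_maxrss * bytes_per_unit / (1024 * 1024)
    return completed.returncode, peak_mib


if __name__ == "__main__" and len(sys.argv) > 1:
    code, peak = run_and_report_peak_rss(sys.argv[1:])
    print(f"exit code {code}; largest child peaked at ~{peak:.0f} MiB")
```

Saved as, say, `peak_rss.py`, it could in principle wrap the container call in the job script (`python peak_rss.py singularity run --cleanenv ...`), assuming a Python 3 interpreter is available on the host.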

arokem commented 4 months ago

This might be related to the Xvfb configuration; see this comment: https://github.com/plotly/orca/issues/223#issuecomment-520846414
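Because the script uses `singularity run --cleanenv`, host environment variables such as `DISPLAY` are not passed into the container. A quick check of what the container actually sees could look like the sketch below. `display_diagnostics` is a hypothetical helper, and whether kaleido needs an X display at all in this setup is an assumption to verify, not something the traceback establishes.

```python
import os
import shutil


def display_diagnostics(env=None, which=shutil.which):
    """Report the X display setting and Xvfb tools visible to this process.

    `env` and `which` are injectable so the check can be tested without a display.
    """
    env = os.environ if env is None else env
    return {
        # Unset when no X server (real or virtual) has been configured.
        "DISPLAY": env.get("DISPLAY"),
        # Path to the X virtual framebuffer server, or None if not on PATH.
        "Xvfb": which("Xvfb"),
        # Wrapper that starts Xvfb around a command, if installed.
        "xvfb-run": which("xvfb-run"),
    }


if __name__ == "__main__":
    for key, value in display_diagnostics().items():
        print(f"{key}: {value if value is not None else 'not found'}")
```

Saved as, say, `diag.py`, it could be run inside the image with `singularity exec --cleanenv ${HOME}/qsiprep-0.18.1.sif python diag.py`, assuming the image's `python` is first on `PATH`.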