FCP-INDI / C-PAC

Configurable Pipeline for the Analysis of Connectomes
https://fcp-indi.github.io/
GNU Lesser General Public License v3.0

🐛 [User-reported Bug] Exceptions in some QC functions do not generate crashfiles and cause processes to hang #1453

Open ccraddock opened 3 years ago

ccraddock commented 3 years ago

Describe the bug

Certain exceptions raised in QC functions do not produce crashfiles and cause the parent process to hang.

Here is an example of one such exception:

      [Node] Setting-up "cpac_sub-0027231.nii_bold-snr-axial-qc_256" in "/tmp/cpac_sub-0027231/_scan_rest_run-2/nii_bold-snr-axial-qc_256".
      exception calling callback for <Future at 0x7f2ab87ad110 state=finished raised FileNotFoundError>
      concurrent.futures.process._RemoteTraceback: 
      """
      Traceback (most recent call last):
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/traits_extension.py", line 129, in validate
          value = Path(value)  # Use pathlib's validation
        File "/usr/local/miniconda/lib/python3.7/pathlib.py", line 1027, in __new__
          self = cls._from_parts(args, init=False)
        File "/usr/local/miniconda/lib/python3.7/pathlib.py", line 674, in _from_parts
          drv, root, parts = self._parse_args(args)
        File "/usr/local/miniconda/lib/python3.7/pathlib.py", line 658, in _parse_args
          a = os.fspath(a)
      TypeError: expected str, bytes or os.PathLike object, not list

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
          result["result"] = node.run(updatehash=updatehash)
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 486, in run
          self._get_hashval()
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 538, in _get_hashval
          self._get_inputs()
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 609, in _get_inputs
          self.set_input(key, deepcopy(output_value))
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 302, in set_input
          setattr(self.inputs, parameter, deepcopy(val))
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
          value = super(File, self).validate(objekt, name, value, return_pathlike=True)
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/interfaces/base/traits_extension.py", line 131, in validate
          self.error(objekt, name, str(value))
        File "/usr/local/miniconda/lib/python3.7/site-packages/traits/trait_handlers.py", line 172, in error
          value )
      traits.trait_errors.TraitError: The 'in_file' trait of a RenameInputSpec instance must be a pathlike object or string representing an existing file, but a value of "['/tmp/cpac_sub-0027231/qc_snr_251/montage_snr/_scan_rest_run-2/montage_a/mapflow/_montage_a0/snr_a.png']" <class 'str'> was specified.

      Error setting node input:
      Node: nii_bold-snr-axial-qc_256
      input: in_file
      results_file: /tmp/cpac_sub-0027231/qc_snr_251/montage_snr/_scan_rest_run-2/montage_a/result_montage_a.pklz
      value: ['/tmp/cpac_sub-0027231/qc_snr_251/montage_snr/_scan_rest_run-2/montage_a/mapflow/_montage_a0/snr_a.png']

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/usr/local/miniconda/lib/python3.7/concurrent/futures/process.py", line 239, in _process_worker
          r = call_item.fn(*call_item.args, **call_item.kwargs)
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 70, in run_node
          result["result"] = node.result
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 217, in result
          op.join(self.output_dir(), "result_%s.pklz" % self.name)
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 291, in load_resultfile
          raise FileNotFoundError(results_file)
      FileNotFoundError: /tmp/cpac_sub-0027231/_scan_rest_run-2/nii_bold-snr-axial-qc_256/result_nii_bold-snr-axial-qc_256.pklz
      """

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "/usr/local/miniconda/lib/python3.7/concurrent/futures/_base.py", line 324, in _invoke_callbacks
          callback(self)
        File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/multiproc.py", line 159, in _async_callback
          result = args.result()
        File "/usr/local/miniconda/lib/python3.7/concurrent/futures/_base.py", line 428, in result
          return self.__get_result()
        File "/usr/local/miniconda/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
          raise self._exception
      FileNotFoundError: /tmp/cpac_sub-0027231/_scan_rest_run-2/nii_bold-snr-axial-qc_256/result_nii_bold-snr-axial-qc_256.pklz

These nodes appear to hang in the scheduler, and once enough of them accumulate, the pipeline reaches a standstill. My pipeline has been stuck with the following message for over an hour:

210227-00:59:59,821 nipype.workflow INFO:
     [MultiProc] Running 8 tasks, and 37 jobs ready. Free memory (GB): 48.00/64.00, Free processors: 0/8.
                     Currently running:
                       * cpac_sub-0027231.nii_bold-snr-axial-qc_256
                       * cpac_sub-0027231.nii_bold-snr-sagittal-qc_257
                       * cpac_sub-0027231.nii_space-T1w_desc-mean_bold-axial-qc_262
                       * cpac_sub-0027231.nii_space-T1w_desc-mean_bold-sagittal-qc_263
                       * cpac_sub-0027231.nii_space-template_desc-brain_T1w-axial-qc_273
                       * cpac_sub-0027231.nii_space-template_desc-brain_T1w-sagittal-qc_274
                       * cpac_sub-0027231.nii_desc-brain_T1w-axial-qc_269
                       * cpac_sub-0027231.nii_desc-brain_T1w-sagittal-qc_270
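
The last traceback above shows `_async_callback` calling `args.result()`, and the log line "exception calling callback for <Future ...>" comes from `concurrent.futures` catching whatever a done-callback raises. Here is a minimal sketch of how that combination could leave a task marked as running forever. It is not nipype's code: the `running` set, `failing_task`, and both callbacks are hypothetical stand-ins, and it uses a thread pool rather than a process pool so it stays self-contained.

```python
import concurrent.futures as cf


def failing_task():
    # Stand-in for a node whose result file was never written.
    raise FileNotFoundError("result_qc_node.pklz")


def run_once(make_callback):
    """Submit one failing task and return the names still marked as running."""
    running = {"qc_node"}  # hypothetical scheduler bookkeeping

    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(failing_task)
        future.add_done_callback(make_callback(running))
    # Leaving the "with" block waits for the task and its callback to finish.
    return running


def naive_callback(running):
    def callback(future):
        future.result()             # re-raises the worker's exception here
        running.discard("qc_node")  # never reached, so the slot is never freed
    return callback


def defensive_callback(running):
    def callback(future):
        error = future.exception()  # returns the exception instead of raising
        if error is not None:
            print(f"would write a crashfile for: {error!r}")
        running.discard("qc_node")  # the slot is always freed
    return callback


stuck = run_once(naive_callback)
cleared = run_once(defensive_callback)
print("naive:", stuck, "defensive:", cleared)
```

With the naive callback the exception is logged and swallowed by `concurrent.futures`, so `qc_node` stays in `running`. That would be consistent with the "Running 8 tasks ... Free processors: 0/8" message above, though I have not confirmed this is exactly what the MultiProc plugin does.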

To Reproduce

Steps to reproduce the behavior:

  1. Run sub-0027231 from CORR IBA_TRT with the attached pipeline config, using a Docker container.
  2. Wait for the run to reach the QC nodes, where the exception above occurs.

Expected behavior

The exception should result in a crashfile being generated, and the failed node should be cleared from the scheduler.
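
One way to get that behavior, sketched under assumptions: in the traceback, `run_node` fails a second time while handling the first error, because it reads `node.result` and the result file does not exist. A worker wrapper that guards that fallback lookup would always return something the parent can act on. The names `run_node_safely`, `node_run`, and `load_result` are hypothetical and only illustrate the idea; this is not the plugin's actual code.

```python
import traceback


def run_node_safely(node_run, load_result):
    """Run a node and always return a report dict instead of raising."""
    report = {"result": None, "traceback": None}
    try:
        report["result"] = node_run()
    except Exception:
        # Keep the original error text so the parent can write a crashfile.
        report["traceback"] = traceback.format_exc()
        try:
            report["result"] = load_result()
        except FileNotFoundError:
            # The node failed before writing a result file; leave result as None.
            pass
    return report


def failing_run():
    # Stand-in for the input validation error seen above.
    raise TypeError("expected str, bytes or os.PathLike object, not list")


def missing_result():
    # Stand-in for loading a result file that was never created.
    raise FileNotFoundError("result_qc_node.pklz")


report = run_node_safely(failing_run, missing_result)
print(report["traceback"])
```

Because the wrapper returns normally, the parent's callback receives the original traceback rather than a secondary `FileNotFoundError`, and it can both write the crashfile and free the slot.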

Versions

Additional context

config_files.zip

ccraddock commented 3 years ago

Not only does this error not generate a crashfile, it also doesn't show up in the pypeline.log file, so you wouldn't see it unless you were watching C-PAC's output in the terminal. I have time this week to discuss with y'all if that will help.
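
On the logging side, the "exception calling callback" message is emitted through the standard library logger named `concurrent.futures`, so it only reaches a file if a handler is attached to that logger. Here is a rough sketch of doing that. The log path is a temporary file chosen for illustration, and `attach_file_log`, `boom`, and `bad_callback` are hypothetical helpers; how this would be wired into C-PAC's own logging setup is an open question.

```python
import concurrent.futures as cf
import logging
import os
import tempfile


def attach_file_log(path):
    """Send concurrent.futures' internal log messages to a file."""
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )
    logging.getLogger("concurrent.futures").addHandler(handler)
    return handler


def boom():
    # Stand-in for a node whose result file is missing.
    raise FileNotFoundError("result_qc_node.pklz")


def bad_callback(future):
    future.result()  # raises inside the callback, which the library logs


log_path = os.path.join(tempfile.mkdtemp(), "futures.log")
handler = attach_file_log(log_path)

with cf.ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(boom).add_done_callback(bad_callback)

# Detach and close the handler before reading the file back.
logging.getLogger("concurrent.futures").removeHandler(handler)
handler.close()

with open(log_path) as fh:
    log_text = fh.read()
print(log_text)
```

This only makes the swallowed exception visible in a file; it does not by itself free the stuck node or create a crashfile.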