FCP-INDI / C-PAC

Configurable Pipeline for the Analysis of Connectomes
https://fcp-indi.github.io/
GNU Lesser General Public License v3.0

πŸ› [fmriprep-ingress] pipeline failed to start #2144

Open suxpert opened 2 months ago

suxpert commented 2 months ago

Describe the bug

C-PAC seems too complex (for me) to configure correctly, and after going through the not-so-well-written documentation, I still have no idea how to run C-PAC on data derived from fMRIPrep. Since the Docker image prints almost all of its pipeline/config options as optional, I tried calling it as a BIDS app with only the necessary arguments (see command below). C-PAC then crashed, saying "C-PAC failed to start, see log for details."

I've checked #2085, which shows the same error I encountered, but even after reading it I still don't know how to make C-PAC run correctly.

To reproduce

  1. docker pull fcpindi/c-pac
  2. docker pull nipreps/fmriprep
  3. docker run .... nipreps/fmriprep --fs-no-reconall $BIDS $BIDS/derivatives/fprep participant
  4. docker run .... fcpindi/c-pac --preconfig fmriprep-ingress $BIDS/derivatives/fprep $BIDS/derivatives/cpac test_config

Preconfig

Custom pipeline configuration

No response

Run command

docker run -it --rm -u $(id -u) -v $BIDS/derivatives/fmriprep:/fprep:ro -v $BIDS/derivatives/cpac:/out fcpindi/c-pac --preconfig fmriprep-ingress /fprep /out test_config

Expected behavior

This command specifies nothing but the preconfigured fmriprep-ingress pipeline. Since fMRIPrep's output is clear enough for any post-processing tool to understand, I would expect any BIDS app that claims to be compatible with fMRIPrep to run without issue.

Acceptance criteria

Screenshots

logs and generated config files:

C-PAC version

v1.8.7

Container platform

Docker

Docker and/or Singularity version(s)

Docker version 26.1.5-ce

Additional context

No response

tamsinrogers commented 2 months ago

Hi @suxpert, thanks for reaching out.

In terms of the command you are running, it will depend on what’s in your $BIDS/derivatives/fmriprep. The issue here is likely that C-PAC is expecting something from fmriprep that isn’t included in your fmriprep outputs.

Specifically, the timeseries for the ingress pipeline is not being pulled in.

Resources:
['pipeline-ingress_desc-confounds_timeseries']

As a first step, here is some documentation specific to pulling regressors from the fMRIPrep output directory. The derivatives C-PAC expects when running with this config are listed here.

In your case, it looks like the fmriprep outputs either don’t have regressors or the regressors aren’t named how C-PAC expects them to be. Depending on what you have and what you want, you should be able to run with a custom pipeline config like one of these:

(Create regressors instead of ingressing them, don’t write them out):

FROM: fmriprep-ingress
nuisance_corrections:
  2-nuisance_regression:
    ingress_regressors:
      run: Off

(Create regressors instead of ingressing them, do write them out):

FROM: fmriprep-ingress
nuisance_corrections:
  2-nuisance_regression:
    ingress_regressors:
      run: Off
    create_regressors: On

(Skip nuisance regression entirely):

FROM: fmriprep-ingress
nuisance_corrections:
  2-nuisance_regression:
    run: [Off]

(Use different regressors from fMRIPrep than global_signal, white_matter; replace global_signal, white_matter with the column names in your fMRIPrep output *_desc-confounds_timeseries.tsv file):

FROM: fmriprep-ingress
nuisance_corrections:
  2-nuisance_regression:
    ingress_regressors:
      Regressors:
        Columns: [global_signal, white_matter]
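Before choosing Columns, it may help to confirm that each name actually appears in the header of the confounds file. A minimal Python sketch, assuming a tab-separated *_desc-confounds_timeseries.tsv whose first row holds the column names (as in fMRIPrep's output); `missing_columns` is a hypothetical helper, not part of C-PAC:

```python
import csv


def missing_columns(tsv_path, wanted):
    """Return the names in `wanted` that are absent from the TSV header row.

    Hypothetical helper: assumes the first row of the tab-separated
    confounds file lists the column names.
    """
    with open(tsv_path, newline="") as handle:
        header = next(csv.reader(handle, delimiter="\t"))
    # Keep the caller's order so the report matches the Columns list.
    return [name for name in wanted if name not in header]
```

For example, `missing_columns("sub-01_task-rest_desc-confounds_timeseries.tsv", ["global_signal", "white_matter"])` returns an empty list when both columns are present.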

Your C-PAC-generated data config also looks to be causing an issue: you will need to provide a data config yourself (until this in-progress feature is completed). Based on the issue you linked to, here is the documentation on how to format the data config for fMRIPrep ingress. For one subject, the data config should look similar to this:

- site: site-1
  subject_id: 01
  unique_id: 02
  derivatives_dir: /bids_dir/derivatives/fmriprep/sub-01/ses-01
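For more than one subject, the entries can be generated rather than typed by hand. A minimal sketch, assuming an fMRIPrep derivatives tree laid out as sub-<label>/ses-<label>/, with subject_id and unique_id taken from those folder names and derivatives_dir pointing at the session folder as in the example above. `build_data_config` and its `mount_point` parameter are hypothetical, not part of C-PAC, and subjects without ses-* folders are skipped. IDs are written as double-quoted strings so that a YAML loader cannot turn a value such as 01 into an integer:

```python
import json
from pathlib import Path


def build_data_config(derivatives_dir, site="site-1", mount_point=None):
    """Return YAML text with one data-config entry per subject/session.

    Hypothetical helper. `mount_point` replaces `derivatives_dir` in the
    written paths, e.g. when the folder is mounted at /fprep in a container.
    """
    root = Path(derivatives_dir)
    prefix = Path(mount_point) if mount_point else root
    lines = []
    # is_dir() filters out files such as sub-0151.html.
    for sub_dir in sorted(p for p in root.glob("sub-*") if p.is_dir()):
        for ses_dir in sorted(p for p in sub_dir.glob("ses-*") if p.is_dir()):
            entry = {
                "site": site,
                "subject_id": sub_dir.name,
                "unique_id": ses_dir.name,
                "derivatives_dir": str(prefix / sub_dir.name / ses_dir.name),
            }
            for i, (key, value) in enumerate(entry.items()):
                # json.dumps emits double-quoted strings, which YAML reads as str.
                bullet = "- " if i == 0 else "  "
                lines.append(f"{bullet}{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"
```

Usage might look like `print(build_data_config("bids/derivatives/fprep-24.0.1", mount_point="/fprep"))`; datasets without session folders would need a separate choice of unique_id, which this sketch does not make.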

I'd also recommend looking at a recently developed C-PAC feature that lets you ingress fMRIPrep output data directly into the C-PAC resource pool, so that you can perform further processing on that data. This PR gives some additional explanation of the ingress process and fMRIPrep regressors, and a similar issue that may be a useful reference point was resolved here.

Could you clarify exactly what kind of additional post-processing you want C-PAC to do in this instance? Any additional information you can provide will help us point you in the right direction as we continue to compile answers to your questions in Acceptance Criteria.

suxpert commented 2 months ago

It's strange, because the regressors C-PAC needs DO exist in fMRIPrep's output:

$ head -1 bids/derivatives/fmriprep/sub-0151/func/sub-0151_task-rest_run-01_desc-confounds_timeseries.tsv | tr '\t' '\n' | grep -n '\<global_signal\>\|\<white_matter\>'
1:global_signal
9:white_matter

Is C-PAC searching for some file other than fMRIPrep's default *_desc-confounds_timeseries.tsv?

suxpert commented 2 months ago

Since global_signal and white_matter DO exist in fMRIPrep's output, and since both #2085 and your answer point out that I need a data config file, here is what I tried and what I got:

I tried my best to keep data_config.yml the same as in the documentation, and wrote this:

- site: site-1
  subject_id: 01
  unique_id: 02
  derivatives_dir: /fprep/sub-0151

and ran C-PAC via:

docker run -it --rm -u $(id -u)                     \
    -v $BIDS:/bids:ro                               \
    -v $BIDS/derivatives/fprep-24.0.1:/fprep:ro     \
    -v $PWD/data_config.yml:/data_config.yml:ro     \
    -v $BIDS/derivatives/cpca-1.8.7:/out            \
    fcpindi/c-pac:release-v1.8.7                    \
    --data-config-file /data_config.yml             \
    --preconfig fmriprep-ingress --tracking-opt-out \
    /fprep /out test_config

Then I got the following error:

#### Running C-PAC
Number of participants to run in parallel: 1
Output directory: /out/output
Working directory: /out/working
Log directory: /out/log
Remove working directory: True
Available memory: 1.0 (GB)
Available threads: 1
Number of threads for ANTs: 1
Traceback (most recent call last):
  File "/code/run.py", line 827, in <module>
    run_main()
  File "/code/run.py", line 726, in run_main
    data_hash = hash_data_config(sub_list)
  File "/code/CPAC/utils/configuration/yaml_template.py", line 364, in hash_data_config
    return sha1('_'.join([','.join([run.get(key, '') for run in sub_list]) for
  File "/code/CPAC/utils/configuration/yaml_template.py", line 364, in <listcomp>
    return sha1('_'.join([','.join([run.get(key, '') for run in sub_list]) for
TypeError: sequence item 0: expected str instance, int found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/run.py", line 831, in <module>
    failed_to_start(sys.argv[2] if len(sys.argv) > 2 else os.getcwd(),
  File "/code/CPAC/utils/monitoring/custom_logging.py", line 41, in failed_to_start
    logger.exception('C-PAC failed to start')
  File "/code/CPAC/utils/monitoring/custom_logging.py", line 152, in exception
    return self.error(msg, *args, exc_info=exc_info, **kwargs)
  File "/code/CPAC/utils/monitoring/custom_logging.py", line 164, in _log
    with open(self.handlers[0].baseFilename, 'a',
NotADirectoryError: [Errno 20] Not a directory: '/data_config.yml/failedToStart.log'

The first error seems to be related to subject_id and unique_id being numbers, but I don't know why C-PAC composed the path /data_config.yml/failedToStart.log.

In response to the first exception, I changed those IDs so that they are strings instead of ints:

- site: site-1
  subject_id: s1
  unique_id: s2
  derivatives_dir: /fprep/sub-0151

Then I got exactly the same error as in the attached logs:

Traceback (most recent call last):
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1514, in build_workflow
    wf = connect_pipeline(wf, cfg, rpool, pipeline_blocks)
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1127, in connect_pipeline
    wf = nb.connect_block(wf, cfg, rpool)
  File "/code/CPAC/pipeline/engine.py", line 1499, in connect_block
    for pipe_idx, strat_pool in rpool.get_strats(
  File "/code/CPAC/pipeline/engine.py", line 558, in get_strats
    raise LookupError('\n\n[!] C-PAC says: None of the listed '
LookupError: When trying to connect node block 'ingress_regressors' to workflow 'cpac_s1_s2' after node block 'nuisance_regressors_generation_T1w':

[!] C-PAC says: None of the listed resources in the node block being connected exist in the resource pool.

Resources:
['pipeline-ingress_desc-confounds_timeseries']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 457, in run_workflow
    workflow = build_workflow(
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1530, in build_workflow
    missing_key = errorstrings[errorstrings.index(errorstring) + 1]
ValueError: '[!] C-PAC says: None of the listed resources in the node block being connected exist in the resource pool.\n\nResources:' is not in list

At least in my current tests, the custom data config file does not solve this issue.

tamsinrogers commented 2 months ago

Hi @suxpert thank you for the update and for providing some more information!

Regarding the issue with your data_config.yml, the error is caused by subject_id and unique_id being read as numbers. C-PAC expects strings, not ints, here, so you would need something like:

- site: site-1
  subject_id: "0151"
  unique_id: $SESSION_NAME
  derivatives_dir: /fprep/sub-0151

or

- site: site-1
  subject_id: sub-0151
  unique_id: ses-$SESSION_NAME
  derivatives_dir: /fprep/sub-0151

to make sure those values are being read in as strings. The subject and session IDs in the data config also need to match the ones in the data.
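The TypeError in the earlier traceback can be reproduced without C-PAC. The sketch below assumes a YAML 1.1 loader such as PyYAML has already read the unquoted 01 and 02 as the integers 1 and 2; `join_ids` is a hypothetical stand-in that only mimics the `','.join(...)` call shown in the traceback, not C-PAC's actual hash_data_config:

```python
def join_ids(sub_list, key):
    """Join one field across data-config entries, like the traceback's ','.join(...)."""
    return ",".join([run.get(key, "") for run in sub_list])


# What the loader returns for unquoted `subject_id: 01` / `unique_id: 02`.
as_ints = [{"site": "site-1", "subject_id": 1, "unique_id": 2}]
# What it returns for quoted `subject_id: "01"` / `unique_id: "02"`.
as_strs = [{"site": "site-1", "subject_id": "01", "unique_id": "02"}]

try:
    join_ids(as_ints, "subject_id")
except TypeError as err:
    print(err)  # sequence item 0: expected str instance, int found

print(join_ids(as_strs, "subject_id"))  # 01
```

Quoting the values, or using IDs that cannot be parsed as numbers (such as sub-0151), keeps them as strings.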

Regarding your question about the /data_config.yml/failedToStart.log path: C-PAC expects the obligatory BIDS App positional arguments before the optional arguments, so when it fails to start, it simply takes the second command-line argument (sys.argv[2]) as the output directory. With the optional arguments first, that argument is /data_config.yml. Moving the positional arguments to the front, like

docker run -it --rm -u $(id -u)                     \
    -v $BIDS:/bids:ro                               \
    -v $BIDS/derivatives/fprep-24.0.1:/fprep:ro     \
    -v $PWD/data_config.yml:/data_config.yml:ro     \
    -v $BIDS/derivatives/cpca-1.8.7:/out            \
    fcpindi/c-pac:release-v1.8.7                    \
    /fprep /out test_config                         \
    --data-config-file /data_config.yml             \
    --preconfig fmriprep-ingress --tracking-opt-out

instead of

docker run -it --rm -u $(id -u)                     \
    -v $BIDS:/bids:ro                               \
    -v $BIDS/derivatives/fprep-24.0.1:/fprep:ro     \
    -v $PWD/data_config.yml:/data_config.yml:ro     \
    -v $BIDS/derivatives/cpca-1.8.7:/out            \
    fcpindi/c-pac:release-v1.8.7                    \
    --data-config-file /data_config.yml             \
    --preconfig fmriprep-ingress --tracking-opt-out \
    /fprep /out test_config

should fix that issue.
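The fallback is visible in the earlier traceback, which calls `failed_to_start(sys.argv[2] if len(sys.argv) > 2 else os.getcwd(), ...)`. A small sketch of just that expression, assuming the container passes its arguments straight through to /code/run.py; `failure_log_dir` is a hypothetical name:

```python
import os


def failure_log_dir(argv):
    """Mirror the expression from the traceback: argv[2] if present, else the cwd."""
    return argv[2] if len(argv) > 2 else os.getcwd()


# Optional arguments first: argv[2] is the data config path.
options_first = ["/code/run.py", "--data-config-file", "/data_config.yml",
                 "--preconfig", "fmriprep-ingress", "--tracking-opt-out",
                 "/fprep", "/out", "test_config"]
# Positional arguments first: argv[2] is the output directory.
positionals_first = ["/code/run.py", "/fprep", "/out", "test_config",
                     "--data-config-file", "/data_config.yml",
                     "--preconfig", "fmriprep-ingress", "--tracking-opt-out"]

print(failure_log_dir(options_first))      # /data_config.yml
print(failure_log_dir(positionals_first))  # /out
```

Appending failedToStart.log to the first result gives exactly the /data_config.yml/failedToStart.log path from the NotADirectoryError.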

suxpert commented 2 months ago

I tried a new data_config.yml like this:

- site: site-1
  subject_id: sub-0151
  unique_id: ses-1
  derivatives_dir: /fprep/sub-0151

With the arguments reordered, I still got the same error:

Traceback (most recent call last):
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1514, in build_workflow
    wf = connect_pipeline(wf, cfg, rpool, pipeline_blocks)
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1127, in connect_pipeline
    wf = nb.connect_block(wf, cfg, rpool)
  File "/code/CPAC/pipeline/engine.py", line 1499, in connect_block
    for pipe_idx, strat_pool in rpool.get_strats(
  File "/code/CPAC/pipeline/engine.py", line 558, in get_strats
    raise LookupError('\n\n[!] C-PAC says: None of the listed '
LookupError: When trying to connect node block 'ingress_regressors' to workflow 'cpac_sub-0151_ses-1' after node block 'nuisance_regressors_generation_T1w':

[!] C-PAC says: None of the listed resources in the node block being connected exist in the resource pool.

Resources:
['pipeline-ingress_desc-confounds_timeseries']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 457, in run_workflow
    workflow = build_workflow(
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1530, in build_workflow
    missing_key = errorstrings[errorstrings.index(errorstring) + 1]
ValueError: '[!] C-PAC says: None of the listed resources in the node block being connected exist in the resource pool.\n\nResources:' is not in list

You've mentioned that

The subject and session IDs in the data config also need to match the ones in the data.

So am I understanding correctly that subject_id should exactly match the participant folder, while unique_id corresponds to the session folder in BIDS? And what does site-X stand for?

Since the session folder is optional in BIDS, my dataset has no session folders and looks like this:

$ tree -L 2 bids
bids
├── ChangeLog
├── dataset_description.json
├── derivatives
│   ├── cpac-1.8.7
│   ├── fprep-24.0.1
│   └── xcpd-0.8.2
├── participants.json
├── participants.tsv
├── README.md
├── sub-0151
│   ├── anat
│   ├── fmap
│   └── func
├── sub-0156
│   ├── anat
│   ├── fmap
│   └── func
├...

So I tried setting unique_id to empty:

- site: site-1
  subject_id: sub-0151
  unique_id: ""
  derivatives_dir: /fprep/sub-0151

But the issue persists!

Traceback (most recent call last):
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1514, in build_workflow
    wf = connect_pipeline(wf, cfg, rpool, pipeline_blocks)
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1127, in connect_pipeline
    wf = nb.connect_block(wf, cfg, rpool)
  File "/code/CPAC/pipeline/engine.py", line 1499, in connect_block
    for pipe_idx, strat_pool in rpool.get_strats(
  File "/code/CPAC/pipeline/engine.py", line 558, in get_strats
    raise LookupError('\n\n[!] C-PAC says: None of the listed '
LookupError: When trying to connect node block 'ingress_regressors' to workflow 'cpac_sub-0151_' after node block 'nuisance_regressors_generation_T1w':

[!] C-PAC says: None of the listed resources in the node block being connected exist in the resource pool.

Resources:
['pipeline-ingress_desc-confounds_timeseries']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 457, in run_workflow
    workflow = build_workflow(
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1530, in build_workflow
    missing_key = errorstrings[errorstrings.index(errorstring) + 1]
ValueError: '[!] C-PAC says: None of the listed resources in the node block being connected exist in the resource pool.\n\nResources:' is not in list

Note that C-PAC composed the workflow name cpac_sub-0151_ with a trailing underscore. Is this why C-PAC failed to find the *desc-confounds files?
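Judging only from the workflow names in the tracebacks ('cpac_s1_s2', 'cpac_sub-0151_ses-1', 'cpac_sub-0151_'), the name appears to be built as cpac_<subject_id>_<unique_id>. An empty unique_id therefore leaves a trailing underscore by construction. The same LookupError also appeared earlier with unique_id set to ses-1, so the underscore alone does not explain the failure. A sketch of the inferred pattern; `workflow_name` is hypothetical, not C-PAC's code:

```python
def workflow_name(subject_id, unique_id):
    """Inferred naming pattern: cpac_<subject_id>_<unique_id>."""
    return f"cpac_{subject_id}_{unique_id}"


for sub, ses in [("s1", "s2"), ("sub-0151", "ses-1"), ("sub-0151", "")]:
    print(workflow_name(sub, ses))
# cpac_s1_s2
# cpac_sub-0151_ses-1
# cpac_sub-0151_
```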

suxpert commented 2 months ago

I tried once more from scratch, this time with a session folder in the BIDS dataset:

$ tree -L 3 bids
bids
├── ChangeLog
├── dataset_description.json
├── derivatives
│   ├── cpac-1.8.7
│   │   ├── log
│   │   └── working
│   └── fprep-24.0.1
│       ├── dataset_description.json
│       ├── logs
│       ├── sub-0151
│       └── sub-0151.html
├── participants.json
├── participants.tsv
├── README.md
└── sub-0151
    └── ses-01
        ├── anat
        ├── fmap
        └── func

13 directories, 7 files

fMRIPrep runs as expected and produces these derivatives:

$ tree -L 3 bids/derivatives/fprep-24.0.1
bids/derivatives/fprep-24.0.1
├── dataset_description.json
├── logs
│   ├── CITATION.bib
│   ├── CITATION.html
│   ├── CITATION.md
│   └── CITATION.tex
├── sub-0151
│   ├── figures
│   │   ├── sub-0151_ses-01_acq-highres_desc-conform_T1w.html
│   │   ├── sub-0151_ses-01_acq-highres_dseg.svg
│   │   ├── sub-0151_ses-01_acq-highres_space-MNI152NLin2009cAsym_T1w.svg
│   │   ├── sub-0151_ses-01_desc-about_T1w.html
│   │   ├── sub-0151_ses-01_desc-summary_T1w.html
│   │   ├── sub-0151_ses-01_task-rest_run-01_desc-carpetplot_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-01_desc-compcorvar_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-01_desc-confoundcorr_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-01_desc-coreg_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-01_desc-rois_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-01_desc-summary_bold.html
│   │   ├── sub-0151_ses-01_task-rest_run-01_desc-validation_bold.html
│   │   ├── sub-0151_ses-01_task-rest_run-02_desc-carpetplot_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-02_desc-compcorvar_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-02_desc-confoundcorr_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-02_desc-coreg_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-02_desc-rois_bold.svg
│   │   ├── sub-0151_ses-01_task-rest_run-02_desc-summary_bold.html
│   │   └── sub-0151_ses-01_task-rest_run-02_desc-validation_bold.html
│   ├── log
│   │   └── 20240904-082316_3fa9555a-0668-412d-916b-a2742c26fd9e
│   └── ses-01
│       ├── anat
│       └── func
└── sub-0151.html

9 directories, 25 files

The required columns DO exist:

$ head -1 bids/derivatives/fprep-24.0.1/**/*_desc-confounds_timeseries.tsv | tr '\t' '\n' | grep -n '\<global_signal\>\|\<white_matter\>'
2:global_signal
10:white_matter
318:global_signal
326:white_matter

Then, with data_config.yml set to:

- site: site-1
  subject_id: sub-0151
  unique_id: ses-01
  derivatives_dir: /fprep/sub-0151/ses-01

or

- site: site-1
  subject_id: sub-0151
  unique_id: ses-01
  derivatives_dir: /fprep/sub-0151

and with the command

docker run -it --rm -u $(id -u)                   \
    -v $BIDS/derivatives/fprep-24.0.1:/fprep:ro   \
    -v $BIDS/data_config.yml:/data_config.yml:ro  \
    -v $BIDS/derivatives/cpac-1.8.7:/out          \
    fcpindi/c-pac:release-v1.8.7                  \
    /fprep /out test_config                       \
    --data-config-file /data_config.yml           \
    --preconfig fmriprep-ingress --tracking-opt-out

I always get the same error as previously reported:

Traceback (most recent call last):
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1514, in build_workflow
    wf = connect_pipeline(wf, cfg, rpool, pipeline_blocks)
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1127, in connect_pipeline
    wf = nb.connect_block(wf, cfg, rpool)
  File "/code/CPAC/pipeline/engine.py", line 1499, in connect_block
    for pipe_idx, strat_pool in rpool.get_strats(
  File "/code/CPAC/pipeline/engine.py", line 558, in get_strats
    raise LookupError('\n\n[!] C-PAC says: None of the listed '
LookupError: When trying to connect node block 'ingress_regressors' to workflow 'cpac_sub-0151_ses-01' after node block 'nuisance_regressors_generation_T1w':

[!] C-PAC says: None of the listed resources in the node block being connected exist in the resource pool.

Resources:
['pipeline-ingress_desc-confounds_timeseries']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 457, in run_workflow
    workflow = build_workflow(
  File "/code/CPAC/pipeline/cpac_pipeline.py", line 1530, in build_workflow
    missing_key = errorstrings[errorstrings.index(errorstring) + 1]
ValueError: '[!] C-PAC says: None of the listed resources in the node block being connected exist in the resource pool.\n\nResources:' is not in list

tamsinrogers commented 2 months ago

Hi @suxpert,

Thank you for your response. We are replicating this issue on our end and will get back to you.

subject_id corresponds to the participant folder. As for site-X: if you run C-PAC without a data config, you can pass a directory containing several BIDS dataset directories as the bids_dir positional argument, and C-PAC will interpret each top-level dataset directory name as a site name. This matters if you run C-PAC that way and also perform slice-timing correction with slice-timing parameters specified in a CSV. The site name is also used when it is included in the output filenames.
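The multi-dataset layout described above can be sketched as follows: a parent directory whose top-level subdirectories are separate BIDS datasets, each folder name serving as a site name. This is a hypothetical illustration of the described behavior, not C-PAC's code, and it assumes a dataset is recognized by its dataset_description.json file:

```python
from pathlib import Path


def sites_from_bids_parent(parent_dir):
    """Map each top-level BIDS dataset directory name (the site) to its path.

    Hypothetical helper: treats any subdirectory containing
    dataset_description.json as one dataset.
    """
    parent = Path(parent_dir)
    return {
        child.name: child
        for child in sorted(parent.iterdir())
        if child.is_dir() and (child / "dataset_description.json").is_file()
    }
```

For a parent folder containing site-A/ and site-B/, each a BIDS dataset, this yields the site names site-A and site-B.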