bids-apps / CPAC

BIDS Application for the Configurable Pipeline for the Analysis of Connectomes (C-PAC)
Apache License 2.0
14 stars 18 forks source link
bids bidsapp mri preprocessing

C-PAC BIDS Application

Documentation

Extensive information can be found in the C-PAC User Guide.

Description

The Configurable Pipeline for the Analysis of Connectomes C-PAC is a software for performing high-throughput preprocessing and analysis of functional connectomes data using high-performance computers. C-PAC is implemented in Python using the Nipype pipelining [1] library to efficiently combine tools from AFNI [2], ANTS [3], and FSL [4] to achieve high quality and robust automated processing. This docker container, when built, is an application for performing participant level analyses. Future releases will include group-level analyses, when there is a BIDS standard for handling derivatives and group models.

A note about versioning

C-PAC BIDS Apps version tags are composed of the C-PAC version followed by an underscore and then the version of the container. The container version restarts for every new C-PAC version and is a single integer that reflects the modification number of the build. For example v1.0.1a_5 corresponds to the 5th build of the container for C-PAC version v1.0.1a.

Usage notes

  1. You can either perform a custom processing using a YAML configuration file or use the default processing pipeline. A GUI can be invoked to assist in pipeline custimization by specifying GUI command line arguement (this currently only works for Singularity containers).

  2. The default behavior is to read in data that is organized in the BIDS format. This includes data that is in Amazon AWS S3 by using the format s3://<bucket_name>/<bids_dir> for the bids_dir command line argument. Outputs can be written to S3 using the same format for the output_dir. Credentials for accessing these buckets can be specified on the command line (using --aws_input_creds or --aws_output_creds).

  3. Non-BIDS organized data can processed using a C-PAC data configuration yaml file. This file can be generated using the C-PAC GUI (start the app with the GUI argument, also see instructions below) or can be created using other means, please refer to CPAC documentation for more information.

  4. When the app is run, a data configuration file is written to the working directory. This file can be passed into subsequent runs, which avoids the overhead of re-parsing the BIDS input directory on each run (i.e. for cluster or cloud runs). These files can be generated without executing the C-PAC pipeline using the test_run command line argument.

  5. The participant_label and participant_ndx arguments allow the user to specify which of the many datasets should be processed, this are useful when parallelizing the run of multiple participants.

Default configuration

The default processing pipeline performs fMRI processing using four strategies, with and without global signal regression, with and without bandpass filtering.

Anatomical processing begins with conforming the data to RPI orientation and removing orientation header information that will interfere with further processing. A non-linear transform between skull-on images and a 2mm MNI brain-only template are calculated using ANTs [3]. Images are them skull-stripped using AFNI's 3dSkullStrip [5] and subsequently segmented into WM, GM, and CSF using FSL’s fast tool [6]. The resulting WM mask was multiplied by a WM prior map that was transformed into individual space using the inverse of the linear transforms previously calculated during the ANTs procedure. A CSF mask was multiplied by a ventricle map derived from the Harvard-Oxford atlas distributed with FSL [4]. Skull-stripped images and grey matter tissue maps are written into MNI space at 2mm resolution.

Functional preprocessing begins with resampling the data to RPI orientation, and slice timing correction. Next, motion correction is performed using a two-stage approach in which the images are first coregistered to the mean fMRI and then a new mean is calculated and used as the target for a second coregistration (AFNI 3dvolreg [2]). A 7 degree of freedom linear transform between the mean fMRI and the structural image is calculated using FSL’s implementation of boundary-based registration [7]. Nuisance variable regression (NVR) is performed on motion corrected data using a 2nd order polynomial, a 24-regressor model of motion [8], 5 nuisance signals, identified via principal components analysis of signals obtained from white matter (CompCor, [9]), and mean CSF signal. WM and CSF signals were extracted using the previously described masks after transforming the fMRI data to match them in 2mm space using the inverse of the linear fMRI-sMRI transform. The NVR procedure is performed twice, with and without the inclusion of the global signal as a nuisance regressor. The residuals of the NVR procedure are processed with and without bandpass filtering (0.001Hz < f < 0.1Hz), written into MNI space at 3mm resolution and subsequently smoothed using a 6mm FWHM kernel.

Several different individual level analysis are performed on the fMRI data including:

Usage

This App has the following command line arguments:

usage: run.py [-h] [--pipeline_file PIPELINE_FILE]
              [--data_config_file DATA_CONFIG_FILE]
              [--aws_input_creds AWS_INPUT_CREDS]
              [--aws_output_creds AWS_OUTPUT_CREDS] [--n_cpus N_CPUS]
              [--mem_mb MEM_MB] [--mem_gb MEM_GB] [--save_working_dir]
              [--participant_label PARTICIPANT_LABEL [PARTICIPANT_LABEL ...]]
              [--participant_ndx PARTICIPANT_NDX]
              bids_dir output_dir {participant,group,test_config,GUI}

C-PAC Pipeline Runner

positional arguments:
  bids_dir              The directory with the input dataset formatted
                        according to the BIDS standard. Use the format
                        s3://bucket/path/to/bidsdir to read data directly from
                        an S3 bucket. This may require AWS S3 credentials
                        specificied via the --aws_input_creds option.
  output_dir            The directory where the output files should be stored.
                        If you are running group level analysis this folder
                        should be prepopulated with the results of the
                        participant level analysis. Us the format
                        s3://bucket/path/to/bidsdir to write data directly to
                        an S3 bucket. This may require AWS S3 credentials
                        specificied via the --aws_output_creds option.
  {participant,group,test_config,GUI}
                        Level of the analysis that will be performed. Multiple
                        participant level analyses can be run independently
                        (in parallel) using the same output_dir. GUI will open
                        the CPAC gui (currently only works with singularity)
                        and test_config will run through the entire
                        configuration process but will not execute the
                        pipeline.

optional arguments:
  -h, --help            show this help message and exit
  --pipeline_file PIPELINE_FILE
                        Name for the pipeline configuration file to use
  --data_config_file DATA_CONFIG_FILE
                        Yaml file containing the location of the data that is
                        to be processed. Can be generated from the CPAC gui.
                        This file is not necessary if the data in bids_dir is
                        organized according to the BIDS format. This enables
                        support for legacy data organization and cloud based
                        storage. A bids_dir must still be specified when using
                        this option, but its value will be ignored.
  --aws_input_creds AWS_INPUT_CREDS
                        Credentials for reading from S3. If not provided and
                        s3 paths are specified in the data config we will try
                        to access the bucket anonymously
  --aws_output_creds AWS_OUTPUT_CREDS
                        Credentials for writing to S3. If not provided and s3
                        paths are specified in the output directory we will
                        try to access the bucket anonymously
  --n_cpus N_CPUS       Number of execution resources available for the
                        pipeline
  --mem_mb MEM_MB       Amount of RAM available to the pipeline in megabytes.
                        Included for compatibility with BIDS-Apps standard,
                        but mem_gb is preferred
  --mem_gb MEM_GB       Amount of RAM available to the pipeline in gigabytes.
                        if this is specified along with mem_mb, this flag will
                        take precedence.
  --save_working_dir    Save the contents of the working directory.
  --participant_label PARTICIPANT_LABEL [PARTICIPANT_LABEL ...]
                        The label of the participant that should be analyzed.
                        The label corresponds to sub-<participant_label> from
                        the BIDS spec (so it does not include "sub-"). If this
                        parameter is not provided all subjects should be
                        analyzed. Multiple participants can be specified with
                        a space separated list. To work correctly this should
                        come at the end of the command line
  --participant_ndx PARTICIPANT_NDX
                        The index of the participant that should be analyzed.
                        This corresponds to the index of the participant in
                        the subject list file. This was added to make it
                        easier to accomodate SGE array jobs. Only a single
                        participant will be analyzed. Can be used with
                        participant label, in which case it is the index into
                        the list that follows the particpant_label flag.

To run it in participant level mode (for one participant):

docker run -i --rm \
    -v /tmp:/scratch \
    -v /Users/filo/data/ds005:/bids_dataset \
    -v /Users/filo/outputs:/outputs \
    bids/cpac \
    /bids_dataset /outputs participant --participant_label 01

Running the GUI (requires mapping your X socket)

Running docker container on Linux

  1. Start the docker container, mapping the X socket (change /Users/filo to a local directory on your computer)
    docker run -i --rm \
        --privileged \
        -e DISPLAY=$DISPLAY \
        -v /tmp/.X11-unix:/tmp/.X11-unix \
        -v /tmp:/scratch \
        -v /Users/filo/data/ds005:/bids_dataset \
        -v /Users/filo/outputs:/outputs \
        bids/cpac \
        /bids_dataset /outputs GUI

Running docker container on Mac OSX

  1. Install XQuartz

  2. Start XQuartz (from terminal)

    open -a XQuartz
  1. Enable XQuartz connections from network clients

XQuartz -> preferences -> security -> "Allow connections from network clients"

  1. Get your ip address (e.g., might have to change eth0 to match the name of your network interface.)
    ip=$(ifconfig en0 | grep inet | awk '$1=="inet" {print $2}')
  1. Tell xhost to accept connections from the localhost
xhost + ${ip}
  1. Start the docker container, mapping the X socket (change /Users/filo to a local directory on your computer)
    docker run -i --rm \
        --privileged \
        -e DISPLAY=$ip:0 \
        -v /tmp/.X11-unix:/tmp/.X11-unix \
        -v /tmp:/scratch \
        -v /Users/filo/data/ds005:/bids_dataset \
        -v /Users/filo/outputs:/outputs \
        bids/cpac \
        /bids_dataset /outputs GUI

Running singularity container on Linux

  1. Start the docker container (it just works!, provided you change /Users/filo to a local directory on your computer)
    singularity run \
        -B /home/ubuntu:/mnt \
        -B /mnt:/scratch \
        -B /Users/filo/data/ds005:/bids_dataset \
        -B /Users/filo/outputs:/outputs \
        /home/ubuntu/workspace/container_build/singularity_images/cpac_latest.img \
        /bids_dataset \
        /outputs\
        GUI

To convert the Docker container to a Singularity container :

docker run --privileged -ti --rm  \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /home/srycajal/singularity_images:/output \
    filo/docker2singularity \
    bids/cpac

Example submit script for running as a Singularity container on sun grid engine:

#! /bin/bash
## SGE batch file - bgsp
#$ -S /bin/bash
## bgsp is the jobname and can be changed
#$ -N bgsp
## execute the job using the mpi_smp parallel enviroment and 8 cores per job
#$ -pe mpi_smp 8
## create an array of 1112 jobs
#$ -t 1-1112
#$ -V
## change the following working directory to a persistent directory that is
## available on all nodes, this is were messages printed by the app (stdout
## and stderr) will be stored
#$ -wd /home/ubuntu/workspace/cluster_files

sudo chmod 777 /mnt
mkdir -p /mnt/log/reports

sge_ndx=$(( SGE_TASK_ID - 1 ))

# random sleep so that jobs dont start at _exactly_ the same time
sleep $(( $SGE_TASK_ID % 10 ))

singularity run -B /home/ubuntu:/mnt -B /mnt:/scratch \
  /home/ubuntu/workspace/container_build/singularity_images/cpac_latest.img \
  --n_cpus 8 --mem 12 \
  --aws_input_creds /mnt/workspace/cluster_files/s3-keys.csv \
  --aws_output_creds /mnt/workspace/cluster_files/s3-keys.csv \
  --data_config_file /mnt/workspace/cluster_files/bgsp_data_config.yml \
  s3://fcp-indi/data/Projects/BrainGenomicsSuperstructProject/orig_bids/ \
  s3://fcp-indi/data/Projects/BrainGenomicsSuperstructProject/cpac_out/ \
  participant --participant_ndx ${sge_ndx}

Notes:

  1. With the exception of your home directory, which is mounted from the local filesystem, the filesystem in Singularity containers is read-only. Files can be easily transferred in and out of the container by mapping local directories to directories inside the container using the -B from:to command line argument, where the from dir is mapped to to. When using mapped directories, remember that the paths specified on the command line are in relation to the directory inside the container (e.g. the to directory).

  2. Unless the --save_working_dir flag is set, the C-PAC app will use the /scratch directory for intermediary files. Since this directory is write protected, a directory from the local filesystem must be mapped to /scratch for the pipeline to run successfully. This directory should be large enough to hold all of the intermediary files for the datasets that are processed in parallel, as a rule of thumb we suggest 3 GB per dataset. Unless the --save_working_dir flag is set, the working directory will be deleted when the pipeline has completed.

  3. Use the --save_working_dir flag to retain all intermediary files, which can be useful for debugging. In this case, the intermediary files will be saved in the working_dir subdirectory of the user specified output directory. This will require about 3GB per dataset, but may require more for multiple or very long fMRI scans.

Reporting errors and getting help

Please report errors on the C-PAC github page issue tracker. Please use the C-PAC google group for help using C-PAC and this application.

Acknowledgements

We currently have a publication in preparation, in the meantime please cite our poster from INCF:

Craddock C, Sikka S, Cheung B, Khanuja R, Ghosh SS, Yan C, Li Q, Lurie D, Vogelstein J, Burns R, Colcombe S,
Mennes M, Kelly C, Di Martino A, Castellanos FX and Milham M (2013). Towards Automated Analysis of Connectomes:
The Configurable Pipeline for the Analysis of Connectomes (C-PAC). Front. Neuroinform. Conference Abstract:
Neuroinformatics 2013. doi:10.3389/conf.fninf.2013.09.00042

@ARTICLE{cpac2013,
    AUTHOR={Craddock, Cameron  and  Sikka, Sharad  and  Cheung, Brian  and  Khanuja, Ranjeet  and  Ghosh, Satrajit S
        and Yan, Chaogan  and  Li, Qingyang  and  Lurie, Daniel  and  Vogelstein, Joshua  and  Burns, Randal  and
        Colcombe, Stanley  and  Mennes, Maarten  and  Kelly, Clare  and  Di Martino, Adriana  and  Castellanos,
        Francisco Xavier  and  Milham, Michael},
    TITLE={Towards Automated Analysis of Connectomes: The Configurable Pipeline for the Analysis of Connectomes (C-PAC)},
    JOURNAL={Frontiers in Neuroinformatics},
    YEAR={2013},
    NUMBER={42},
    URL={http://www.frontiersin.org/neuroinformatics/10.3389/conf.fninf.2013.09.00042/full},
    DOI={10.3389/conf.fninf.2013.09.00042},
    ISSN={1662-5196}
}

References

1. Gorgolewski, K., Burns, C.D., Madison, C., Clark, D., Halchenko, Y.O., Waskom, M.L., Ghosh, S.S.: Nipype: A flexible, lightweight and extensible neuroimaging data processing framework in python. Front. Neuroinform. 5 (2011). doi:10.3389/fninf.2011.00013

2. Cox, R.W., Jesmanowicz, A.: Real-time 3d image registration for functional mri. Magn Reson Med 42(6), 1014–8 (1999)

3. Avants, B., Epstein, C., Grossman, M., Gee, J.: Symmetric diffeomorphic image registration with cross-correlation: Evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis 12(1), 26–41 (2008). doi:10.1016/j.media.2007.06.004

4. Smith, S.M., Jenkinson, M., Woolrich, M.W., Beckmann, C.F., Behrens, T.E.J., Johansen-Berg, H., Bannister, P.R., Luca, M.D., Drobnjak, I., Flitney, D.E., Niazy, R.K., Saunders, J., Vickers, J., Zhang, Y., Stefano, N.D., Brady, J.M., Matthews, P.M.: Advances in functional and structural mr image analysis and implementation as fsl. NeuroImage 23, 208–219 (2004). doi:10.1016/j.neuroimage.2004.07.051

5. Smith, S.M.: Fast robust automated brain extraction. Human Brain Mapping 17(3), 143–155 (2002). doi:10.1002/hbm.10062

6. Zhang, Y., Brady, M., Smith, S.: Segmentation of brain mr images through a hidden markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging 20(1), 45–57 (2001). doi:10.1109/42.906424

7. Greve, D.N., Fischl, B.: Accurate and robust brain image alignment using boundary-based registration. NeuroImage 48(1), 63–72 (2009). doi:10.1016/j.neuroimage.2009.06.060

8. Friston, K.J., Williams, S., Howard, R., Frackowiak, R.S., Turner, R.: Movement-related effects in fmri time-series. Magn Reson Med 35(3), 346–55 (1996)

9. Behzadi, Y., Restom, K., Liau, J., Liu, T.T.: A component based noise correction method (compcor) for bold and perfusion based fmri. NeuroImage 37(1), 90–101 (2007). doi:10.1016/j.neuroimage.2007.04.042

10. Zang, Y.-F., He, Y., Zhu, C.-Z., Cao, Q.-J., Sui, M.-Q., Liang, M., Tian, L.-X., et al. (2007). Altered baseline brain activity in children with ADHD revealed by resting-state functional MRI. Brain & development, 29(2), 83–91.

11. Zou, Q.-H., Zhu, C.-Z., Yang, Y., Zuo, X.-N., Long, X.-Y., Cao, Q.-J., Wang, Y.-F., et al. (2008). An improved approach to detection of amplitude of low-frequency fluctuation (ALFF) for resting-state fMRI: Fractional ALFF. Journal of neuroscience methods, 172(1), 137–141.

12. Zang, Y., Jiang, T., Lu, Y., He, Y., Tian, L., 2004. Regional homogeneity approach to fMRI data analysis. Neuroimage 22, 394-400.

13. Stark, D. E., Margulies, D. S., Shehzad, Z. E., Reiss, P., Kelly, A. M. C., Uddin, L. Q., Gee, D. G., et al. (2008). Regional variation in interhemispheric coordination of intrinsic hemodynamic fluctuations. The Journal of Neuroscience, 28(51), 13754–13764.

14. Buckner RL, Sepulcre J, Talukdar T, Krienen FM, Liu H, Hedden T, Andrews-Hanna JR, Sperling RA, Johnson KA. 2009. Cortical hubs revealed by intrinsic functional connectivity: mapping, assessment of stability, and relation to Alzheimer’s disease. J Neurosci. 29:1860–1873.

15. Lohmann G, Margulies DS, Horstmann A, Pleger B, Lepsien J, Goldhahn D, Schloegl H, Stumvoll M, Villringer A, Turner R. 2010. Eigenvector centrality mapping for analyzing connectivity patterns in fMRI data of the human brain. PLoS One. 5:e10232

16. Tomasi D, Volkow ND. 2010. Functional connectivity density mapping. PNAS. 107(21):9885-9890.

17. C.F. Beckmann, C.E. Mackay, N. Filippini, and S.M. Smith. Group comparison of resting-state FMRI data using multi-subject ICA and dual regression. OHBM, 2009.

18. Smith, S. M., Fox, P. T., Miller, K. L., Glahn, D. C., Fox, P. M., Mackay, C. E., et al. (2009). Correspondence of the brain’s functional architecture during activation and rest. Proceedings of the National Academy of Sciences of the United States of America, 106(31), 13040–13045. doi:10.1073/pnas.0905267106

19. Dosenbach, N. U. F., Nardos, B., Cohen, A. L., Fair, D. a, Power, J. D., Church, J. a, … Schlaggar, B. L. (2010). Prediction of individual brain maturity using fMRI. Science (New York, N.Y.), 329(5997), 1358–61. http://doi.org/10.1126/science.1194144

20. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., … Joliot, M. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage, 15(1), 273–89. http://doi.org/10.1006/nimg.2001.0978

21. Eickhoff, S. B., Stephan, K. E., Mohlberg, H., Grefkes, C., Fink, G. R., Amunts, K., & Zilles, K. (2005). A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. NeuroImage, 25(4), 1325–35. http://doi.org/10.1016/j.neuroimage.2004.12.034

22. Harvard-Oxford cortical and subcortical structural atlases, http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases

23. Lancaster, J. L., Woldorff, M. G., Parsons, L. M., Liotti, M., Freitas, C. S., Rainey, L., … Fox, P. T. (2000). Automated Talairach atlas labels for functional brain mapping. Human Brain Mapping, 10(3), 120–31. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/10912591

24. Craddock, R. C., James, G. A., Holtzheimer, P. E., Hu, X. P., & Mayberg, H. S. (2011). A whole brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping, 0(July 2010). http://doi.org/10.1002/hbm.21333