What analysis pipeline to use with the Neuromod project?

jcohenadad commented 3 years ago

Context

There are a few pipeline solutions out there:

Problem

There is no clear guideline about which pipeline technology to use for the neuromod project. Also, most of these pipelines take some time to learn. Consequently, if one neuromod sub-team uses technology X and another sub-team uses technology Y, this might create unnecessary complications:

if sub-teams collaborate with each other
if a member of a sub-team using technology X needs a code review from a sub-team using technology Y
when documenting the pipeline technology for external contributors

Current situation

Based on Slack discussions that started around December 2019, Nextflow was put on the table by @arnaudbore as a possible pipeline solution. There was no opposition to this idea, so @agahkarakuzu started using Nextflow and created a pipeline for this repository.

In parallel, other sub-teams have been using fmriprep as a pipeline technology (based on @bpinsard's comments on Slack).

Proposed Solution

The two pipeline technologies supported for the neuromod project are:

fmriprep
nextflow

Moving forward

If the core team agrees with the proposed solution, I suggest adding this information to the contributing section of the neuromod project. Related to https://github.com/courtois-neuromod/cneuromod.ca/issues/13

@pbellec @bpinsard @arnaudbore feedback needed

pbellec commented 3 years ago

So a couple thoughts. First I think it would be great to have a go-to pipeline system. It used to be PSOM before we moved to python. This go-to has not materialized, for a number of reasons.

There are just a ridiculous number of pipeline engines out there (see https://github.com/pditommaso/awesome-pipeline). Solutions that have been discussed in simexp recently include:

pydra. PROS: awesome, lightweight, Python-based. CONS: still very early in development. We experienced bugs and limitations which blocked us.
snakemake. PROS: mature, large project. CONS: tons of dependencies.
nextflow is somewhat intermediate between pydra and snakemake. It's been used with success by Arnaud as part of his work with Maxime Descoteaux's lab.

Note that fMRIprep is not a pipeline tool. It is a particular pipeline built on top of nipype. And it's using nipype capabilities to a very limited degree (no inter-subject slurm parallelization). Nipype is primarily a collection of interfaces for neuroimaging tools, and its pipeline engine is not particularly good. My understanding is that this project is to be superseded by pydra.

Another alternative is to implement a library that takes advantage of multiple cores using joblib (used by scikit-learn and nilearn, amongst others) and then distribute individual subject analyses using a basic bash script and slurm. That's the approach we've used for dypac.
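As a rough sketch of that last approach (not dypac's actual code): the per-subject function below spreads runs over cores with joblib's `Parallel`/`delayed`, and a small helper writes a Slurm array script with one task per subject. The names `process_run`, `process_subject`, `render_sbatch_script`, the entry point `analyze_subject.py` and its `--subject`/`--n-jobs` flags are all hypothetical, made up for illustration. If joblib is not installed, the sketch falls back to a plain loop.

```python
try:
    from joblib import Parallel, delayed  # third-party library mentioned above
except ImportError:
    Parallel = None  # fall back to a plain loop so the sketch still runs


def process_run(subject, run):
    """Hypothetical per-run analysis step (placeholder for real work)."""
    return f"{subject}_run-{run:02d}"


def process_subject(subject, runs, n_jobs=1):
    """Process all runs of one subject, spreading them over n_jobs cores."""
    if Parallel is None or n_jobs == 1:
        return [process_run(subject, run) for run in runs]
    return Parallel(n_jobs=n_jobs)(
        delayed(process_run)(subject, run) for run in runs
    )


def render_sbatch_script(subjects, cpus_per_task, entry_point="analyze_subject.py"):
    """Return a Slurm array-job script with one array task per subject."""
    if not subjects:
        raise ValueError("at least one subject is required")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --array=0-{len(subjects) - 1}",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        "",
        f"SUBJECTS=({' '.join(subjects)})",
        'SUBJECT="${SUBJECTS[$SLURM_ARRAY_TASK_ID]}"',
        # --subject and --n-jobs are hypothetical flags of the hypothetical entry point
        f'python {entry_point} --subject "$SUBJECT" --n-jobs "$SLURM_CPUS_PER_TASK"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(process_subject("sub-01", runs=[1, 2, 3]))
    print(render_sbatch_script(["sub-01", "sub-02", "sub-03"], cpus_per_task=4))
```

The point of this split is that parallelism within a subject stays inside the Python library, while parallelism across subjects is left to the scheduler, so no pipeline engine is needed in between.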

Currently at simexp it looks like every project converges towards a different solution for its use case, which may be why there are so many different pipeline projects in the world. Also, our general philosophy is to contribute to existing projects rather than start our own, so in effect we are constrained by choices made by others. Agah's qmrflow project is pretty much our only stand-alone neuroimaging pipeline project. Projects like dypac or load_confounds are designed as "seeds" to be merged down the road into larger projects, if successful.

So in conclusion, I am not sure I can make a recommendation at this stage. Happy to hear what others think.

jcohenadad commented 3 years ago

Thank you for chipping in, @pbellec!

Note that fMRIprep is not a pipeline tool. It is a particular pipeline built on top of nipype.

Thank you for clarifying.

Also our general philosophy is to contribute to existing projects rather than start our own, so in effect we are constrained by choices made by others. Agah's qmrflow project is pretty much our only stand-alone neuroimaging pipeline project.

IIUC @agahkarakuzu's pipeline is a pure nextflow pipeline, so it uses an existing technology rather than starting our own. There are just dockerized elements, but the backbone is still a pure nextflow pipeline.

IMHO the strategy regarding pipeline(s) (and programming languages and software in general) at the scale of a lab/team is to minimize the number of technologies, software packages and languages, because:

each technology comes with its own domain knowledge;
knowledge can then be shared among as many people as possible within a team/lab, rather than being concentrated in a few individuals.

So, given that:

there is no "endorsed" pipeline technology so far for neuromod;
nextflow is already known by @arnaudbore and @agahkarakuzu;
nextflow is actively used by a friend-lab (SCIL).

My suggestion would be to endorse it, make it official, and ask people to start learning it.

agahkarakuzu commented 3 years ago

@pbellec I wanted to add a few Nextflow improvements that came with DSL2:

Workflows and processes as modules: The previous version allowed one compact script containing all the processes and workflows per pipeline. The new approach brings great modularity for reusing existing workflows, processes and components.
Managing data channel forks became much easier, more expressive and more intuitive.
They took the executor abstraction to a new level: it is not limited to Singularity/Docker and can now even be used with Kubernetes.
Nextflow is also good for deeper (e.g. socket programming) integrations, as it is built on unix pipes. The data stream is managed at the depths of undertow, not just on the surface to keep our boat afloat :)

The ability to orchestrate multiple containers (1 container per process) was already there, but managing them also became easier.

agahkarakuzu commented 2 years ago

The brain processing pipeline is now standalone and has fully switched to Nextflow DSL2 for modularity.

courtois-neuromod / anat-processing