Open jcohenadad opened 3 years ago
So, a couple of thoughts. First, I think it would be great to have a go-to pipeline system. It used to be PSOM before we moved to Python. This go-to has not materialized, for a number of reasons.
There are just a ridiculous number of pipeline engines out there (see https://github.com/pditommaso/awesome-pipeline). Solutions that have been discussed in simexp recently include:
Note that fMRIprep is not a pipeline tool. It is a particular pipeline built on top of nipype, and it uses nipype's capabilities only to a very limited degree (no inter-subject Slurm parallelization). Nipype is primarily a collection of interfaces for neuroimaging tools, and its pipeline engine is not particularly good. My understanding is that this project is to be superseded by pydra.
Another alternative is to implement a library that takes advantage of multiple cores using joblib (used by scikit-learn and nilearn, amongst others) and then distribute the individual subject analyses using a basic bash script and Slurm. That's the approach we've used for dypac.
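To make the joblib + Slurm idea concrete, here is a minimal sketch, not the actual dypac code. The names `process_run`, `process_subject`, `sbatch_command` and the script `run_subject.sh` are hypothetical placeholders; only `joblib.Parallel`/`delayed` and the `sbatch` options `--job-name` and `--cpus-per-task` are real. Within one subject, runs are spread across local cores; across subjects, one Slurm job is submitted per subject.

```python
try:
    from joblib import Parallel, delayed
except ImportError:  # joblib not installed: fall back to a plain loop
    Parallel = None


def process_run(subject, run):
    """Placeholder for the real per-run analysis (hypothetical)."""
    return f"{subject}_run-{run:02d}"


def process_subject(subject, runs, n_jobs=2):
    """Analyze all runs of one subject, spreading runs across local cores."""
    if Parallel is None:
        return [process_run(subject, run) for run in runs]
    return Parallel(n_jobs=n_jobs)(
        delayed(process_run)(subject, run) for run in runs
    )


def sbatch_command(subject, cpus=4, script="run_subject.sh"):
    """Build the Slurm submission command for one subject.

    `run_subject.sh` is an assumed wrapper script that would call
    `process_subject` for the subject passed as its first argument.
    """
    return [
        "sbatch",
        f"--job-name={subject}",
        f"--cpus-per-task={cpus}",
        script,
        subject,
    ]


if __name__ == "__main__":
    # Print the commands instead of submitting, so this runs without Slurm.
    for subject in ["sub-01", "sub-02"]:
        print(" ".join(sbatch_command(subject)))
```

The appeal of this design is that it needs no pipeline engine at all: the library only has to expose an `n_jobs`-style argument, and the cluster scheduler handles the between-subject parallelism.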
Currently at simexp it looks like every project converges towards a different solution for its use case, which may be why there are so many different pipeline projects in the world. Also, our general philosophy is to contribute to existing projects rather than start our own, so in effect we are constrained by choices made by others. Agah's qmrflow project is pretty much our only stand-alone neuroimaging pipeline project. Projects like dypac or load_confounds are designed as "seeds" to be merged down the road into a larger project, if successful.
So in conclusion, I am not sure I can make a recommendation at this stage. Happy to hear what others think.
thank you for chipping in @pbellec !
> Note that fMRIprep is not a pipeline tool. It is a particular pipeline built on top of nipype.
Thank you for clarifying.
> Also our general philosophy is to contribute to existing projects rather than start our own, so in effect we are constrained by choices made by others. Agah's qmrflow project is pretty much our only stand-alone neuroimaging pipeline project.
IIUC @agahkarakuzu's pipeline is a pure Nextflow pipeline, so it uses an existing technology rather than starting our own. There are some dockerized elements, but the backbone is still a pure Nextflow pipeline.
IMHO the strategy regarding pipeline(s) (and programming languages and software in general) at the scale of a lab/team is to minimize the number of technologies, software packages, and languages in use, because:
So, given that:
My suggestion would be to endorse it, make it official, and ask people to start learning it.
@pbellec I wanted to add a few Nextflow improvements that came with DSL2:
Workflows and processes as modules: The previous version allowed only a single script containing all the processes and workflows of a pipeline. The new approach brings great modularity, making it possible to reuse existing workflows, processes, and components.
Managing data channel forks became much easier, more expressive, and more intuitive.
They took executor abstraction to a new level: it is not limited to Singularity/Docker, and can now even be used with Kubernetes.
Nextflow is also good for deeper (e.g. socket programming) integrations, as it is built on unix pipes. The data stream is managed at the depths of undertow, not just on the surface to keep our boat afloat :)
Multiple container orchestration ability (1 container/1 process) was already there, but managing them also became easier.
The brain processing pipeline is now standalone and has fully switched to Nextflow DSL2 for modularity.
Context
There are a few pipeline solutions out there:
Problem
There is no clear guideline about which pipeline technology to use for the neuromod project. Also, most of these pipelines take some time to learn. Consequently, if one neuromod sub-team uses technology X and another sub-team uses technology Y, this might create unnecessary complications:
Current situation
Based on Slack discussions that started around December 2019, Nextflow was put on the table by @arnaudbore as a possible pipeline solution. There was no opposition to this idea, so @agahkarakuzu started working with Nextflow and created a pipeline for this repository.
In parallel, other sub-teams have been using fMRIprep as a pipeline technology (based on @bpinsard's comments on Slack).
Proposed Solution
The two pipeline technologies supported for neuromod project are:
Moving forward
If the core team agrees with the proposed solution, I suggest adding this information under the contributing section of the neuromod project. Related to https://github.com/courtois-neuromod/cneuromod.ca/issues/13
@pbellec @bpinsard @arnaudbore feedback needed