Open glatard opened 7 years ago
So, are we thinking about how to encapsulate and formalize pipelines in Boutiques? This is just a thought, but perhaps pipeline descriptors should be separate from application descriptors (i.e. a different type), with the pipeline descriptor referring to the Boutiques JSON descriptors of the applications within the pipeline (which e.g. have their own invocation schemas). Then the logic of how to connect the pieces (e.g. where to send the files, commands to run between stages, what to input for the next stage, which stages should be parallelized, which must wait for which) can live in the pipeline descriptor. Obviously that adds a lot of (rather immense) complexity ... but it would be clean and modular.
In any case, I think the protocol outlined in the current sec. 2.7 is very sensible; I'm just not clear on the details (e.g. for submitting multiple tasks and having some tasks wait for others, would most of the logic have to reside in the command line? Or in some other object?).
The pipeline logic won't be described in Boutiques. Some pipeline engines may decide to support Boutiques tools in their own pipelines, but this will remain separate from the Boutiques spec, as you noted.
We only aim to specify an interface so that pipeline engines wrapped as Boutiques tools have a chance to submit new tasks to the platform and benefit from parallelization. This is what is described in section 2.7 of the paper. It is already implemented in CBRAIN (the "sub-tasking mechanism"), where it is used for PSOM and FSL melodic+randomize, but it is not yet properly formalized in the Boutiques spec.
Specify how Boutiques applications may submit sub-tasks. Add the required properties to the schema. The existing "sub-tasking" mechanism in CBRAIN is a starting point, but it doesn't comply with the invocation schema and it can be used by non-Boutiques tools. See current sections 2.6 and 2.7 of the paper. In addition, a platform should be able to tell the application that it doesn't support sub-tasking, as it might be a pretty complex feature to implement depending on the context. In this case, the application should fall back on local execution of the sub-tasks (for instance, FSL can do that in fsl_sub). This fallback mechanism should be mentioned in the paper too.
Implement in the local executor (sub-tasking will not be supported, so the fallback mechanism will be used) and in CBRAIN (based on the existing sub-tasking mechanism).
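To make the fallback idea concrete, here is a rough sketch of what the application side could look like. None of this is in the Boutiques spec or in CBRAIN: the `run_subtasks` function, the `platform_submit` hook and the result strings are all hypothetical names made up for illustration. The only assumption is that the platform either hands the application some way to submit a sub-task, or signals that it cannot, in which case the application runs the sub-tasks itself.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence

# Hypothetical type: a platform hook that accepts one sub-task command line
# and returns a platform-side task identifier. Not part of the Boutiques spec.
SubmitHook = Callable[[Sequence[str]], str]


def run_subtasks(commands: List[Sequence[str]],
                 platform_submit: Optional[SubmitHook] = None) -> List[str]:
    """Submit each sub-task to the platform if it offers a hook,
    otherwise fall back on local, sequential execution."""
    results = []
    for cmd in commands:
        if platform_submit is not None:
            # Platform supports sub-tasking: delegate and record the task id.
            results.append("submitted:" + platform_submit(cmd))
        else:
            # Fallback: run the sub-task locally and wait for it to finish.
            completed = subprocess.run(list(cmd), check=True)
            results.append("local:" + str(completed.returncode))
    return results


# Two trivial sub-tasks that just start the Python interpreter and exit.
commands = [[sys.executable, "-c", "pass"], [sys.executable, "-c", "pass"]]

# Platform without sub-tasking support (e.g. a local executor): fallback path.
local_results = run_subtasks(commands)

# Platform with a (stubbed) submission hook: delegation path.
platform_results = run_subtasks(
    commands, platform_submit=lambda cmd: "task-" + str(len(cmd)))
```

In this sketch the local executor would simply never pass a hook, so the fallback path is always taken, while a platform like CBRAIN would supply one. How the platform actually advertises support (a descriptor property, an environment variable, something else) is exactly what still needs to be specified.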