Closed NicolasGensollen closed 1 month ago
Unless I missed it the documentation would need changes too but otherwise LGTM ! Thanks @NicolasGensollen :)
Unless I missed it the documentation would need changes too but otherwise LGTM ! Thanks @NicolasGensollen :)
Absolutely ! I couldn't find the time to do it this afternoon, but will have a go at it tomorrow. Thanks for the review @AliceJoubert !
Closes #1101
After a little bit of thinking, I made some changes to the structure of the
dataset_description.json
file for CAPS datasets compared to what I originally described in #1101.Having the "Name" key used as a way to encode the different processing wasn't a good idea (i.e. something like "t1-linear + pet-linear"). Actually it was a very bad idea as it will probably lead to very complicated logic to understand which pipelines were run on a given CAPS.
The dynamic nature of CAPS datasets where additional processing pipelines can be run means that the
dataset_description.json
file is also going to change to incorporate the necessary metadata describing these processings.This PR proposes to have a field named "Processing" which is a list of objects describing the different processing pipelines that were run. Each processing has a name, a date, an author, a machine on which it was executed, and a path to an input dataset.
Here is an example of a
dataset_description.json
file obtained when runningt1-linear
andpet-linear
on a CI machine:The name can be any user-provided string, and defaults to a random identifier (as the one shown above). It cannot change, meaning that re-running a pipeline with the same CAPS dataset will not change the name of the dataset (even if the user explicitly asks for another name). It will only update the "processing" entry if needed.
A processing is identified by default by its name and its input path. This means that:
The two version fields "BIDSVersion" and "CAPSVersion" are delicate because they are supposed to version the metadata models used. Theoretically, when using an existing CAPS dataset as an output for a pipeline, the versions of BIDS and CAPS specifications of the file should match the ones used by Clinica, otherwise there is no guarantee that things will not break. For this reason, this PR proposes to raise an error when it happens. I'm still debating on this as it could easily be perceived as annoying (for example in the CI data we have tons of BIDS
dataset_description.json
with old versions that were never updated...), but it will force us to have meaningful metadata with our datasets.Finally, this PR also proposes to impose the presence of the
dataset_description.json
file in CAPS folders. When trying to run a pipeline with a new folder for CAPS, the file will automatically be generated. When trying to run a pipeline with an existing folder without adataset_description.json
, Clinica will raise an error with a suggestion of a minimum file that should be added.Link to the documentation page: https://aramislab.paris.inria.fr/clinica/docs/public/PR-1158/CAPS/Specifications/
Requires data PR: https://github.com/aramis-lab/clinica_data_ci/pull/68