bids-standard / bids-bep016

BEP016: diffusion derivatives
Creative Commons Attribution 4.0 International

Storage of non-conforming derivatives #33

Closed Lestropie closed 6 months ago

Lestropie commented 2 years ago

Don't know the extent to which this topic has been discussed elsewhere, so may need to either link to prior discussions or indeed post elsewhere if it warrants escalation.

Relates slightly to #32 in that the use of sub-directories is proposed, but decided it warranted its own Issue.

Many will consider it unnecessary to convert diffusion model data from the standard format in which software packages have exported them for many years into something conforming to the BIDS Derivatives specification. FSL's bedpostx is a good example. All such users want is to be able to pass those data into some downstream analysis, and they would like to do that within the framework of BIDS Apps, even if the data stored as intermediates are not explicitly converted into something stringently BIDS-compliant. So this is a discussion of the extent to which such data should be "supported", both in the BIDS Derivatives specification and in the BIDS validator.

Take for example the FreeSurfer output that is generated by fmriprep:

BIDS_Derivatives/
    fmriprep/
        ...
    freesurfer/
        sub-<participant_label>/
            label/
                ....
            mri/
                ....
            scripts/
                ....
            stats/
                ....
            surf/
                ....
            tmp/
                ....
            touch/
                ....
            trash/
                ....

This I would expect to be considered a cardinal violation by the validator. It's useful as far as getting that data from the software pipeline to the user, but it limits the extent to which the validator can be used on derivative data. And changing the entire suite of outputs of FreeSurfer into something that is compliant with some other BIDS Derivatives extension is likely some way off.

What if, instead, we were to say that anything stored within a sub-directory with a BIDS-compliant name is to be treated as non-conformant but permissible?

So the FreeSurfer example above would instead look something like:

BIDS_Derivatives/
    fmriprep/
        ...
    freesurfer/
        sub-<participant_label>/
            anat/
                sub-<participant_label>_reconall/
                    label/
                        ....
                    mri/
                        ....
                    scripts/
                        ....
                    stats/
                        ....
                    surf/
                        ....
                    tmp/
                        ....
                    touch/
                        ....
                    trash/
                        ....

So a non-conforming bedpostx output might look something like (guessing based on their documentation, don't have an example output at hand at time of writing):

BIDS_Derivatives/
    bedpostx/
        sub-<participant_label>/
            dwi/
                sub-<participant_label>_bs/
                    merged_th1samples.nii
                    merged_th2samples.nii
                    merged_th3samples.nii
                    merged_ph1samples.nii
                    merged_ph2samples.nii
                    merged_ph3samples.nii
                    merged_f1samples.nii
                    merged_f2samples.nii
                    merged_f3samples.nii
                    mean_th1samples.nii
                    mean_th2samples.nii
                    mean_th3samples.nii
                    mean_ph1samples.nii
                    mean_ph2samples.nii
                    mean_ph3samples.nii
                    mean_f1samples.nii
                    mean_f2samples.nii
                    mean_f3samples.nii
                    mean_dsamples.nii
                    mean_d_stdsamples.nii
                    mean_S0samples.nii
                    dyads1.nii
                    dyads2.nii
                    dyads3.nii
                    dyads1_dispersion.nii
                    dyads2_dispersion.nii
                    dyads3_dispersion.nii
                    nodif_brain_mask.nii

So as before, question is: to what extent should this be permitted in the specification, or indeed in the validator?
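
To make the proposed rule concrete, here is a rough sketch of how a validator might classify paths under it. The regular expressions are a heavily simplified stand-in for the real BIDS filename grammar, session directories are ignored, and the `classify` function, the `DATATYPES` set and the three result labels are all hypothetical names made up for illustration rather than anything in the existing validator.

```python
import re
from pathlib import PurePosixPath

# Simplified, hypothetical patterns; the real BIDS grammar is much richer.
LABEL = r"[A-Za-z0-9]+"
ENTITY = rf"[a-z]+-{LABEL}"
# e.g. "sub-01_bs" or "sub-01_desc-foo_reconall" (no extension: a container directory)
BIDS_DIR = re.compile(rf"^sub-{LABEL}(_{ENTITY})*_{LABEL}$")
# e.g. "sub-01_desc-preproc_dwi.nii.gz" (with extension: a regular file)
BIDS_FILE = re.compile(rf"^sub-{LABEL}(_{ENTITY})*_{LABEL}(\.{LABEL})+$")
# Assumed subset of datatype directory names, for illustration only.
DATATYPES = {"anat", "dwi", "func", "fmap"}


def classify(relpath: str) -> str:
    """Classify a path given relative to a pipeline directory (e.g. bedpostx/).

    Returns "conforming" for a BIDS-named file, "permitted" for anything
    inside a BIDS-named container directory, and "invalid" otherwise.
    """
    parts = PurePosixPath(relpath).parts
    if len(parts) < 3:
        return "invalid"
    subject, datatype, *rest = parts
    if not re.fullmatch(rf"sub-{LABEL}", subject) or datatype not in DATATYPES:
        return "invalid"
    # The file or container name must carry the same subject label.
    if not rest[0].startswith(subject + "_"):
        return "invalid"
    if len(rest) == 1:
        return "conforming" if BIDS_FILE.match(rest[0]) else "invalid"
    # Anything below a BIDS-named directory is opaque: not checked, but allowed.
    return "permitted" if BIDS_DIR.match(rest[0]) else "invalid"
```

Under this sketch, `sub-01/dwi/sub-01_bs/merged_th1samples.nii` would be "permitted", whereas the current FreeSurfer-style `sub-01/mri/orig.mgz` would still be "invalid".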

arokem commented 2 years ago

The spec does allow derivatives to be stored in any format. (see https://bids-specification.readthedocs.io/en/stable/02-common-principles.html#non-compliant-derivatives).

But: I think that in our work here we would want to define a BIDS-compliant way of storing these derivatives. For example, a fully compliant BIDS app that runs bedpostx should take these outputs and rename them into whatever we agree upon as the compliant names for these files.

Lestropie commented 2 years ago

> The spec does allow derivatives to be stored in any format.

I seem to recall having gotten into a debate regarding that part of the spec at one point. One disadvantage I see there is that if you put all of your derivatives into a derivatives/ directory, then you are essentially disabling validation on all of your derivatives, across all pipelines. An advantage of the proposal here is that you would have the ability to mix conforming and non-conforming data, even within a single participant for a single pipeline.

> But: I think that in our work here we would want to define a BIDS-compliant way of storing these derivatives.

Obviously. It'd be a very short project if that wasn't the case...

My point is that there is the prospect of defining a BIDS-compliant way of storing non-BIDS-compliant derivative data, which would potentially enable cross-utilisation of data across BIDS Apps in a shorter time frame than that required for the robust definition of any BIDS derivatives (not just DWI). We don't necessarily have to say "if your data don't conform to BIDS derivatives, which don't yet exist, then you can't use them in BIDS Apps".

Lestropie commented 2 years ago

One other point that came to mind here, which I think was discussed in the context of the tractography TRX format as well. Rather than / as an alternative to a sub-directory, one could instead use a tarball. So you could instead have e.g.:

BIDS_Derivatives/
    bedpostx/
        sub-<participant_label>/
            dwi/
                sub-<participant_label>[_desc-<label>]_bs.tar[.gz]
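
As a rough illustration of the tarball variant, the sketch below packs an existing output directory into a single archive named along the lines of the pattern above. It uses only the Python standard library; the function name `pack_nonconforming`, its parameters and the default `bs` suffix are hypothetical choices for this example, not part of any agreed convention.

```python
import tarfile
from pathlib import Path
from typing import Optional


def pack_nonconforming(src_dir: Path, datatype_dir: Path, sub: str,
                       suffix: str = "bs", desc: Optional[str] = None) -> Path:
    """Pack src_dir into <datatype_dir>/sub-<sub>[_desc-<desc>]_<suffix>.tar.gz."""
    name = f"sub-{sub}"
    if desc:
        name += f"_desc-{desc}"
    name += f"_{suffix}"
    datatype_dir.mkdir(parents=True, exist_ok=True)
    out = datatype_dir / f"{name}.tar.gz"
    with tarfile.open(out, "w:gz") as tar:
        # arcname replaces the source directory name inside the archive,
        # so members are stored as <name>/<original relative path>.
        tar.add(src_dir, arcname=name)
    return out
```

A call such as `pack_nonconforming(Path("subj1.bedpostX"), Path("BIDS_Derivatives/bedpostx/sub-01/dwi"), "01")` (paths made up for the example) would write `sub-01_bs.tar.gz` into the `dwi/` directory, leaving the tool's native file names untouched inside the archive.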

arokem commented 2 years ago

> One disadvantage I see there is that if you put all of your derivatives into a derivatives/ directory, then you are essentially disabling validation on all of your derivatives, across all pipelines.

I don't think that validation is an all-or-none prospect, though. From the docstring of the pybids.BIDSLayout class:

        If [validate=]True, all files are checked for BIDS compliance when first indexed,
        and non-compliant files are ignored. This provides a convenient way to
        restrict file indexing to only those files defined in the "core" BIDS
        spec, as setting validate=True will lead files in supplementary folders
        like derivatives/, code/, etc. to be ignored

Which I read to mean that compliant files are validated and indexed, and non-compliant files are not. I think that this would allow for incremental changes to happen, while still allowing BIDS apps to work with intermediate non-compliant derivatives of various software. In other words, I think that your desideratum "you would have the ability to mix conforming and non-conforming data, even within a single participant for a single pipeline" is already met. But I am not 100% sure.

Lestropie commented 6 months ago

From prior discussions, this proposal is counter to the intent of the project, so I'm going to close it outright. Hopefully with a bit more work, there will be the capability to convert reasonably complex derivatives (bedpostx is the yardstick here) to something very BIDS-ey.