Extend BIDS-Derivatives to handle longitudinal datasets? (the FreeSurfer use case)

alexandreroutier commented 4 years ago

Hello everyone,

Sorry in advance if this question has already been raised.

Given a participant with several sessions, I would like to store intra-subject results in BIDS Derivatives, in particular outputs from the longitudinal FreeSurfer pipeline.

This pipeline is decomposed into 3 main steps:

1) Cross-sectional run of FreeSurfer (recon-all -all), i.e. the recon-all we all know
2) Unbiased template from a set of sessions (recon-all -base)
3) Longitudinal correction of the segmentation from step 1 based on step 2 (recon-all -long)
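For readers less familiar with the longitudinal stream, the three steps correspond roughly to the recon-all calls assembled below. This is only a sketch: it builds argument lists without running anything, the subject IDs (sub-01_ses-M00, sub-01_long-M00M18, ...) are hypothetical, and the flags follow the FreeSurfer wiki's longitudinal processing page, so check them against your FreeSurfer version.

```python
from typing import List


def cross_sectional_cmd(tp_id: str, t1w_path: str) -> List[str]:
    """Step 1: standard cross-sectional recon-all for one session."""
    return ["recon-all", "-all", "-s", tp_id, "-i", t1w_path]


def base_cmd(template_id: str, tp_ids: List[str]) -> List[str]:
    """Step 2: unbiased within-subject template built from a set of sessions."""
    cmd = ["recon-all", "-base", template_id]
    for tp in tp_ids:
        cmd += ["-tp", tp]  # one -tp flag per cross-sectional run
    return cmd + ["-all"]


def long_cmd(tp_id: str, template_id: str) -> List[str]:
    """Step 3: longitudinal re-processing of one session using the template."""
    return ["recon-all", "-long", tp_id, template_id, "-all"]


if __name__ == "__main__":
    sessions = ["sub-01_ses-M00", "sub-01_ses-M18"]  # hypothetical IDs
    print(" ".join(base_cmd("sub-01_long-M00M18", sessions)))
    for tp in sessions:
        print(" ".join(long_cmd(tp, "sub-01_long-M00M18")))
```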

Assuming a participant 01 with 3 sessions (e.g. M00, M18 and M36), I would like to store, in the BIDS Derivatives folder, the unbiased templates for the 2 possible use cases:

Sessions M00 and M18 to create unbiased template (Template A)
Sessions M00, M18 and M36 to create unbiased template (Template B)

(Template B would typically be computed when running the pipeline a second time, once M36 appears in the BIDS dataset), and then save the longitudinal corrections for these 2 cases:

Longitudinal correction of segmentations for M00 and M18 sessions based on Template A
Longitudinal correction of segmentations for M00, M18 and M36 sessions based on Template B
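The bookkeeping for the two templates can be sketched as below: each template is identified by the set of sessions it was built from, and step 3 produces one <timepoint>.long.<template> run per session in that set. The template IDs reuse the long-<label> naming proposed later in this issue, which is an assumption here rather than an existing convention.

```python
from typing import Dict, List


def long_outputs(subject: str, templates: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each template ID to the names of the step-3 runs it produces.

    FreeSurfer names longitudinal runs <timepoint>.long.<template>.
    """
    out: Dict[str, List[str]] = {}
    for template_id, sessions in templates.items():
        out[template_id] = [
            f"{subject}_ses-{ses}.long.{template_id}" for ses in sessions
        ]
    return out


# Hypothetical template IDs for the two use cases above.
templates = {
    "sub-01_long-M00M18": ["M00", "M18"],            # Template A
    "sub-01_long-M00M18M36": ["M00", "M18", "M36"],  # Template B
}

if __name__ == "__main__":
    for tid, names in long_outputs("sub-01", templates).items():
        print(tid, names)
```

With both templates, sessions M00 and M18 each end up with two longitudinal results, which is why the output needs to record which template (and hence which set of sessions) each result came from.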

For step 1 of FS, there are already some BIDS Derivatives examples, e.g. from fMRIPrep:

<output_dir>/
    freesurfer/
        sub-<subject_label>/
            mri/
            surf/
            ...
        ...

Regarding steps 2 and 3 of FS, the only mentions I found were in the documentation of fMRIPrep and QSIprep, which use the mri_robust_template command from FreeSurfer (also used in step 2):

The preprocessed T1w image defines the T1w space. In the case of multiple T1w images, this space may not be precisely aligned with any of the original images.

But this does not cover the use case I described above.

Proposition

I was wondering if we could introduce the notion of a "set of sessions" through a new entity LongitudinalKey-<label> for this use case, where the actual name of LongitudinalKey remains to be defined.

What I have currently in mind is something with the following structure:

<output_dir>/
    <pipeline>/
        sub-01/
            long-M00M18/
               sub-01_long-M00M18_sessions.tsv
               ...
            long-M00M18M36/
               sub-01_long-M00M18M36_sessions.tsv
               ...
            ses-M00/
               ...
            ses-M18/
               ...
            ses-M36/
               ...
        ...

$ cat sub-01_long-M00M18_sessions.tsv

session_id
ses-M00
ses-M18

In this case, the label associated with a set of sessions is the concatenation of the session labels in lexical order. More generally, I think the label could be free-form, as long as a *_sessions.tsv file listing the sessions is provided.
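A minimal sketch of the labelling rule and the accompanying sessions file described above, assuming the hypothetical entity name long- and a tab-separated file with a single session_id column. Sorting the labels first makes the label deterministic, so re-running the pipeline on the same sessions reproduces the same folder.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable


def long_label(session_labels: Iterable[str]) -> str:
    """Concatenate session labels in lexical order, e.g. M18, M00 -> 'M00M18'."""
    return "".join(sorted(session_labels))


def write_sessions_tsv(out_dir: Path, subject: str, session_labels: Iterable[str]) -> Path:
    """Write sub-<sub>/long-<label>/sub-<sub>_long-<label>_sessions.tsv."""
    labels = sorted(session_labels)
    label = long_label(labels)
    long_dir = out_dir / f"sub-{subject}" / f"long-{label}"
    long_dir.mkdir(parents=True, exist_ok=True)
    tsv = long_dir / f"sub-{subject}_long-{label}_sessions.tsv"
    with tsv.open("w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["session_id"])
        for ses in labels:
            writer.writerow([f"ses-{ses}"])
    return tsv


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_sessions_tsv(Path(tmp), "01", ["M18", "M00"])
        print(path.name)
        print(path.read_text())
```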

FreeSurfer example

<output_dir>/
   freesurfer/
       sub-01/
           long-M00M18/
              sub-01_long-M00M18_sessions.tsv
              sub-01_long-M00M18/
                 mri/
                 ...
           long-M00M18M36/
              sub-01_long-M00M18M36_sessions.tsv
               sub-01_long-M00M18M36/
                  mri/
                  ...
           ses-M00/
              sub-01_ses-M00.long.sub-01_long-M00M18/
                 mri/
                 ...
              sub-01_ses-M00.long.sub-01_long-M00M18M36/
                 mri/
                 ...
              sub-01_ses-M00/
                 mri/
                 ...
           ...
       ...

Note: <timepoint>.long.<template> is the FreeSurfer naming convention for the outputs of step 3.
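A sketch of how a wrapping tool might compute the directory names in the FreeSurfer example above. The helper names (template_dir, long_run_dir, cross_dir) are hypothetical, and the layout is this proposal, not something defined by BIDS or FreeSurfer. Note that recon-all itself looks up all of these recons by ID under a single SUBJECTS_DIR, so the nested paths would still need to be exposed to it in some way.

```python
from pathlib import PurePosixPath


def template_dir(root: str, sub: str, label: str) -> PurePosixPath:
    """Step 2 output, e.g. freesurfer/sub-01/long-M00M18/sub-01_long-M00M18."""
    return PurePosixPath(root, f"sub-{sub}", f"long-{label}", f"sub-{sub}_long-{label}")


def cross_dir(root: str, sub: str, ses: str) -> PurePosixPath:
    """Step 1 output, e.g. freesurfer/sub-01/ses-M00/sub-01_ses-M00."""
    return PurePosixPath(root, f"sub-{sub}", f"ses-{ses}", f"sub-{sub}_ses-{ses}")


def long_run_dir(root: str, sub: str, ses: str, label: str) -> PurePosixPath:
    """Step 3 output, stored under the session it belongs to.

    Follows the FreeSurfer <timepoint>.long.<template> naming.
    """
    template_id = f"sub-{sub}_long-{label}"
    return PurePosixPath(root, f"sub-{sub}", f"ses-{ses}", f"sub-{sub}_ses-{ses}.long.{template_id}")


if __name__ == "__main__":
    print(template_dir("freesurfer", "01", "M00M18"))
    print(long_run_dir("freesurfer", "01", "M00", "M00M18"))
```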

To sum up:

I would like to add a new entity to describe derived data based on a set of sessions for a given participant.
While session results are stored under the sub-<label>/ses-<label> folder, 'intra-subject' results (e.g. the FreeSurfer unbiased template) would be stored under a sub-<label>/LongitudinalKey-<label> folder, where the name LongitudinalKey remains to be defined (long-<label>, lng-<label>, ...).
This entity would help to track the provenance of derived data. In particular, I will have some pipelines combining surfaces from step 3 of FreeSurfer with PET data, and this draft proposition has helped me a lot.

Do you think there is an alternative? Or that it is worth generalising this use case?

Best, Alexandre

effigies commented 4 years ago

There are two things here, it seems:

1) FreeSurfer naming schemes. We can't really constrain how FreeSurfer names or organizes anything at all, but there are places where users or wrapping tools can make things that have their own structures more congruent with BIDS, and we can make some recommendations for that. Probably a good place to add it would be Non-compliant datasets.

2) How to encode data calculated for longitudinal epochs*. Are there other tools besides FreeSurfer where we can look at current practice? For that matter, it would be worth finding out how people are organizing their data on the input side. Done properly, I think there would be both a way to specify which sessions belong to which epochs at input, and tools could then sensibly produce output structures.

If a longitudinal epoch is subject-specific, then making a column in each sessions.tsv that assigns each session to an epoch makes sense. If a study typically has a common set of epochs, then an epochs.tsv or epochs.json at the root might make more sense. Or perhaps both.
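To make the per-subject variant concrete, here is a minimal sketch that groups sessions by an extra column in a sub-<label>_sessions.tsv file. The column name epoch and its values are assumptions taken from the wording above; BIDS does not currently define such a column.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def sessions_by_epoch(tsv: TextIO) -> Dict[str, List[str]]:
    """Group session_id values by a (hypothetical) 'epoch' column."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(tsv, delimiter="\t"):
        groups[row["epoch"]].append(row["session_id"])
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sessions.tsv content with an added epoch column.
    example = io.StringIO(
        "session_id\tepoch\n"
        "ses-M00\tbaseline\n"
        "ses-M03\tbaseline\n"
        "ses-M18\tfollowup1\n"
    )
    print(sessions_by_epoch(example))
```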

* I'm not sure what the standard term is, if there is one, but I'm using "epoch" to mean a set of sessions considered close enough to be the same time point. If there's a better term, mentally substitute it throughout this post.

alexandreroutier commented 4 years ago

FreeSurfer naming schemes. We can't really constrain how FreeSurfer names or organizes anything at all, but there are places where users or wrapping tools can make things that have their own structures more congruent with BIDS, and we can make some recommendations for that. Probably a good place to add it would be Non-compliant datasets.

FreeSurfer was the only example I had in mind to illustrate my proposition. If someone knows of tools or software that automate this use case, I would be interested in some links to see how they handle this situation.

How to encode data calculated for longitudinal epochs*. Are there other tools besides FreeSurfer where we can look at current practice? For that matter, it would be worth finding out how people are organizing their data on the input side. Done properly, I think there would be both a way to specify which sessions belong to which epochs at input, and tools could then sensibly produce output structures.

If a longitudinal epoch is subject-specific, then making a column in each sessions.tsv that assigns each session to an epoch makes sense. If a study typically has a common set of epochs, then an epochs.tsv or epochs.json at the root might make more sense. Or perhaps both.

I'm not sure what the standard term is, if there is one, but I'm using "epoch" to mean a set of sessions considered close enough to be the same time point. If there's a better term, mentally substitute it throughout this post.

By "set of sessions considered close enough", what duration do you have in mind? In the ADNI dataset, some participants are followed for 8-10 years, with several visits within this period.

When writing this issue, I forgot to mention the notion of a 'midway space', as used e.g. by mrregister in MRtrix.

ftadel commented 3 years ago

What's the current status for the structure of the derivatives/freesurfer folder?

We're trying to figure out whether in the bids-examples we should use: 1) derivatives/freesurfer/sub-ecog01/ses-preimp/ or 2) derivatives/freesurfer/sub-ecog01_ses-preimp/ ?

According to the FreeSurfer website, option #2 is recommended: https://surfer.nmr.mgh.harvard.edu/fswiki/BIDS But @sappelhoff mentioned this might be outdated.

Pending PR: https://github.com/bids-standard/bids-examples/pull/233

ftadel commented 3 years ago

@ahoopes As the author of this page on the FsWiki, could you help us here?

ahoopes commented 3 years ago

The second option is recommended on the wiki because it's in line with the standard FreeSurfer naming style. Structural FS isn't really configured to operate on a nested folder structure, because it just assumes that all subjects of interest (really 'recons' of interest, including multiple time-points and generated templates) exist under SUBJECTS_DIR. That being said, I like the clean, organizational structure of the first option as well, and there's no reason why you couldn't just configure a different subjects dir for each subject folder.

However, there's a downside for any cross-subject analysis. For example, a cross-sectional comparison across time-point #1 for a set of subjects will require some sort of extra steps, like symlinking to sub-folders from a base SUBJECTS_DIR. You could maybe provide multiple dirs in the subject ID, like subj="sub-ecog01/ses-preimp", but that feels a bit hacky and I think it might break in some places.
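The symlinking workaround mentioned above could look roughly like this: each nested sub-X/ses-Y recon is exposed as sub-X_ses-Y inside one flat directory that can then be used as SUBJECTS_DIR. The function name and paths are hypothetical; the sketch assumes each ses-Y folder directly contains the recon (mri/, surf/, ...) and a filesystem that supports symbolic links.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def build_flat_subjects_dir(nested_root: Path, flat_dir: Path) -> List[str]:
    """Link every <nested_root>/sub-X/ses-Y folder as <flat_dir>/sub-X_ses-Y."""
    flat_dir.mkdir(parents=True, exist_ok=True)
    names = []
    for ses_dir in sorted(nested_root.glob("sub-*/ses-*")):
        if not ses_dir.is_dir():
            continue
        name = f"{ses_dir.parent.name}_{ses_dir.name}"
        link = flat_dir / name
        # lexists() also catches stale links, so re-running is safe.
        if not os.path.lexists(link):
            link.symlink_to(ses_dir.resolve(), target_is_directory=True)
        names.append(name)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        nested = Path(tmp, "derivatives", "freesurfer")
        (nested / "sub-ecog01" / "ses-preimp" / "mri").mkdir(parents=True)
        flat = Path(tmp, "subjects_dir")
        print(build_flat_subjects_dir(nested, flat))
        # recon-all could then be run with SUBJECTS_DIR=<flat>
        # and -s sub-ecog01_ses-preimp.
```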

I would suggest keeping in the style of derivatives/freesurfer/sub-ecog01_ses-preimp/, but I've been out of the loop on BIDS stuff for a bit now - is it possible to allow for both configurations?

ftadel commented 3 years ago

I would suggest keeping in the style of derivatives/freesurfer/sub-ecog01_ses-preimp/, but I've been out of the loop on BIDS stuff for a bit now - is it possible to allow for both configurations?

Both are valid structures. 1) sub-ecog01/ses-preimp is more natural for BIDS, but not mandatory. 2) sub-ecog01_ses-preimp is easier to process with FreeSurfer, as the BIDS /derivatives/freesurfer/ would be directly equivalent to the FreeSurfer SUBJECTS_DIR, avoiding any data duplication or complicated-to-maintain symbolic linking.

I vote for the second one, sub-ecog01_ses-preimp, because it requires less work on all sides.

ftadel commented 2 years ago

Could we declare that we should use the FreeSurfer wiki recommendation (https://surfer.nmr.mgh.harvard.edu/fswiki/BIDS) and consider this issue as resolved?

@effigies Is it possible to explicitly add a link to this FSwiki page in the BIDS specs? This could avoid confusion among people trying to make the derivatives/freesurfer/ subfolder BIDS-compatible instead of using this non-BIDS (but widely used) convention.

Remi-Gau commented 1 year ago

I feel that this discussion has its place in the BEP35 google doc: https://docs.google.com/document/d/1tFRNumQyIgjXBNC3brFDLO9FaikjL84noxK6Om-Ctik/edit#heading=h.gjdgxs

@alexandreroutier @ftadel @ahoopes make sure to have a look if you have not already

bids-standard / bids-specification

Extend BIDS-Derivatives to handle longitudinal datasets? (the FreeSurfer use case) #461
