Request for option to include ses-<label> entity in all files when a session is present

psadil commented 5 months ago

Summary

I am helping process a dataset that involves multiple subjects, some of whom were scanned in multiple sessions. The workflow involves creating a BIDS dataset for each distinct subject and session and then passing these to qsiprep. This works great for processing, but the outputs are difficult to aggregate because qsiprep only uses the ses-<label> entity in files that are underneath the ses-<label>/ subfolder. For example, qsiprep produced these top-level anat files:

$ tree sub-10042 | head
sub-10042
├── anat
│   ├── sub-10042_desc-aseg_dseg.nii.gz
│   ├── sub-10042_desc-brain_mask.nii.gz
│   ├── sub-10042_desc-preproc_T1w.nii.gz
│   ├── sub-10042_dseg.nii.gz

Aggregating the folders produced by qsiprep would cause one session's anat files to overwrite the other's.
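
To make the overwrite risk concrete, here is a minimal sketch that flags relative paths appearing in more than one per-session output folder. The folder names (`qsiprep_ses-1`, `qsiprep_ses-2`) and the file listings are hypothetical placeholders for illustration, not actual qsiprep output.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_collisions(outputs):
    """Return {relative path: [output roots]} for paths present in 2+ roots.

    `outputs` maps an output-root name to the relative file paths beneath it.
    """
    owners = defaultdict(list)
    for root, rel_paths in outputs.items():
        for rel in rel_paths:
            # Normalise the path so equivalent spellings compare equal.
            owners[str(PurePosixPath(rel))].append(root)
    return {rel: roots for rel, roots in owners.items() if len(roots) > 1}


# Hypothetical listings of two per-session output folders.
outputs = {
    "qsiprep_ses-1": [
        "sub-10042/anat/sub-10042_desc-preproc_T1w.nii.gz",
        "sub-10042/ses-1/dwi/sub-10042_ses-1_desc-preproc_dwi.nii.gz",
    ],
    "qsiprep_ses-2": [
        "sub-10042/anat/sub-10042_desc-preproc_T1w.nii.gz",
        "sub-10042/ses-2/dwi/sub-10042_ses-2_desc-preproc_dwi.nii.gz",
    ],
}

collisions = find_collisions(outputs)
```

Only the top-level anat file collides here; the session-level files stay distinct because their names and folders carry the ses entity.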

Contrast that with cases like fmriprep, which produces outputs that can be merged because files always include the session entity. For example

$ tree sub-10003 | head
sub-10003
├── figures
│   ├── sub-10003_ses-V1_desc-about_T1w.html 
│   ├── sub-10003_ses-V1_desc-conform_T1w.html 
│   ├── sub-10003_ses-V1_desc-reconall_T1w.svg 
│   ├── sub-10003_ses-V1_desc-summary_T1w.html

Additional details

I'm not sure if one approach is more BIDS compliant than the other. This part of the BIDS spec may be relevant: https://bids-specification.readthedocs.io/en/v1.9.0/common-principles.html#filenames

For a data file that was collected in a given session from a given subject, the filename MUST begin with the string sub-<label>_ses-<label>. Conversely, if the session level is omitted in the directory structure, the file name MUST begin with the string sub-<label>, without ses-<label>.

So, that could be taken to mean that if there is a session level directory present anywhere, then the ses-<label> entity should be included in all files.

Edit:

Also, a relevant passage from the Derivatives spec: https://bids-specification.readthedocs.io/en/v1.9.0/derivatives/introduction.html#file-naming-conventions

Each Derivatives filename MUST be of the form: <source_entities>[_keyword-<value>]_<suffix>.<extension> (where <value> could either be an <index> or a <label> depending on the keyword; see Definitions)

araikes commented 5 months ago

Given that the anat folder is a top level folder and is essentially a subject specific template space (rather than reflecting a specific session), how would you envision this?

Also, your tree examples contrast the anat folder with the figures folder where the figures are relevant to the reports and need session naming to be attached to the correct location in the report. The current top-level fmriprep anat folder does not contain any session indicators (for the same reason mentioned above).

psadil commented 5 months ago

The current layout makes sense to me; it's just inconvenient in the circumstance that I was describing (aggregating across sessions and wanting to keep information about the specific anatomicals that were used in each session).

FWIW, I see that both of these folders can pass bids (1.9.0) validation

A)

.
├── dataset_description.json
└── sub-1
   ├── anat
   │  └── sub-1_T1w.nii.gz
   └── ses-1
      └── anat
         └── sub-1_ses-1_T1w.nii.gz

B)

.
├── dataset_description.json
└── sub-1
   └── ses-1
      └── anat
         └── sub-1_ses-1_T1w.nii.gz

and this does not

C)

.
├── dataset_description.json
└── sub-1
   ├── anat
   │  └── sub-1_ses-1_T1w.nii.gz
   └── ses-1
      └── anat
         └── sub-1_ses-1_T1w.nii.gz
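
One way to read the three layouts is as a consistency rule between a file's directory level and its ses entity. The sketch below encodes that reading; it is an assumption based on the spec passage quoted earlier, not the validator's actual logic, and `session_naming_ok` is a hypothetical helper name.

```python
import re
from pathlib import PurePosixPath


def session_naming_ok(rel_path):
    """True if the filename's ses entity agrees with the directory level.

    A file under sub-<label>/ses-<label>/ must carry that ses entity;
    a file outside any ses-<label>/ directory must not carry one.
    """
    path = PurePosixPath(rel_path)
    dir_sessions = [p for p in path.parts[:-1] if p.startswith("ses-")]
    match = re.match(r"sub-[A-Za-z0-9]+(?:_(ses-[A-Za-z0-9]+))?_", path.name)
    if match is None:
        return False  # filename does not start with sub-<label>_
    file_session = match.group(1)  # None when there is no ses entity
    expected = dir_sessions[0] if dir_sessions else None
    return file_session == expected


checks = {
    "A top-level": session_naming_ok("sub-1/anat/sub-1_T1w.nii.gz"),
    "B session-level": session_naming_ok("sub-1/ses-1/anat/sub-1_ses-1_T1w.nii.gz"),
    "C top-level": session_naming_ok("sub-1/anat/sub-1_ses-1_T1w.nii.gz"),
}
```

Under this rule the top-level file in (C) is the one that fails, since it carries a ses entity without sitting in a ses-<label> directory.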

So, the request would be for an option that matches the organization of smriprep/fmriprep, which entails nixing the top-level anat folder altogether and placing those files inside the ses-<label> directory (with appropriate ses entities in the filenames). This happens even when there is only a single ses-<label>.

To be explicit, given organization (B), above, smriprep produces the following

.
├── dataset_description.json
├── logs/
├── sub-1
│  ├── figures/
│  └── ses-1
│     └── anat
│        ├── sub-1_ses-1_desc-brain_mask.json
│        ├── sub-1_ses-1_desc-brain_mask.nii.gz
│        ├── sub-1_ses-1_desc-preproc_T1w.json
│        ├── sub-1_ses-1_desc-preproc_T1w.nii.gz
│        ├── sub-1_ses-1_dseg.nii.gz
│        ├── sub-1_ses-1_label-CSF_probseg.nii.gz
│        ├── sub-1_ses-1_label-GM_probseg.nii.gz
│        └── sub-1_ses-1_label-WM_probseg.nii.gz
└── sub-1.html
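
As a stopgap on the user side, the renaming that this request implies could be sketched as a pure path transformation. `add_session` is a hypothetical helper, and it assumes inputs of the form sub-<label>/<datatype>/<file>. It only computes the new path, so JSON sidecars, reports, or anything else that references the old filenames would need separate handling.

```python
from pathlib import PurePosixPath


def add_session(rel_path, session):
    """Map sub-<label>/<datatype>/<file> to its ses-<session> equivalent.

    Paths that already sit under a ses-<label>/ directory are returned unchanged.
    """
    path = PurePosixPath(rel_path)
    if any(part.startswith("ses-") for part in path.parts[:-1]):
        return str(path)
    subject, datatype, name = path.parts[-3:]
    # Insert the ses entity right after the leading sub-<label> entity.
    new_name = name.replace(f"{subject}_", f"{subject}_ses-{session}_", 1)
    return str(path.parent.parent / f"ses-{session}" / datatype / new_name)


moved = add_session("sub-10042/anat/sub-10042_desc-brain_mask.nii.gz", "1")
```

Applying this to each per-session output before merging would give every anat file a session-specific name and folder, matching the smriprep layout shown above.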

psadil commented 5 months ago

Another option could be to have something like a --ses-label argument, which allows selecting which sessions to process and, when used, causes sessions within a dataset to be processed individually (e.g., if the BIDS dataset has multiple sessions, use of --ses-label would cause creation of a template for each session separately).

araikes commented 5 months ago

I went back and reread the context....

Is this a paradigm in which you expect the overall brain structure to change substantially from time point to time point (e.g., atrophy), such that it would make a difference whether individuals are session-registered or template-registered?

psadil commented 5 months ago

It's a study looking at healing trajectories following surgery. One session is collected before surgery and the other after (closer to the endpoint). One of the main analyses involves predicting those trajectories from pre-surgical data alone, with the specific aim of assessing the informativeness of the pre-surgical data (that is, without post-surgical data). There will also be longitudinal analyses (for which the current layout with one anatomical template covering all sessions is great), but the session-specific analyses are needed.

araikes commented 5 months ago

In that case, since you don't necessarily need combined reports either, using --bids-filter-file to subset the dataset and creating two output directories (one for pre-treatment and one for post-treatment) would maintain the separability that you're looking for.
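
A rough sketch of generating such a filter file per session follows. The query names (`t1w`, `dwi`) and the `session` field follow the pattern of fmriprep-style filter files and are an assumption here; please check the qsiprep documentation for the exact keys it accepts. `session_filter` is a hypothetical helper.

```python
import json


def session_filter(session, queries=("t1w", "dwi")):
    """Build a filter dict restricting each query to one session.

    The query names are assumed, not confirmed against qsiprep.
    """
    return {query: {"session": session} for query in queries}


# JSON text that could be saved to a file and passed to --bids-filter-file.
filter_text = json.dumps(session_filter("1"), indent=2)
```

One file per session, each paired with its own output directory, would give the pre- and post-treatment split described above.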

psadil commented 5 months ago

That's more or less how our workflow is proceeding. We are storing these session-level results as separate datasets (e.g., qsiprep_ses-1, qsiprep_ses-2), but our inputs were already being sent to qsiprep a session at a time, so there hasn't been a need for the --bids-filter-file argument. It's just less convenient for us to maintain two output folders.

If the feature request sounds outside the scope of qsiprep, feel free to close this issue.

mattcieslak commented 5 months ago

I've wanted to update how sessions are handled for a long time. Our group also processes sessions separately and stores the results separately from one another, but there's no reason it has to be that way.

PennLINC / qsiprep