Diffusion derivatives suffix (`dwi`, `model`, `mdp`, `mfp`, etc.)

oesteban commented 1 year ago

Partly replaces #55. I believe it would be beneficial to first settle the suffix (or suffixes) and then think about entities. This builds on the discussions held during the recent BIDS Workshop.

Undecided about this issue, I went back to the definition of the suffix in the common principles of BIDS:

> suffix - an alphanumeric value, located after the `key-value_` pairs (thus after the final `_`), right before the File extension, for example, it is `eeg` in `sub-05_task-matchingpennies_eeg.vhdr`.

Which is clearly open-ended. However, a little before suffix we find Modality:

> Modality - the category of brain data recorded by a file. For MRI data, different pulse sequences are considered distinct modalities, such as T1w, bold or dwi. For passive recording techniques, such as EEG, MEG or iEEG, the technique is sufficiently uniform to define the modalities eeg, meg and ieeg. When applicable, the modality is indicated in the suffix. The modality may overlap with, but should not be confused with the data type.
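To make the quoted suffix definition concrete, here is a minimal sketch of how a filename splits into entities, suffix and extension. The helper name `split_bids_name` is hypothetical, and the rule that the extension starts at the first dot is a simplifying assumption rather than anything taken from the specification text quoted above.

```python
from pathlib import PurePosixPath


def split_bids_name(filename):
    """Split a BIDS-style filename into (entities, suffix, extension).

    Assumption: the extension starts at the first dot of the basename,
    so "x_dwi.nii.gz" yields the extension ".nii.gz".
    """
    name = PurePosixPath(filename).name
    stem, dot, ext = name.partition(".")
    # Everything before the final "_" is a key-value pair; the last chunk is the suffix.
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, dot + ext


entities, suffix, extension = split_bids_name("sub-05_task-matchingpennies_eeg.vhdr")
print(entities, suffix, extension)
```

Nothing in this split constrains what the suffix may say, which is the open-endedness under discussion.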

I acknowledge that "When applicable, the modality is indicated in the suffix" can be interpreted in many ways. Still, in my opinion, _model and similar suffixes should be avoided, as they don't say anything about the modality of the stored data. More arguably, I believe we cannot say these are dwi data anymore (although, considering the introduction of the BIDS-Derivatives spec, dwi actually seems applicable in this case).

I think mdp and mfp are in the right direction. They talk about the modality (loosely, a T1w would also be a parameter), but every neuroimager would understand that a parameter map is an N-D outcome of fitting some model (or derivation thereof). Another big plus of this line is the potential to be adopted by other imaging and non-imaging modalities.

However, I don't see the differentiation between mdp and mfp -- after all, (i) whether they are a result of the fit or some derivation from the result of the fit does not change their nature of being parameters; and (ii) I think the distinction is a matter of provenance, for which there are some vague prescriptions already "passed" within BIDS-Derivatives. I will open a new issue to describe my point about (ii), which can help with the problem of double inheritance.

In addition to the above, mdp and mfp both have "model" in them, which limits the interpretation of these parameters (if they are understood as a modality in the context of derivatives).

As a result, I would like to compel everyone to go towards some _param, _params, _parammap, _pmap (the map bit might be unacceptably restrictive), or similar. If parameters cannot be understood as a modality, then _dwi seems like the only principled option to me (and for BEP016 it doesn't create more problems than it solves, as opposed to all other options). In that case, we would push generalization for later, and multi-modal analyses will need to figure out new suffixes (likely following the same line of thought).

Opinions?

cc/ @arokem @Lestropie @francopestilli @poldrack @effigies

effigies commented 1 year ago

The distinction might be worth preserving, even if we need to find a different line between them than "fit"/"derived". A denoised BOLD series is nothing but the residuals from a model fit, and could be considered a derived parameter. It still makes more sense from a BIDS perspective to call it BOLD.

One distinction we could make is "does this parameter have a physical interpretation?" For example, T1map is a fit parameter map, but the interpretation is the $T_1$ time constant at the voxel. This is in pretty stark contrast to, say, a GLM beta, which can only be understood in the context of a design matrix.

So I don't have the full context of what's being considered fit/derived (I'm sure it's out there; there's a lot to read), but my impression is that you have clear model parameters called things like "kappa" or "nu" in mfp and things like fractional anisotropy and mean diffusivity in mdp. If that's the case, I'd say that perhaps mfp isn't too narrow but mdp might be too broad, and it could be worth splitting things that have physical interpretations into their own suffixes.

Splitting/lumping is a constant tension with suffixes, but I would really resist the urge to lump to the point where I can't distinguish "this is only interpretable with a model in front of me" and "here's a measure that I can use without looking at the model (however instructive it might be)".

> As a result, I would like to compel everyone to go towards some _param, _params, _parammap, _pmap (the map bit might be unacceptably restrictive), or similar.

Don't have a strong opinion on map or not. It will probably be implied by the extension, but it might be less clear without. Not sure what they call things in EEG.

For statistical maps, I recall a discussion at computational models about using _hist and _dist for non-parametric histograms and parameterized distributions, respectively. That may be wandering a bit too far down the statistical abstraction rabbit hole, though.

oesteban commented 1 year ago

Thanks a lot for the feedback, just one addition:

> it could be worth splitting things that have physical interpretations into their own suffixes.

I instinctively went this route at the BIDS meeting but felt I met strong opposition to _FA and the like. A caveat of this option is that every new model will almost certainly require additions to the currently available suffixes.

oesteban commented 1 year ago

Regarding splitting/lumping, #69 could be a good tradeoff, IMHO.

effigies commented 1 year ago

Would morph (for morphometry) work for these model derived parameters with physical interpretations? This might be a point of convergence with structural derivatives.

oesteban commented 1 year ago

I believe _micro (for microstructure) could be a possibility for models more sophisticated than DTI. I'm not sure whether preclinical imaging also talks about microstructure, though.

I'd be surprised if morph worked out, but I'd be happy if it did.

Lestropie commented 1 year ago

I remember coming across some of these definitions when doing https://github.com/bids-standard/bids-specification/pull/947 and finding a lot of them lacking. For some, I suspect that the filesystem structure was determined first and terms were attached to its different elements afterwards, so there's not always a strong definitional distinction between them.

Personally I would look to rewrite the Modality definition; I don't find it useful at all. What we're ultimately looking at here is something like "what is the primary mechanism that determines the intensity contrast between different imaged elements?". That I think holds for all examples there (T1w, bold, dwi, eeg, meg, ieeg), but is more specific than "the category of brain data". "When applicable the modality is indicated in the suffix" really needs a contra-indicated case; i.e. when are the modality and suffix not the same? "The modality may overlap with, but should not be confused with the data type" is also problematic. Personally I dislike having dwi as both datatype and suffix, and it potentially causes issues for other prospective augmentations I'm thinking about. But again: when are they or are they not the same? From what I've seen, I think it's something like: "if there is not more than one modality within a datatype, the suffix may be the same as the datatype".

> I think the distinction is a matter of provenance, for which there are some vague prescriptions already "passed" within BIDS-Derivatives.

I've been thinking about this a bit as well. Unfortunately I'm not familiar with where provenance is up to. But I did make a relevant note in the directory inheritance document. A point made that opened up that discussion was that if there were different suffixes within the model directory, then (under the current proposal) the model-wise JSON in the parent directory would only apply to a subset of those files. But in a way this could be construed as an advantage: the model-wise JSON would apply to what's currently called Model Fit Parameters, not to what's currently called Model-Derived Parameters, and so the relevance of that metadata to model derivatives would be deferred to provenance.

I'm not making that a case for retaining either that structure or those suffixes, just pointing out the relationship.
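The suffix-dependence described above can be sketched as follows. This is a simplified reading of the inheritance rule, assuming a sidecar applies to a data file when the suffixes match and every entity in the sidecar name appears with the same value in the data file name; directory levels are ignored. The function names and all filenames (including the `model` and `param` entities) are hypothetical illustrations, not names taken from the proposal.

```python
def parse(name):
    """Return (entities, suffix) for a BIDS-style filename (hypothetical helper)."""
    stem = name.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    return dict(pair.split("-", 1) for pair in pairs), suffix


def sidecar_applies(sidecar, datafile):
    """Simplified inheritance check: same suffix, sidecar entities are a subset."""
    s_entities, s_suffix = parse(sidecar)
    d_entities, d_suffix = parse(datafile)
    same_suffix = s_suffix == d_suffix
    subset = all(d_entities.get(key) == value for key, value in s_entities.items())
    return same_suffix and subset


sidecar = "sub-01_model-dti_mfp.json"  # hypothetical model-wise JSON
for datafile in (
    "sub-01_model-dti_param-tensor_mfp.nii.gz",  # hypothetical fit-parameter file
    "sub-01_model-dti_param-fa_mdp.nii.gz",      # hypothetical derived-parameter file
):
    print(datafile, sidecar_applies(sidecar, datafile))
```

Under these assumptions the model-wise JSON reaches only the files sharing its suffix, so a single shared suffix would remove that separation.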

> As a result, I would like to compel everyone to go towards ...

I'd even thought about _spm previously; but in addition to being conflated with a specific piece of software, not all such parametric maps result from some statistical process.

Another aspect that can jam things up here a little is that for scalar measures like eg. FA or MD, thinking of "parametric maps" makes sense, but for something like encoding of estimated fibre orientations, such terminology isn't as intuitive.

> If parameters cannot be understood as a modality, then _dwi seems like the only principled option to me

This would break my suggestion of "modality" being "what is the primary mechanism that determines the intensity contrast between different imaged elements?". Yes it's not the documented definition, but this is how I've conceptualised it internally for some time, and I think this is why the suggestion of using dwi as the suffix for diffusion models irked me from the outset.

> The distinction might be worth preserving, even if we need to find a different line between them than "fit"/"derived". A denoised BOLD series is nothing but the residuals from a model fit, and could be considered a derived parameter. It still makes more sense from a BIDS perspective to call it BOLD.

I think my proposed modality definition above gives a good explanation for this.

> So I don't have the full context of what's being considered fit/derived (I'm sure it's out there; there's a lot to read)

Easiest example I've been using is the diffusion tensor model. Conforming the tensor model to the empirical image data results in 6 parameters of a symmetric 3x3 matrix. We can then calculate the FA based on the eigenvalues of that tensor. There's a sequence of dependencies: we can't get an FA map without first fitting the tensor model, and we can generate the FA map looking only at the model fit with no reference back to the empirical image data.
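A minimal sketch of that dependency for a single voxel, assuming the six unique tensor elements have already been fitted. The function name and the example values are hypothetical. Instead of an explicit eigendecomposition, it uses the equivalent Frobenius-norm form of FA, which gives the same value as the eigenvalue formula for a symmetric tensor.

```python
import math


def fa_from_tensor(dxx, dxy, dxz, dyy, dyz, dzz):
    """Fractional anisotropy from the six unique elements of a symmetric 3x3 tensor."""
    mean_diffusivity = (dxx + dyy + dzz) / 3.0
    off_diagonal_sq = 2.0 * (dxy**2 + dxz**2 + dyz**2)  # each off-diagonal term appears twice
    # Squared Frobenius norm of the tensor equals the sum of squared eigenvalues.
    norm_sq = dxx**2 + dyy**2 + dzz**2 + off_diagonal_sq
    if norm_sq == 0.0:
        return 0.0
    # Squared Frobenius norm of the deviatoric (mean-removed) part.
    deviatoric_sq = (
        (dxx - mean_diffusivity) ** 2
        + (dyy - mean_diffusivity) ** 2
        + (dzz - mean_diffusivity) ** 2
        + off_diagonal_sq
    )
    return math.sqrt(1.5 * deviatoric_sq / norm_sq)


# Hypothetical fit parameters for one voxel (arbitrary illustrative values).
print(fa_from_tensor(1.7e-3, 0.0, 0.0, 0.3e-3, 0.0, 0.3e-3))
```

The function takes only the fit parameters as input and never touches the image data, which is the fit/derived ordering in question.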

> It could be worth splitting things that have physical interpretations into their own suffixes.

This comes back to Decision 2 in https://github.com/bids-standard/bids-bep016/issues/50. It results in a massive explosion of suffixes, and an inability to store or validate anything not explicitly added to the specification; I'm trying to go the other way with BEP016. This was also stated by @sappelhoff: "I think one of the principles in BIDS so far was to use as few suffixes as possible, as many as needed"

> Would morph (for morphometry) work for these model derived parameters with physical interpretations?

No, I would only apply such a term to derivatives based on macroscopic (i.e. greater than the image resolution) physical deformations.

> I believe _micro (for microstructure) could be a possibility for models more sophisticated than DTI.

First impression is that this is quite a good prospect. As per my internal definition of "modality" above, this concisely encodes what makes the content of these data different from all other BIDS data. It also means we wouldn't have the hubris to attempt to dictate across the entire BIDS ecosystem what anything for which the word "model" is applicable should look like. It removes entirely the distinction between fit and derived, but deferring that to provenance may well be preferable. Curious to know the extent to which this may apply to non-diffusion MRI.

Will continue to contemplate, and curious to hear from others, but this currently appeals.

arokem commented 1 year ago

I don't love _micro, but not enough that I would have to hold my nose every time I worked with a file that had this suffix. Just to understand: it sounds like this is not intended to cover phenomenological modeling of DWI (e.g., DTI). What do we do in that case? Another suffix (_pheno?)?? In my opinion, using the _dwi suffix would be as informative as the _micro suffix, but I could well be missing some helpful distinction here.

oesteban commented 1 year ago

_micro was my rushed attempt to project @effigies' suggestion of _morph into something more appropriate for dMRI, without much thought. I'd be okay with just propagating _dwi, instead of adding more suffixes like _pheno just because _micro fails to cover the range of data represented.

If _dwi is not seen as specific enough, then _param or _mparam (for model parameters) are two I don't particularly dislike.

Lestropie commented 1 year ago

> ... it sounds like this is not intended to cover phenomenological modeling of DWI (e.g., DTI).

In my own mind I didn't foresee any need to distinguish between DTI and "microstructural models" at the level of filename suffixes. It's just "storage of data that is sensitive to microstructure". It seems unusual to me that people would not like distinguishing between model fit parameters and model-derived parameters at the level of filenames, but then want to resolve between different DWI-derived microstructural models depending on the biological specificity of the relevant parameters; to me the latter is less functional and more esoteric.

I would also note that a natural consequence of this, since not all people like my purely theoretical approach (:nerd_face:), is that tractography data in (tentatively) BEP036 would no longer adopt the relatively uninformative and unintuitive "_model" suffix, but would instead require something of comparable scope to "_micro", e.g. "_tractography".

Lestropie commented 1 month ago

As per https://github.com/bids-standard/bids-specification/issues/1602, "_dwimap.<ext>" has been adopted in #92.

bids-standard / bids-bep016

Diffusion derivatives suffix (`dwi`, `model`, `mdp`, `mfp`, etc.) #68