bids-standard / bids-bep016

BEP016: diffusion derivatives
Creative Commons Attribution 4.0 International

Bootstrapping #38

Open Lestropie opened 2 years ago

Lestropie commented 2 years ago

Handling the representation of model bootstrapping (e.g. bedpostx) requires a lot more consideration than what is currently in the specification. I think we should consider stripping out what's there, implementing support for the mean outputs of bedpostx, and then, once that's achieved, re-inserting bootstrapping support as an explicit PR.

Lestropie commented 7 months ago

Deferring comment thread in #90 here.

My response to @arokem's comment comes in a few separate parts:

  1. My main concern is "shoe-horning" the distinction between bootstrap realisations and an aggregate / non-bootstrapped fit into a sub-optimal location. I'm not a fan of distinguishing between these within "_param-", as the quantitative parameter being encoded is identical between the two files that require disambiguation. I can't yet say with certainty where I think that distinction should happen, but I think there are multiple candidates that would be preferable to that one.

  2. Your point about the mechanism of aggregation leans into the complexity of #61 introducing a "_stat-" entity. On the surface, this seems an elegant way to faithfully encode the fact that some parameter is being aggregated across some data dimension. Notably, this would not be applicable only to bootstrapping; e.g. the mean intensity of an fMRI time series is often generated, which is most faithfully described as computing the mean statistic along axis 3. This is however:

    1. Another very general BIDS principle that would need to be made robust to a more general context. You would need to be able to encode (whether in filename entities, metadata, or both) not only the statistic computed, but also the dimension along which that statistic was computed, and indeed how it was computed. For instance, for mean polar angles, presumably FSL doesn't compute the mean of the polar angles themselves; it should be computing the mean orientation on the sphere S² across bootstrap realisations and then exporting the polar angle representation of that mean.
    2. Starting to look a bit like provenance rather than a description of the data.

    So my progress kind of stalled here.

  3. This statement opens up more complexity than first realised:

    (median bs? mean bs? A run on the intact sample?).

    Imagine two different model fits. The first performs bootstrapping as per bedpostx, and then computes the mean fibre orientations across the realisations. The second involves no bootstrapping whatsoever; it just does a maximum a posteriori fit to the empirical data, yielding one set of fibre orientations. In terms of data content, these two are identical; however, the ways in which they differ from the bootstrap realisations are not the same: the first is a derivative of the model fit, whereas the second is a different fitting procedure. In my prior structure, I'd have described the first as a model-derived parameter, based on a mean statistic computed across the fit parameters of the bootstrapped model, and the second as a distinct model fit, with the difference between the two model fits being the use of bootstrapping; this would most likely be encoded using different values for the "_model-" entity. I'm not sure how to disambiguate these given the structure of #90.
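
To make the concern in point 2.1 concrete, here is a minimal sketch of why "mean of the polar angles" and "mean orientation on the sphere" can disagree. It is not FSL's implementation, and nothing here is taken from the specification: the function names are hypothetical, and the averaging method (principal eigenvector of the mean dyadic tensor, a common choice for sign-invariant fibre orientations) is an assumption made for illustration. The two "realisations" are the same fibre axis stored with opposite signs.

```python
import numpy as np


def angles_to_unit(theta, phi):
    """Polar angle theta (from +z) and azimuth phi -> unit vectors, shape (..., 3)."""
    sin_theta = np.sin(theta)
    return np.stack(
        [sin_theta * np.cos(phi), sin_theta * np.sin(phi), np.cos(theta)],
        axis=-1,
    )


def mean_orientation(vectors):
    """Mean axis of sign-invariant orientations.

    Assumption for this sketch: average the dyadic tensors v v^T and take
    the eigenvector with the largest eigenvalue.
    """
    dyads = vectors[:, :, None] * vectors[:, None, :]  # shape (n, 3, 3)
    mean_dyad = dyads.mean(axis=0)
    _, eigvecs = np.linalg.eigh(mean_dyad)  # eigenvalues in ascending order
    return eigvecs[:, -1]


# Two hypothetical bootstrap realisations of the same axis, v and -v.
theta = np.array([np.pi / 4, 3 * np.pi / 4])
phi = np.array([0.0, np.pi])
vectors = angles_to_unit(theta, phi)

# Mean orientation on the sphere: recovers the shared axis (up to sign).
axis = mean_orientation(vectors)

# Naive alternative: average the angles directly, then convert to a vector.
naive = angles_to_unit(theta.mean(), phi.mean())

print("alignment of mean orientation with true axis:", abs(axis @ vectors[0]))
print("alignment of naive angle mean with true axis:", abs(naive @ vectors[0]))
```

In this constructed case the dyadic mean stays on the original axis, while averaging the angles lands on a direction perpendicular to it. That is the sense in which a "_stat-mean" label alone would not say how the statistic was computed.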