bids-standard / bids-bep016

BEP016: diffusion derivatives
Creative Commons Attribution 4.0 International
6 stars 7 forks source link

WIP: Propose new mechanism to record metadata across separate data files, incorporate use of `dwimap` #90

Closed arokem closed 1 month ago

arokem commented 3 months ago

Attempts to follow the inheritance principle by reducing the number of metadata files in each directory to one per model, across all of the model parameters, using hierarchical organization within the side-car files.

Also: eliminates the distinction between model fit and model derived parameters, based on recent additions to the BIDS extensions principles in https://github.com/bids-standard/bids-extensions/pull/26.

EDIT: Edited to reflect that the change to metadata structure is probably the more substantial change proposed here.

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 87.93%. Comparing base (dd8fe32) to head (b336fa6). Report is 248 commits behind head on bep-016.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## bep-016 #90 +/- ## =========================================== + Coverage 87.88% 87.93% +0.05% =========================================== Files 14 16 +2 Lines 1279 1351 +72 =========================================== + Hits 1124 1188 +64 - Misses 155 163 +8 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

arokem commented 3 months ago

Some of the comments addressed in the recent string of commits (b8de4b0, 05e94e5, b336fa6), but not all the issues have been addressed yet. Changed the title to indicate that this is still a draft (not sure how to "gray out" through the GitHub interface, but others can feel free to do that).

Lestropie commented 2 months ago

A bit of clarification about metadata sub-dictionaries in 05e94e5e66f9634c5c407369902f34df9eb14958:

Through a particular lens / for certain contexts, having metadata for different model parameters all encoded within a single model metadata file can seem preferable. A particular example that comes to mind is response functions for a multi-tissue CSD fit. Having the response function for one individual tissue stored as metadata for just that one tissue is potentially misleading, as that one response function in isolation is not sufficient to explain how the ODF was generated. What defines the model inputs is the complete set of response functions used. In which case, there's an argument for defining those data all together in a single model metadata file, rather than allowing them to separate from one another across the individual ODF sidecars.

The converse situation is the one that in my opinion provides the strongest argument against this structure. Take an ODF, sticking with the MSMT CSD example. In order to robustly interpret an individual ODF data file, you must know:

The advantage of my prior structure using the (augmented) inheritance principle is that this information, specific to that one model parameter and crucial for its robust interpretation, would be what appeared exclusively in the sidecar JSON for that one parameter. With this proposal as it currently stands, that information is being deferred to a metadata file that applies to all parameters yielded by that model (or even the derivatives of such), therefore requiring some form of sub-directory referencing (which would probably need to be more robustly defined & systematized than what is in 05e94e5e66f9634c5c407369902f34df9eb14958) to infer which metadata fields are relevant vs. not relevant to that data file.


What confuses me most here is the difference in preference between what I have historically advocated for vs. what's being proposed here. Note that what I'm speaking about here specifically is not actually tied to the adoption of the dwimap suffix as reported by the PR title.

I established a long time ago that there was an intrinsic conflict between:

Given my attempts at making the requisite changes to the inheritance principle didn't succeed, an attempt at an alternative solution was made in bids-standard/bids-specification#1280, and that didn't pass the steering committee. This PR, while currently titled as targeting the dwimap suffix, is taking that problem and smuggling it into a completely new mechanism for definition of hierarchical metadata, to live alongside an already existing mechanism for hierarchical metadata definition. I don't know if people have a genuine preference for this structure, or whether there is a lack of understanding that they are manifesting a different---I would argue inferior---mechanism for addressing the same problem. I struggle to see how those people who don't like the idea of adding capability to the inheritance principle would like the idea of having metadata fields that apply to some applicable data files but not others. It's literally addressing the same thing, just doing it in a new way; and, given the description and responses, I'm concerned there isn't sufficient appreciation for such.

This is unfortunately quite personally frustrating as I've written extensively on the need for a decision to come from beyond myself on how to deal with this complexity, have either not received feedback or been explicitly told that it's not a problem, and now an alternative is being pushed in a PR that on the surface purports to be doing something else. I appreciate the desire to push this forward to a functional solution; I've truly been trying to clear my plate to get back to this but am so far behind on obligations I've not even been on my own software forum in 2 years. I nevertheless still have to argue that doing this via removing a (potentially ill-conceived) restriction on the inheritance principle is a better long term solution than some kind of completely novel one-to-many cross-referencing mechanism between metadata and data files that would directly compete with the inheritance principle.

At the very least, this PR needs to either:

arokem commented 2 months ago

Thanks for all the input! I changed the title to reflect and edited the comment at the top to emphasize what is probably the more substantial change proposed.

I am sorry about the frustration that you are feeling, and share some of the frustration that you feel as well. I think that part of the challenge here is that we are working on this separate repository, away from the core of the work on the bids-standard/bids-specification repo. To advance the discussion, I would like to suggest the following: how about if I rebase this branch on top of the main spec branch and make a pull request from this branch against bids-standard/bids-specification:main? This way, we could get input from a broader set of contributors, but I want to make sure that you don't feel like you are being sidelined and/or ignored, or that you feel like anyone is trying to make an end-run over your objections.

I should also mention that @effigies and @rwblair are both in Seattle at the moment, and we did discuss this change in person today. They did not seem too averse to it, but I think that it would be good if they could hear your objections and the alternatives that you propose more directly. We can point to your previous efforts (particularly in https://github.com/bids-standard/bids-bep016/issues/50), so they can see the pros/cons for these different proposed changes and we could discuss implications for other modalities/models, and how to move forward here.

arokem commented 2 months ago

Hi @Lestropie : just checking in to see whether you've had the opportunity to consider my comment from last week. How would you feel about moving this whole discussion to the main spec repo, where we could hopefully get more visibility on the issues. Even if we end up rejecting this proposal, based on your criticism, it might help raise the visibility of the issues, tie it to the broader issues that you are pointing out, and help folks understand better when we make suggestions changes with broader impacts (e.g., changes to inheritance, adding nested folders, etc.). I don't want to do this over your objections, though, because I don't want you to perceive this as an attempt to over-rule the objections you raised here, so I'd appreciate your thoughts just on this procedural issue. Maybe just a PR to introduce the idea of nested JSON meta-data for now (if only to give visibility to the objections you raised in https://github.com/bids-standard/bids-bep016/issues/50, which may have gone un-noticed by the broader community, because it's buried in this repo)?

Lestropie commented 2 months ago

It is probably the case that I need to re-write #50 on the main spec repo and attach a poll to it. I'd prefer not to attach any specific proposed modification to it as that may bias the result toward whatever is proposed there. I have a couple of urgent tasks right now but I will try to get to it.

arokem commented 2 months ago

That's a good suggestion If you're up for it that would be a good way forward, I think. Maybe we can focus on decision 1 first?

Lestropie commented 2 months ago

Pretty confident there's consensus that we don't want to be creating a unique suffix per parameter. There's potentially going to be problems around using one suffix for all parameters (as opposed to two, for which in retrospect I think I failed to make the strongest case), but as a first pass I'd rather present this issue on the presumption of use of _dwimap.*.

Lestropie commented 2 months ago

Having commenced writing something, I'm now reminded of https://github.com/bids-standard/bids-specification/issues/1339. The Steering Committee has explicitly rejected the prospect of a more complex Inheritance Principle on the basis that existing software may make erroneous conclusions about associations between data and metadata; presumably mitigating against such via version compatibility checks was considered too optimistic. But I would have thought then that your proposal of having, within a metadata file applicable to a given data file, metadata fields that are explicitly not applicable to that data file, would be rejected on the very same basis.

So I'm trying to describe the situation before listing candidates, but I'm not sure if there's even multiple candidates left to be offered predicated on BIDS 1.0 compatibility.

arokem commented 2 months ago

I do believe that's why we chose to go with what we have here, but I think that it could use some reconsidering. I think that your objections do make sense, and maybe we need to revisit https://github.com/bids-standard/bids-specification/pull/1280, as the introduction of dwimap is counter to the conclusion therein.

Lestropie commented 2 months ago

... maybe we need to revisit bids-standard/bids-specification#1280, as the introduction of dwimap is counter to the conclusion therein.

Not necessarily, due to their interaction.

Part of the reasoning behind the mfp / mdp distinction was that by utilising complex inheritance, it would enable definition of, in one place, the set of metadata fields that apply only to the immediate outcomes of model fitting and not subsequent derivatives thereof. If however inheritance is avoided entirely, and every data file has a sidecar that is a unique source of metadata, then one is free to define on a per-data-file basis what metadata to include vs. not include there. The distinction therefore loses most of its utility; furthermore, what utility might be left would arguably be more suitably deferred to an explicit provenance mechanism.

In the final linked comment you suggest resolving through the use of suffices. But as described elsewhere that comes with its own set of problems. I don't think that it is a necessary consequence that rejection of the model-*/ sub-directory means that multiple suffices must be used; only that some other solution for non-trivial metadata structure is necessary. While duplicating swathes of metadata across multiple sidecar files is definitely not my personal preference, it's technically workable; plus this happens already with raw data, where images even of different modalities from the same scanning session probably share quite a few metadata fields.

So there's maybe two different questions here:

  1. How do we unblock BEP016? I think the only answer is to avoid hierarchical metadata (whether via Inheritance Principle or embedded metadata dictionaries) and exclusively use sidecars with duplicate information.

  2. What might metadata handling look like in BIDS 2.0? I could make a proposal for what I'd like to see myself, but I think there are multiple moving parts that would all need to be put in place at once:

    • Complex inheritance
    • Edit: BIDS interfacing libraries providing one-liner functions that will return the relevant set of metadata fields for any data file (or set of data files)
    • No overloading of fields in inheritance
    • Validator ensures no overloading of metadata fields for any data file
    • Permit metadata files without suffices
    • BIDS App that automatically reduces & expands metadata within dataset according to Inheritance Principle

    Related, but not compulsory, would be allowing any entity or set of entities to be used for sub-directory construction (expansion of https://github.com/bids-standard/bids-2-devel/issues/55 that maybe I need to write separately).

Lestropie commented 2 months ago

... allowing any entity or set of entities to be used for sub-directory construction (expansion of https://github.com/bids-standard/bids-2-devel/issues/55 that maybe I need to write separately).

Correction, already exists: https://github.com/bids-standard/bids-2-devel/issues/54

Lestropie commented 1 month ago

Closed in favour of #92.