bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International
280 stars 164 forks source link

Overriding definition of BIDS-defined TSV columns with custom JSON entries #1453

Open sappelhoff opened 1 year ago

sappelhoff commented 1 year ago

In other news: What happens if a dataset curator uses a pre-defined column (like sample) in their dataset, but supply a definition for sample in their JSON file that completely diverges from the BIDS definition? I don't think we have a section for this case in the specification, but it feels like we should. And I believe this should be NOT RECOMMENDED, however if it's being done, we could state that the "user override" is be authoritative.

Originally posted by @sappelhoff in https://github.com/bids-standard/bids-specification/issues/1441#issuecomment-1475951326

effigies commented 5 months ago

This is relevant to recent discussions in bids-standard/bids-validator#1979, #1838.

I agree that BIDS-defined columns SHOULD NOT be overridden. The boundary here is a bit ambiguous, though. What if it's a well-defined, but optional column?

VisLab commented 5 months ago

More worrisome is overriding of onset and duration -- which are assumed to have units of seconds.

effigies commented 5 months ago

Yes, I would be inclined to say that overriding should have no effect, but I am waffling a bit on whether it should be an error to define them, given that you could make a definition consistent with the official meaning.

effigies commented 3 months ago

Just a note that the behavior of the schema validator is now to check whether the sidecar definitions describe a subset of acceptable values according to the schema, and units cannot be overridden if defined in the schema. If this compatibility criterion is not met, a warning is issued that the sidecar definition is ignored.

In either case, we only check values against the schema definition. We could perform both checks to ensure that the sidecar definition can also be used by a downstream tool, and we could promote this warning to an error if we think it is severe enough. One thing I've seen in practice is the warning raised when the schema uses SI units of "s" while the custom definition uses "second" or "seconds". It's a pretty minor failing, and I really don't want to get in the business of cross-referencing different spellings of units to determine whether something is a warning or error.

effigies commented 2 months ago

I'm going to punt on drafting text here, as I think we might want a little experimentation with validating overrides before we commit to language that locks us into something.

VisLab commented 2 months ago

I wanted to confirm that the HED column would be an optional column that would not be allowed to be overridden. Otherwise the HED validator and other tools would have no way of recognizing that this wasn't HED and would create fatal errors.

effigies commented 2 months ago

HED is schema-defined and the validator as written will ignore any attempts to override it. It will currently warn that it's ignoring the override, but it does make sense to promote to an error once we've settled on a policy.

My current thought is to directly add text to the rendered tables, e.g.,

Column name Requirement Level Data type Description
HED OPTIONAL string Hierarchical Event Descriptor (HED) Tag. See the HED Appendix for details.

This column may appear anywhere in the file.

This column MUST NOT be given a conflicting definition in the sidecar JSON.

While for conventional columns (where we just provide a recommended, but not universal, definition), we could write:

Column name Requirement Level Data type Description
age RECOMMENDED number Numeric value in years (float or integer value).
It is RECOMMENDED to tag participant ages that are 89 or higher as 89+, for privacy purposes.

This column may appear anywhere in the file.

This column MAY be given an alternative definition in the sidecar JSON.

The text in common principles could be pretty straightforward, explaining what it means to override or provide compatible definitions, without need to articulate a rule to determine which columns may or may not be overridden.