Open sappelhoff opened 1 year ago
This is relevant to recent discussions in bids-standard/bids-validator#1979, #1838.
I agree that BIDS-defined columns SHOULD NOT be overridden. The boundary here is a bit ambiguous, though. What if it's a well-defined, but optional column?
More worrisome is overriding of onset
and duration
-- which are assumed to have units of seconds.
Yes, I would be inclined to say that overriding should have no effect, but I am waffling a bit on whether it should be an error to define them, given that you could make a definition consistent with the official meaning.
Just a note that the behavior of the schema validator is now to check whether the sidecar definitions describe a subset of acceptable values according to the schema, and units cannot be overridden if defined in the schema. If this compatibility criterion is not met, a warning is issued that the sidecar definition is ignored.
In either case, we only check values against the schema definition. We could perform both checks to ensure that the sidecar definition can also be used by a downstream tool, and we could promote this warning to an error if we think it is severe enough. One thing I've seen in practice is the warning raised when the schema uses SI units of "s" while the custom definition uses "second" or "seconds". It's a pretty minor failing, and I really don't want to get in the business of cross-referencing different spellings of units to determine whether something is a warning or error.
I'm going to punt on drafting text here, as I think we might want a little experimentation with validating overrides before we commit to language that locks us into something.
I wanted to confirm that the HED
column would be an optional column that would not be allowed to be overridden. Otherwise the HED validator and other tools would have no way of recognizing that this wasn't HED and would create fatal errors.
HED is schema-defined and the validator as written will ignore any attempts to override it. It will currently warn that it's ignoring the override, but it does make sense to promote to an error once we've settled on a policy.
My current thought is to directly add text to the rendered tables, e.g.,
Column name | Requirement Level | Data type | Description |
---|---|---|---|
HED | OPTIONAL | string | Hierarchical Event Descriptor (HED) Tag. See the HED Appendix for details. This column may appear anywhere in the file. This column MUST NOT be given a conflicting definition in the sidecar JSON. |
While for conventional columns (where we just provide a recommended, but not universal, definition), we could write:
Column name | Requirement Level | Data type | Description |
---|---|---|---|
age | RECOMMENDED | number | Numeric value in years (float or integer value). It is RECOMMENDED to tag participant ages that are 89 or higher as 89+, for privacy purposes. This column may appear anywhere in the file. This column MAY be given an alternative definition in the sidecar JSON. |
The text in common principles could be pretty straightforward, explaining what it means to override or provide compatible definitions, without need to articulate a rule to determine which columns may or may not be overridden.
In other news: What happens if a dataset curator uses a pre-defined column (like
sample
) in their dataset, but supply a definition forsample
in their JSON file that completely diverges from the BIDS definition? I don't think we have a section for this case in the specification, but it feels like we should. And I believe this should be NOT RECOMMENDED, however if it's being done, we could state that the "user override" is be authoritative.Originally posted by @sappelhoff in https://github.com/bids-standard/bids-specification/issues/1441#issuecomment-1475951326