Open dominikwelke opened 2 weeks ago
n/a
was permitted for TSV files, as there needs to be a way of marking missing data in a table. In a dictionary, absence is a better marker than special values.
For some BEPs, there were conflicts between metadata that were considered enough of a best practice that it should be REQUIRED and known cases where the metadata were absent, so requiring its presence but permitting "n/a"
as an escape hatch was determined to be the best way to thread that needle.
For RECOMMENDED and OPTIONAL fields, allowing that escape hatch only increases the likelihood of a tool dumping "Key": "n/a"
with every permissible Key
for that sidecar. Users then immediately get a dataset with no warnings, and it's not until they try to run a tool that needs a piece of metadata that anything breaks. Many tools assume that the validator has validated the data available, and so I can safely say metadata[field]
and assume that I either get valid data or nothing (a KeyError
is raised in Python, undefined
is returned in Javascript...). Overall, this assumption is good because it reduces duplication of effort and the opportunity for tools to have mutually incompatible validation mechanisms.
- BIDS validator doesn't seem to act consistently (as it seems to miss not explicitly allowed
n/a
s if the required datatype isstring
)
"n/a"
is a valid string. If a field does not have a pattern or an enumerated set of permitted values, for all the validator can know, it's permissible. We could use a pattern (^(?!n/a$)
) that specifically excludes that one value, for fields where we know it's inadmissible, but I don't think we can do it universally. There may be some fields where "n/a"
would have been carved out as permissible if it were not already a valid string.
- the BIDS standards might be enhanced, imo, by (automatically or explicitly) allowing
n/a
for allRECOMMENDED
fields other than perhaps with theREQUIRED
fields, allowingn/a
entries shouldn't risk integrity or interpretability of the BIDS dataset. it might in the contrary increase consistency of the dataset: if i'm not mistaken, the current rules imply that i would have to removen/a
fields in a subject, even if it is just missing data and all other subjects have this information. allowingn/a
enables users to have an identical set of data entries across all subjects (including the explicit information if something is missing).
I don't understand why an identical set of data entries is particularly desirable. Could you elaborate on that? If I'm writing a tool to accept BIDS data and collate entries into, e.g., a dataframe, I need to be equally prepared to deal with missing fields as ones with special missing-value indicators. Otherwise I'm writing to a stricter specification than BIDS, and my tool will not work on all BIDS datasets.
thanks for the explanations @effigies - I thought this should have come up at some point but couldnt find anything.
I don't understand why an identical set of data entries is particularly desirable. Could you elaborate on that?
you are right from a tool perspective. but BIDS datasets are human readable, and the special entry n/a
has more information than a missing value (i.e. this info is supposed to be here, but is missing). it is not a strong argument, though, and compromised or even invalidated by procedures like default adding of n/a
s, as you described above
might also be relevant with people who format their datasets themselves and write their own tools, as then the current standard and descriptions/explanations can be confusing or seem inconsistent (why is n/a
allowed in one case but not another) or, like me, simply missing the mismatch for years :) . again, probably no argument for changing the definitions. it can lead to people believing they have BIDS compliant data, when they haven't, though, and the (older?) versions of bids validator that only checked filenames wont catch it.
We could use a pattern (^(?!n/a$)) that specifically excludes that one value
was my immediate idea, too, in case the standard remains unchanged.
but i'm not in the validator workings.. can you only check regex rules, or would it be an option to include flags, e.g. via dictionaries with (in this case) n/a
allowed or not allowed?
Metadata fields and their values are defined here in schema.objects.metadata (with formats in schema.objects.formats) and their requirement levels in schema.rules.sidecars. The validator constructs JSON schema from these inputs and applies it to the collated sidecar data.
Anything much more complicated than that is going to become brittle quickly. Validating "n/a"
conditionally would be a serious pain because it is sometimes valid, so should not be removed.
One thing I wonder about is allowing null
to be equivalent to missing. This would allow a tool to dump all permissible fields into a sidecar without affecting its validation status. The validator could filter null
-valued fields before applying JSON schema validation, and all warnings and errors would be produced as normal. Tools would need to do the same, or be written in such a way as to treat null
as equivalent to absent (e.g., metadata.get(field)
in Python) to continue working as they do.
This may be unworkable, since the inheritance principle says that there is no such thing as unsetting, and null
hasn't been permitted, so there's a very strong possibility of divergent treatment of datasets across tools. Would probably need to be a BIDS 2.0 thing.
ok, so I take away that:
fine with me. should i close the issue, or do you want to keep it open @effigies ?
my only workable suggestion would then be that the docs might be a bit clearer about the inconsistent treatment of n/a
fillers (allowed but discouraged in tsv
s, not allowed in json
s - this last bit is not explicitly mentioned, i think)
e.g. here (no need to mention that the validator wont catch all cases of unallowed n/a
s in json
s)
Your idea
Hi all,
related to #876
i checked the issue tracker, but perhaps i missed something or it has been discussed before.. in this case, apologies!
I encountered a problem (in my perspective) with
RECOMMENDED
sidecar entries. BIDS validator complains aboutn/a
entries, IFn/a
s are not explicitly allowed AND the required datatype is something else thanstring
(e.g. numerical)in my case, i encountered it with head circumference measures in EEG recordings
there are 2 separate points in this:
n/a
s if the required datatype isstring
)n/a
for allRECOMMENDED
fields other than perhaps with theREQUIRED
fields, allowingn/a
entries shouldn't risk integrity or interpretability of the BIDS dataset. it might in the contrary increase consistency of the dataset: if i'm not mistaken, the current rules imply that i would have to removen/a
fields in a subject, even if it is just missing data and all other subjects have this information. allowingn/a
enables users to have an identical set of data entries across all subjects (including the explicit information if something is missing).what are your thoughts on that?