bids-standard / python-validator

MIT License
2 stars 2 forks source link

Validation regression #12

Open marcelzwiers opened 2 weeks ago

marcelzwiers commented 2 weeks ago

If I use Pypi version 1.14.6 I get

bids_validator.BIDSValidator().is_bids('/sub-01/func/sub-01_echo-1_sbref.json')
Out[18]: False

If I use Pypi version 1.14.7 I get

bids_validator.BIDSValidator().is_bids('/sub-01/func/sub-01_echo-1_sbref.json')
Out[19]: True

I think v1.14.6 is the correct result, because indeed the task entity is missing

effigies commented 2 weeks ago

JSON files are sidecars, so may have missing entities. It might not be terribly useful, but I don't see anything wrong with that.

As long as /sub-01/func/sub-01_echo-1_sbref.nii.gz fails, that's what I would expect.

marcelzwiers commented 2 weeks ago
bids_validator.BIDSValidator().is_bids('/sub-01/func/sub-01_echo-1_sbref.nii.gz')
Out[3]: False
bids_validator.BIDSValidator().is_bids('/sub-01/func/sub-01_echo-1_sbref.json')
Out[4]: True

I always thought that json sidecars should have the same name as their associated imaging files. It seems weird to me that the imaging filename is then invalid, but the json is not?

marcelzwiers commented 2 weeks ago

I guess this is another side-effect of the inheritance principle :-(

effigies commented 2 weeks ago

I'd go so far as to call it the main effect of the inheritance principle. :-)

If the sidecar can be found via the inheritance principle, it is valid. Previously, this was enforced by hand-written regular expressions, which were not always exhaustive. Now the matching is procedural.

marcelzwiers commented 2 weeks ago

This causes quite a lot of difficulties for me, is there a way I can still do the old-style strict validation on the json files?

effigies commented 2 weeks ago

I suppose we could add a LegacyValidator class that uses the old regular expressions.

effigies commented 2 weeks ago

The old validator can be found here: https://github.com/bids-standard/python-validator/blob/e306460f861c312f21492a069af154f0e0c1279d/bids_validator/bids_validator.py

Feel free to open a PR to restore that as LegacyValidator. Please duplicate the tests.

effigies commented 2 weeks ago

This causes quite a lot of difficulties for me

Just to follow-up, what difficulties does this introduce? Presumably you've had to implement the inheritance principle already, if you're working with BIDS datasets?

marcelzwiers commented 2 weeks ago

Well, BIDScoin comes with a template bidsmap with all mappings from source data to BIDS. The mappings do not contain BIDS file extensions, as the actual data conversion is handled by plugins. From the mappings, BIDScoin itself can generate the output basenames, so I validated those thus far by appending a generic '.json' file extension. Now that doesn't work anymore, i.e. I would like to test the filepath without file extension, or be able to exclude errors from the extension

marcelzwiers commented 2 weeks ago

Ideally, there would be a switch isbids(.., inheritance = True/False)

effigies commented 2 weeks ago

Right now, we're constructing our regular expressions in bidsschematools:

https://github.com/bids-standard/bids-specification/blob/8cced666c3828f44b2306c3b7812e44d5398ab2b/tools/schemacode/bidsschematools/rules.py#L204-L237

TBH I wouldn't mind moving that here, since I don't think it really belongs directly in the schema. Once here, it would be pretty easy to create a variant that doesn't permit inheritance.

Another approach could be to compare mappings to rules directly:

import bidsschematools as bst
import bidsschematools.schema

schema = bst.schema.load_schema()

raw_rules = schema.rules.files.raw.values(level=2)

def match(mapping, rule):
    if mapping['datatype'] not in rule['datatypes'] or mapping['suffix'] not in rule['suffixes']:
        return False
    if any(entity not in rule['entities'] for entity in mapping['entity']):
        return False
    return all(entity in mapping['entity'] for entity, level in rule['entities'].items() if level == 'required')

def anymatch(mapping):
    return any(match(mapping, rule) for rule in raw_rules)

(Just some quick code to get the idea across. Haven't tried it out.)

marcelzwiers commented 1 week ago

Thanks, I looked into bidsschematools and I think it's very good but actually a bit too hard to grasp. Actually, I didn't know bidsschematools existed and for years I have been parsing the schema YAML-files myself, doing my own custom validations. For now I just do a work-around, and keep it as a TODO