dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

validation of bids zarr did not catch name mismatch in pair of json and ome.zarr #1324

Open satra opened 10 months ago

satra commented 10 months ago

as can be seen in this image, there is an isolated ome.zarr file that had a bad name. the corresponding json file was ok.

image

while this is being fixed in the dandiset, it suggests that the minimal bids validator may not be checking whether json and ome.zarr files come in matching pairs. such a check could probably be added.

yarikoptic commented 10 months ago

The Python bids-validator we use is indeed too rudimentary. Currently it only checks that individual file paths are kosher. I am not sure a "Python reimplementation of the full-fledged bids-validator" will ever materialize. Maybe something like https://github.com/bids-standard/bids-validator/issues/1387 would be the solution. A somewhat more recent discussion is in https://github.com/bids-standard/bids-2-devel/issues/41#issuecomment-1668179615 . @effigies do you know of any other issue or long-term plan/goal for a Python-based bids-validator?

effigies commented 10 months ago

I have a vague goal of eventually fleshing out the Python validator, but it is not on a roadmap or even a backlog. To the extent that we want to do it, I think the highest-order benefits would be in validating the schema itself and providing a reference implementation of inheritance.

IMO having a second implementation of many of the rules will have limited utility, so I would really chip away at it based on concrete needs like this.

For checking datafile/sidecar pairings, I believe the JS approach is to construct all of the sidecar views according to the inheritance principle, marking each JSON file as "read" in the process. If at the end, there are any unread JSON files, a warning could be raised. For the reverse problem, running the checks in rules.sidecars would do the trick, as long as there is one required metadata field.
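A rough sketch of that "mark as read" idea in Python, to make the two directions of the check concrete. This is not code from the JS validator, the Python validator, or dandi-cli. The names `split_name`, `applies`, and `check_pairings` are hypothetical, the inheritance rule is deliberately simplified (same suffix, sidecar entities a subset of the data file's entities, sidecar in the same or an ancestor directory), and the example paths are made up.

```python
from pathlib import PurePosixPath

DATA_EXT = ".ome.zarr"  # assumption: only OME-Zarr data files are of interest here


def split_name(path):
    """Split a BIDS-style filename into (entities, suffix), ignoring the extension."""
    stem = PurePosixPath(path).name.split(".", 1)[0]
    *entities, suffix = stem.split("_")
    return frozenset(entities), suffix


def applies(sidecar, datafile):
    """Simplified inheritance check (an assumption, not the full BIDS rule set)."""
    s_entities, s_suffix = split_name(sidecar)
    d_entities, d_suffix = split_name(datafile)
    s_dir = PurePosixPath(sidecar).parent
    d_dir = PurePosixPath(datafile).parent
    # The sidecar must live next to the data file or in one of its ancestor directories.
    in_scope = s_dir == d_dir or s_dir in d_dir.parents
    return in_scope and s_suffix == d_suffix and s_entities <= d_entities


def check_pairings(paths):
    """Return (unread_sidecars, datafiles_without_sidecar) for dataset-relative paths."""
    datafiles = [p for p in paths if p.endswith(DATA_EXT)]
    sidecars = [p for p in paths if p.endswith(".json")]
    read = set()
    orphan_data = []
    for d in datafiles:
        matched = [s for s in sidecars if applies(s, d)]
        read.update(matched)  # mark every sidecar that contributed as "read"
        if not matched:
            orphan_data.append(d)
    unread = [s for s in sidecars if s not in read]
    return unread, orphan_data


# Hypothetical listing with one mismatched pair (sample-B vs sample-BB).
example = [
    "sub-01/micr/sub-01_sample-A_SPIM.ome.zarr",
    "sub-01/micr/sub-01_sample-A_SPIM.json",
    "sub-01/micr/sub-01_sample-BB_SPIM.ome.zarr",
    "sub-01/micr/sub-01_sample-B_SPIM.json",
]
unread, orphan_data = check_pairings(example)
print("unread sidecars:", unread)
print("data files without a sidecar:", orphan_data)
```

In this listing the `sample-B` JSON ends up unread and the `sample-BB` Zarr ends up without a sidecar, which is the kind of mismatch reported above. A real check would also have to skip JSON files that are not sidecars at all, such as `dataset_description.json`, which this sketch would report as unread.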

All that said, I would have expected the path above to be flagged by the current schema validator.