dandi / dandi-cli

DANDI command line client to facilitate common operations
https://dandi.readthedocs.io/
Apache License 2.0

Validate (.ome).zarr against ngff schema #1409

Open: yarikoptic opened this issue 6 months ago

yarikoptic commented 6 months ago

Partly inspired by

For any .zarr we encounter we should

For any OME .zarr (whether detected through the above or by having a .ome.zarr extension), validate that zarr against the schema for the version it specifies. If no version is specified, that is a validation error. The version is recorded in .zattrs, e.g.:

❯ jq '.omero.version' .zattrs
"0.4"

The schemas are provided in https://github.com/ome/ngff under the {schema_version}/schemas/ folder as .schema JSON files. Issue corresponding validation errors to users trying to upload non-compliant OME-NGFF data.
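Below is a minimal sketch of the version lookup and schema location described above. It assumes the .zattrs file has already been parsed into a Python dict. Several pieces are hypothetical choices, not dandi-cli's API:

- the helper names (`get_ome_version`, `schema_url`, `ZarrValidationError`);
- the fallback to `.multiscales[0].version` when `.omero.version` is absent;
- the raw.githubusercontent.com base URL on a `main` branch;
- the default schema name `image`.

```python
from typing import Any

# Hypothetical base URL: assumes the ome/ngff repository layout
# {schema_version}/schemas/<name>.schema on its "main" branch.
NGFF_SCHEMA_BASE = "https://raw.githubusercontent.com/ome/ngff/main"


class ZarrValidationError(Exception):
    """Hypothetical error type for a non-compliant OME-Zarr."""


def get_ome_version(zattrs: dict[str, Any]) -> Any:
    """Return the NGFF version declared in a parsed .zattrs mapping.

    Looks at .omero.version first (as in the jq example), then falls back
    to .multiscales[0].version (an assumed design choice); raises if
    neither is present, since "no version" is a validation error.
    """
    version = (zattrs.get("omero") or {}).get("version")
    if version is None:
        multiscales = zattrs.get("multiscales") or []
        if multiscales:
            version = multiscales[0].get("version")
    if version is None:
        raise ZarrValidationError("OME-Zarr does not declare a version")
    return version


def schema_url(version: str, name: str = "image") -> str:
    """Build the URL of the .schema file for a given NGFF version."""
    return f"{NGFF_SCHEMA_BASE}/{version}/schemas/{name}.schema"
```

For example, `schema_url(get_ome_version(zattrs))` would point at the 0.4 image schema for the zarr shown above.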

yarikoptic commented 6 months ago

Following the advice in https://github.com/ome/ngff/issues/228#issuecomment-1957119902, let's:

yarikoptic commented 6 months ago

Here are stats across the zarrs on S3. The first value is .omero.version; the following ones are .multiscales[].version:

dandi@drogon:~$ sort /tmp/ome-versions.out | uniq -c
     20 [0.2,0.2]
   4303 ["0.4","0.4"]
    572 [null,"0.4"]

where that file was created using:

for d in *-*-*; do
  git -C $d annex whereis .zattrs | awk '/versionId=/{print $2;}' | xargs curl --silent | jq -c '[.omero.version, .multiscales[].version]'
done | tee /tmp/ome-versions.out
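The jq plus `sort | uniq -c` tally above can also be reproduced in Python over already-downloaded .zattrs documents. Fetching via git-annex and S3 is left out. `version_record` and `tally` are hypothetical helpers. One difference from jq: a missing `multiscales` key yields no entries here instead of an error.

```python
import json
from collections import Counter
from typing import Any, Iterable


def version_record(zattrs: dict[str, Any]) -> str:
    """Mimic jq -c '[.omero.version, .multiscales[].version]' for one .zattrs."""
    omero = zattrs.get("omero") or {}
    versions = [omero.get("version")]
    versions += [ms.get("version") for ms in zattrs.get("multiscales", [])]
    # Compact JSON like jq -c: None -> null, floats stay unquoted.
    return json.dumps(versions, separators=(",", ":"))


def tally(docs: Iterable[dict[str, Any]]) -> Counter:
    """Count identical records, like `sort | uniq -c`."""
    return Counter(version_record(d) for d in docs)
```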

Note that some entries (the 20 [0.2,0.2] ones) appear to store the version incorrectly as floats. It might be worth making the code robust here: explicitly test that the version is a string and otherwise issue a validation error.
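Here is a sketch of the strict type check suggested above. Every declared version must be a JSON string, so the 0.2 floats become validation errors. The helper name and the message wording are hypothetical. Reporting a document with no version at all follows the "no version means validation error" rule stated at the top of the issue.

```python
from typing import Any


def check_version_types(zattrs: dict[str, Any]) -> list[str]:
    """Return validation error messages for NGFF versions that are not strings.

    Checks .omero.version and every .multiscales[].version that is present;
    reports an error if no version is declared anywhere.
    """
    found: list[tuple[str, Any]] = []
    omero = zattrs.get("omero") or {}
    if "version" in omero:
        found.append((".omero.version", omero["version"]))
    for i, ms in enumerate(zattrs.get("multiscales", [])):
        if "version" in ms:
            found.append((f".multiscales[{i}].version", ms["version"]))

    errors: list[str] = []
    if not found:
        errors.append("no NGFF version specified")
    for path, value in found:
        # json.load turns 0.2 into a float, so this catches unquoted versions.
        if not isinstance(value, str):
            errors.append(
                f"{path} must be a string, got {type(value).__name__} {value!r}"
            )
    return errors
```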

jwodder commented 6 months ago

@yarikoptic

yarikoptic commented 6 months ago

Well, ideally cache the schemas locally. I thought we already do something similar for the dandi schema, or used to do so for BIDS at some point.
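One way to cache the schemas locally is sketched below using only the standard library. Each .schema file is downloaded once into a cache directory and read from disk afterwards. This is not how dandi-cli handles the dandi schema. `load_schema`, the file-naming scheme and the injectable `fetch` parameter (handy for offline tests) are assumptions. A real implementation would probably also need cache invalidation, which this sketch omits.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Callable


def _download(url: str) -> bytes:
    """Fetch raw bytes from a URL (network access required)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def load_schema(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes] = _download,
) -> dict[str, Any]:
    """Return the parsed JSON schema at `url`, downloading it at most once.

    The file is stored in `cache_dir` under a name derived from the URL,
    so later calls (and later runs) read it from disk instead of the network.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Derive a flat, filesystem-safe file name from the URL.
    fname = url.split("://", 1)[-1].replace("/", "_")
    path = cache_dir / fname
    if not path.exists():
        path.write_bytes(fetch(url))
    return json.loads(path.read_bytes())
```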

As for "nearly-valid": I think most of the zarrs in 000108 are nearly valid, according to e.g. https://ome.github.io/ome-ngff-validator/?source=https://dandiarchive.s3.amazonaws.com/zarr/e41844a2-dad0-4b1c-9c53-d55883e0553f, which errors with:

{
  "instancePath": "/omero/channels/0/window",
  "schemaPath": "#/properties/omero/properties/channels/items/properties/window/required",
  "keyword": "required",
  "params": {
    "missingProperty": "start"
  },
  "message": "must have required property 'start'"
}

but is otherwise happy, reporting "7 Datasets checked ✓".
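To illustrate what the validator is complaining about, here is a stdlib-only check of `omero.channels[].window` that produces errors in the same shape. It assumes the 0.4 schema requires `start`, `end`, `min` and `max` on each window; only `start` is confirmed by the output above. `check_omero_windows` is hypothetical. In practice, a full JSON Schema validator such as the `jsonschema` package would be applied against the whole schema instead.

```python
from typing import Any

# Assumed required keys of omero.channels[].window in the 0.4 schema;
# the validator output above confirms at least "start".
WINDOW_REQUIRED = ("start", "end", "min", "max")


def check_omero_windows(zattrs: dict[str, Any]) -> list[dict[str, str]]:
    """Report missing required window properties, validator-style."""
    errors: list[dict[str, str]] = []
    channels = (zattrs.get("omero") or {}).get("channels", [])
    for i, channel in enumerate(channels):
        window = channel.get("window")
        if window is None:
            continue  # simplification: only inspect windows that are present
        for key in WINDOW_REQUIRED:
            if key not in window:
                errors.append({
                    "instancePath": f"/omero/channels/{i}/window",
                    "message": f"must have required property '{key}'",
                })
    return errors
```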

I also found one in https://dandiarchive.org/dandiset/000243/draft/files?location=sub-S01%2Fanat&page=1 which is fully valid: https://ome.github.io/ome-ngff-validator/?source=https://dandiarchive.s3.amazonaws.com/zarr/7723d02f-1f71-4553-a7b0-47bda1ae8b42

Also, those in 000026 seem to be good: https://dandiarchive.org/dandiset/000026/draft/files?location=sub-I45%2Fses-SPIM%2Fmicr&page=1