dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org

Do re-validate dandisets upon dandischema upgrade #2024

Open yarikoptic opened 1 month ago

yarikoptic commented 1 month ago

Follow up to

I think we should revalidate all draft versions of the dandisets upon dandischema upgrade.

For example: since PR https://github.com/dandi/dandi-schema/pull/235, released in dandi-schema version 0.10.2 (Wed Jul 10 2024), we require contactPerson to have an email address, and that requirement is also part of the schema-0.6.8 release. That is the schema version dandi-archive currently uses:

❯ curl --silent -X 'GET' 'https://api.dandiarchive.org/api/info/' -H 'accept: application/json' | jq .schema_version
"0.6.8"

But we did not revalidate the dandisets. As a result, even a Draft version shows no warning that it is not "valid" when the contact person has no email, as is currently the case for the sample https://dandiarchive.org/dandiset/000003/draft :

image

And that page, lacking warnings, suggests that we can publish a new version (with invalid metadata).
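To make the rule concrete, here is a minimal sketch of the check that stale validation results miss. It is not the dandischema implementation (the real validation is done by the schema models). The function name `contact_persons_missing_email` is hypothetical, and the dict layout (`contributor`, `roleName`, `email`) is an assumed, simplified view of dandiset metadata.

```python
def contact_persons_missing_email(metadata: dict) -> list:
    """Return names of contact persons that have no email address.

    Assumes a simplified metadata layout: a "contributor" list whose
    entries carry "name", "roleName" (a list of roles) and "email".
    """
    missing = []
    for contributor in metadata.get("contributor", []):
        roles = contributor.get("roleName", [])
        # Only contact persons are subject to the email requirement.
        if "dcite:ContactPerson" in roles and not contributor.get("email"):
            missing.append(contributor.get("name", "<unnamed>"))
    return missing


# Illustrative metadata only, not taken from a real dandiset.
sample = {
    "contributor": [
        {"name": "Doe, Jane", "roleName": ["dcite:ContactPerson"]},
        {"name": "Roe, Rich", "roleName": ["dcite:Author"]},
    ]
}
print(contact_persons_missing_email(sample))
```

A draft whose stored status was computed before this requirement existed would still be recorded as valid even though a check like this would flag it.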

If I trigger re-validation by editing the metadata (I did so on staging, on a test dandiset we had around), it does report an error and prevents publishing:

image

So I think there should be a job run at dandi-archive startup that would somehow determine which version of dandi-schema was used in the prior run and, if it has changed, trigger revalidation of all draft versions of dandisets.
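A rough sketch of that startup job, under stated assumptions: the names `ArchiveState`, `DraftVersion`, `revalidate_if_schema_changed` and the `"PENDING"` status are hypothetical stand-ins, not dandi-archive's actual models or API. The idea is only to persist the schema version seen on the previous run and, when it differs, mark every draft for revalidation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DraftVersion:
    # Hypothetical stand-in for a draft version record in the DB.
    dandiset_id: str
    status: str = "VALID"


@dataclass
class ArchiveState:
    # Hypothetical persisted record of the schema version used in the prior run.
    last_schema_version: Optional[str] = None
    drafts: List[DraftVersion] = field(default_factory=list)


def revalidate_if_schema_changed(state: ArchiveState, current_schema_version: str) -> List[str]:
    """Mark all drafts as pending validation if the schema version changed.

    Returns the identifiers of the drafts queued for revalidation.
    A first run (no recorded version) is treated as a change.
    """
    if state.last_schema_version == current_schema_version:
        return []
    for draft in state.drafts:
        # Assumed convention: a separate validation task picks up "PENDING" drafts.
        draft.status = "PENDING"
    state.last_schema_version = current_schema_version
    return [draft.dandiset_id for draft in state.drafts]
```

On startup this would be called once with the schema version the server reports (the `schema_version` value from `/api/info/`). Calling it again with the same version is a no-op, so restarts without a schema upgrade do not trigger unnecessary revalidation.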

edit: here you can see the full list of dandisets, the state recorded for each in the DB, and whether or not we have pydantic errors for them: https://github.com/dandi/dandisets-linkml-status/blob/main/dandi-reports/summary.md . E.g. image

waxlamp commented 1 month ago

Definitely a problem. We'll need to update the procedures for when the schema is updated and redeployed (as you suggest).