dandi / dandi-archive

DANDI API server and Web app
https://dandiarchive.org

Provided metadata has no schema version #1193

Closed danlamanna closed 2 years ago

danlamanna commented 2 years ago

for now I think it is OK to assume that a ValueError raised while calling a validation function is due to a validation failure. I think we need to investigate how ValueError: Provided metadata has no schema version could come about (check the logs/traceback to see how it got there), since AFAIK dandi-cli shouldn't provide such records, so it might be somewhere in the web frontend.

Originally posted by @yarikoptic in https://github.com/dandi/dandi-archive/issues/1117#issuecomment-1183554239

satra commented 2 years ago

the web front end does not touch the metadata of the assets. so this would have happened in the API + some other client interaction.

yarikoptic commented 2 years ago

here is a snippet for recent (today) case

2022-07-14T05:05:51.186968+00:00 app[worker.1]: [2022-07-14 05:05:51,186: INFO/Beat] Scheduler: Sending due task dandiapi.api.scheduled_tasks.validate_draft_version_metadata() (dandiapi.api.scheduled_tasks.validate_draft_version_metadata)
2022-07-14T05:05:51.189241+00:00 app[worker.1]: [2022-07-14 05:05:51,189: INFO/MainProcess] Task dandiapi.api.scheduled_tasks.validate_draft_version_metadata[5f96280e-5c1e-4a8d-bc2d-e75fd7d27461] received
2022-07-14T05:05:51.192666+00:00 app[worker.1]: [2022-07-14 05:05:51,192: INFO/ForkPoolWorker-3] dandiapi.api.scheduled_tasks.validate_draft_version_metadata[5f96280e-5c1e-4a8d-bc2d-e75fd7d27461]: Found 1 versions to validate
2022-07-14T05:05:51.196121+00:00 app[worker.1]: [2022-07-14 05:05:51,196: INFO/MainProcess] Task dandiapi.api.tasks.validate_version_metadata[16f17341-2bac-4db9-81e4-8d704c3ec064] received
2022-07-14T05:05:51.197145+00:00 app[worker.1]: [2022-07-14 05:05:51,197: INFO/ForkPoolWorker-3] Task dandiapi.api.scheduled_tasks.validate_draft_version_metadata[5f96280e-5c1e-4a8d-bc2d-e75fd7d27461] succeeded in 0.007015325129032135s: None
2022-07-14T05:05:51.213615+00:00 app[worker.1]: [2022-07-14 05:05:51,213: INFO/ForkPoolWorker-4] dandiapi.api.tasks.validate_version_metadata[16f17341-2bac-4db9-81e4-8d704c3ec064]: Validating dandiset metadata for version 500
2022-07-14T05:05:51.431635+00:00 app[worker.1]: [2022-07-14 05:05:51,431: INFO/ForkPoolWorker-4] Error calculating assetsSummary
2022-07-14T05:05:51.431637+00:00 app[worker.1]: Traceback (most recent call last):
2022-07-14T05:05:51.431638+00:00 app[worker.1]: File "/app/dandiapi/api/models/version.py", line 229, in _populate_metadata
2022-07-14T05:05:51.431638+00:00 app[worker.1]: summary = aggregate_assets_summary(
2022-07-14T05:05:51.431639+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/dandischema/metadata.py", line 334, in aggregate_assets_summary
2022-07-14T05:05:51.431639+00:00 app[worker.1]: _add_asset_to_stats(meta, stats)
2022-07-14T05:05:51.431639+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/dandischema/metadata.py", line 266, in _add_asset_to_stats
2022-07-14T05:05:51.431640+00:00 app[worker.1]: raise ValueError("Provided metadata has no schema version")
2022-07-14T05:05:51.431640+00:00 app[worker.1]: ValueError: Provided metadata has no schema version

so we need to check which dandiset version 500 is and see which asset(s) are bad. That might give a clue.

dandi-archive code could improve its logging to report not just the version id (500) but the actual dandiset id; I guess it is only for the draft. Maybe it is indeed a client uploading an incorrect record? Then maybe the server should do a basic check that the metadata contains a schema version before even considering a new metadata record from a client?
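
The basic server-side check suggested above could look roughly like the sketch below. It is only an illustration: the function names are hypothetical and are not part of dandi-archive or dandischema. The only details taken from this thread are the schemaVersion key and the error message.

```python
from typing import Any, Mapping


def has_schema_version(metadata: Mapping[str, Any]) -> bool:
    """Return True if the record carries a non-empty schemaVersion string."""
    version = metadata.get("schemaVersion")
    return isinstance(version, str) and bool(version.strip())


def check_incoming_metadata(metadata: Mapping[str, Any]) -> None:
    """Hypothetical pre-save hook: reject a record that has no schema version.

    Raising here would surface the problem at upload time instead of later,
    when a scheduled validation task aggregates the asset metadata.
    """
    if not has_schema_version(metadata):
        raise ValueError("Provided metadata has no schema version")
```

Where such a check would be called from (API view, serializer, or model save) is left open here; the thread does not settle that.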

yarikoptic commented 2 years ago

looking at django admin interface -- 500 is 000108

yarikoptic commented 2 years ago

still running dandi ls, but here is a sample of some assets without schemaVersion -- created in March -- so likely they are some of those @satra uploaded "manually", not through dandi-cli:

$> grep -v schemaVersion 000108-ls.json
[
  {"asset_id": "e4ca7d68-1f3b-475e-bd78-e32dbd8c2978", "created": "2022-03-08T23:20:29.383657+00:00", "metadata": {"contentSize": 29541947716, "contentUrl": ["https://api.dandiarchive.org/api/assets/e4ca7d68-1f3b-475e-bd78-e32dbd8c2978/download/", "https://dandiarchive.s3.amazonaws.com/zarr/bf47be1a-4fed-4105-bcb4-c52534a45b82/"], "digest": {"dandi:dandi-zarr-checksum": "047d0479303830d326cc1c5080ca13db-97192--29541947716"}, "encodingFormat": "application/x-zarr", "id": "dandiasset:e4ca7d68-1f3b-475e-bd78-e32dbd8c2978", "identifier": "e4ca7d68-1f3b-475e-bd78-e32dbd8c2978", "path": "sub-MITU01/ses-20210720h20m19s32/micr/sub-MITU01_ses-20210720h20m19s32_sample-127_stain-YO_run-1_chunk-8_SPIM.ome.zarr"}, "modified": "2022-07-13T21:44:25.190403+00:00", "path": "sub-MITU01/ses-20210720h20m19s32/micr/sub-MITU01_ses-20210720h20m19s32_sample-127_stain-YO_run-1_chunk-8_SPIM.ome.zarr", "size": 29541947716, "zarr": "bf47be1a-4fed-4105-bcb4-c52534a45b82"},
  {"asset_id": "08be3c08-b57d-4d48-affd-92a68ccbf645", "created": "2022-03-09T00:30:45.949462+00:00", "metadata": {"contentSize": 17741669409, "contentUrl": ["https://api.dandiarchive.org/api/assets/08be3c08-b57d-4d48-affd-92a68ccbf645/download/", "https://dandiarchive.s3.amazonaws.com/zarr/f7e3a560-c4a6-4652-b8c8-66afe580e4cb/"], "digest": {"dandi:dandi-zarr-checksum": "435b8a416fdc0cce5a9926e83521a2bc-97192--17741669409"}, "encodingFormat": "application/x-zarr", "id": "dandiasset:08be3c08-b57d-4d48-affd-92a68ccbf645", "identifier": "08be3c08-b57d-4d48-affd-92a68ccbf645", "path": "sub-MITU01/ses-20210720h20m19s32/micr/sub-MITU01_ses-20210720h20m19s32_sample-127_stain-YO_run-1_chunk-9_SPIM.ome.zarr"}, "modified": "2022-07-13T21:44:25.226338+00:00", "path": "sub-MITU01/ses-20210720h20m19s32/micr/sub-MITU01_ses-20210720h20m19s32_sample-127_stain-YO_run-1_chunk-9_SPIM.ome.zarr", "size": 17741669409, "zarr": "f7e3a560-c4a6-4652-b8c8-66afe580e4cb"},
  {"asset_id": "383ce132-b86c-40d5-805c-da8619432c96", "created": "2022-03-09T00:30:45.957436+00:00", "metadata": {"contentSize": 30451988299, "contentUrl": ["https://api.dandiarchive.org/api/assets/383ce132-b86c-40d5-805c-da8619432c96/download/", "https://dandiarchive.s3.amazonaws.com/zarr/66dbcc6f-0396-441d-aea6-0af8d5500562/"], "digest": {"dandi:dandi-zarr-checksum": "9d4f771efba59252b6364eceebcb084e-95764--30451988299"}, "encodingFormat": "application/x-zarr", "id": "dandiasset:383ce132-b86c-40d5-805c-da8619432c96", "identifier": "383ce132-b86c-40d5-805c-da8619432c96", "path": "sub-MITU01/ses-20210721h22m29s00/micr/sub-MITU01_ses-20210721h22m29s00_sample-128_stain-LEC_run-1_chunk-10_SPIM.ome.zarr"}, "modified": "2022-07-13T21:44:25.304184+00:00", "path": "sub-MITU01/ses-20210721h22m29s00/micr/sub-MITU01_ses-20210721h22m29s00_sample-128_stain-LEC_run-1_chunk-10_SPIM.ome.zarr", "size": 30451988299, "zarr": "66dbcc6f-0396-441d-aea6-0af8d5500562"},
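
The grep above works because each asset sits on its own line in the listing. A more robust way to find the same assets is to parse the JSON; the following is a minimal sketch that assumes the listing is a JSON array of objects with "asset_id" and "metadata" keys, as in the sample above. The helper names are made up for illustration.

```python
import json
from typing import Any, Iterable, List, Mapping


def assets_missing_schema_version(assets: Iterable[Mapping[str, Any]]) -> List[str]:
    """Return the asset_id of every asset whose metadata lacks schemaVersion."""
    return [
        asset["asset_id"]
        for asset in assets
        if "schemaVersion" not in asset.get("metadata", {})
    ]


def load_listing(path: str) -> List[Mapping[str, Any]]:
    """Load a listing saved as a JSON array (assumed layout, see above)."""
    with open(path) as f:
        return json.load(f)


# Example use (file name taken from the grep command above):
#   bad_ids = assets_missing_schema_version(load_listing("000108-ls.json"))
```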
yarikoptic commented 2 years ago

I think it is one of those cases where dandi-archive probably should catch exceptions and issue a warning rather than cause a total meltdown of the job. Such issues should be picked up by validation before the dandiset is published. And that is the case -- https://dandiarchive.org/dandiset/000108 has

[image not preserved]

will file a dedicated issue for duplicates; filed https://github.com/dandi/dandi-archive/issues/1195 about missing paths for the assets
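
The catch-and-warn idea from this comment can be sketched as follows. This shows the general pattern only and is not dandi-archive's actual code: summarize_or_warn is a hypothetical wrapper, and the aggregate argument stands in for whatever function computes the summary (aggregate_assets_summary in the traceback above).

```python
import logging
from typing import Any, Callable, Iterable, Mapping, Optional

logger = logging.getLogger(__name__)


def summarize_or_warn(
    aggregate: Callable[[Iterable[Mapping[str, Any]]], dict],
    assets_metadata: Iterable[Mapping[str, Any]],
    dandiset_id: str,
) -> Optional[dict]:
    """Compute the assets summary, downgrading a ValueError to a warning.

    The dandiset id is included in the message so the log identifies the
    dandiset directly rather than only an internal version id.
    """
    try:
        return aggregate(assets_metadata)
    except ValueError as exc:
        logger.warning(
            "Could not calculate assetsSummary for dandiset %s: %s",
            dandiset_id,
            exc,
        )
        return None
```

Returning None leaves it to the caller to decide what to store when no summary is available; later validation is still expected to flag the bad assets before publishing.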

satra commented 2 years ago

i would still say this is a bug on the server side that an asset was created with a version.

dandibot commented 2 years ago

:rocket: Issue was released in v0.2.43 :rocket: