dandi / dandi-schema

Schemata for DANDI archive project
Apache License 2.0
7 stars 10 forks source link

DANDI-API-C7: aggregate_assets_summary_task: KeyError encodingFormat since Feb #183

Closed yarikoptic closed 1 year ago

yarikoptic commented 1 year ago

There was https://github.com/dandi/dandi-archive/pull/1490 which made such issues to bubble up, this is a list of sentry reports which seems keep mentioning this issue

image

which IMHO should be given priority to be addressed even if just not to pollute our mailboxes with alerts, thus possibly hiding other, may be more important issues.

sentry-io[bot] commented 1 year ago

Sentry issue: DANDI-API-C7

waxlamp commented 1 year ago

This seems to be a bug in dandischema, where the code assumes there is always an encodingFormat key. Can we move this issue to that repo and fix it at the source?

yarikoptic commented 1 year ago

I think so, transferred...

satra commented 1 year ago

we can do a validity check in dandischema itself, but i think the original general idea was that the input to that function should be valid assets and valid assets should always have an encodingFormat. and since validity is already included in the database, the aggregate function should run over valid assets (with invalid assets reported in the validation output).

on a fix side, every asset should have an encodingFormat, we won't drop that. however, we can add a validate option to the function.

danlamanna commented 1 year ago

we can do a validity check in dandischema itself, but i think the original general idea was that the input to that function should be valid assets and valid assets should always have an encodingFormat. and since validity is already included in the database, the aggregate function should run over valid assets (with invalid assets reported in the validation output).

In this case, I think this is a bug in https://github.com/dandi/dandi-archive. We're currently aggregating assets regardless of whether or not they're valid. I'd propose we close #184 since it's revealing a true bug in our usage, fix the aggregation on the dandi-archive side, and then recompute summaries for all draft versions.

satra commented 1 year ago

@danlamanna - that sounds good to me. mostly because the metadata is not valid till the archive completes checksum checks and also does not enforce valid schema on POST/PUT.