DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Report manifest dimensions in final response #6506

Open hannes-ucsc opened 3 months ago

hannes-ucsc commented 3 months ago

https://ucsc-gi.slack.com/archives/C03TPJS54DC/p1723839714077969?thread_ts=1723581391.515569&cid=C03TPJS54DC

hannes-ucsc commented 3 months ago

Spec is WIP. At the moment it's only clear that we would need to report the number of rows in the biggest table (the largest number of entities of any given type) of a verbatim.pfb manifest, but we should include any other easily determinable metric, such as the overall size of the manifest, the number of tables (entity types), and the overall number of rows (the sum of the number of entities across all types). For verbatim.jsonl the same metrics apply. For compact manifests we should report the number of rows, the number of columns, and the overall size.
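
To make the metrics above concrete, here is a minimal sketch of how they could be computed. It is not Azul's implementation. It assumes that each line of a verbatim.jsonl manifest is a JSON object carrying its entity type under a `type` key (a hypothetical field name) and that a compact manifest is a TSV with a single header row. The function names and the keys of the returned dictionaries are likewise made up for illustration; PFB is left out because it would need an Avro reader.

```python
import csv
import io
import json
from collections import Counter


def jsonl_dimensions(data: bytes, type_key: str = 'type') -> dict:
    """
    Dimensions of a JSONL manifest, one entity per line. The `type_key`
    field name is an assumption, not taken from the actual manifest format.
    """
    # Count the entities of each type; one "table" per entity type
    rows_per_table = Counter(
        json.loads(line)[type_key]
        for line in data.splitlines()
        if line.strip()
    )
    return {
        'size_bytes': len(data),
        'num_tables': len(rows_per_table),
        'total_rows': sum(rows_per_table.values()),
        'max_rows': max(rows_per_table.values(), default=0),
    }


def tsv_dimensions(data: bytes) -> dict:
    """
    Dimensions of a compact manifest, assumed here to be a TSV whose first
    row is the header.
    """
    text = io.StringIO(data.decode('utf-8'), newline='')
    reader = csv.reader(text, delimiter='\t')
    header = next(reader, [])
    return {
        'size_bytes': len(data),
        'num_columns': len(header),
        'num_rows': sum(1 for _ in reader),  # data rows, header excluded
    }
```

Both functions read the whole manifest into memory, which keeps the sketch short. For large manifests the counts would more plausibly be accumulated while the manifest is being written.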

See also #5544, which we could solve at the same time.

dsotirho-ucsc commented 3 months ago

Assignee to consider next steps.

hannes-ucsc commented 2 months ago

Can't move forward until we have certainty about the desired implementation. See https://github.com/DataBiosphere/data-browser/issues/4116#issuecomment-2333232597