DataBiosphere / data-browser

Apache License 2.0

Add support for inclusion of orphans in verbatim PFB #4254

Open hannes-ucsc opened 6 days ago

hannes-ucsc commented 6 days ago

Azul only includes orphans in a verbatim manifest when the sole filter is datasets.dataset_id. Currently, the Data Browser filters by datasets.title. The dataset title is an unreliable filter because it is not guaranteed to be unique.

The Data Browser also unnecessarily specifies filters for donor and organism types even if the user selects all possible types, in which case the filters are redundant. Filtering by every possible value of a facet is equivalent to not filtering by that facet at all.
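The redundancy rule above can be sketched as a small pruning step applied before the request is built. This is only an illustration: the `Filters` type, the `dropRedundantFilters` function, and the `allValues` map of every possible value per facet are hypothetical names, not taken from the Data Browser codebase.

```typescript
// Hypothetical types for illustration; not the Data Browser's actual types.
type FilterValue = string | null;
type Filters = Record<string, { is: FilterValue[] }>;

/**
 * Return a copy of `filters` without any facet whose selected values
 * cover every possible value of that facet. Such a filter matches
 * everything, so omitting it does not change the result set.
 * Facets with no entry in `allValues` are kept unchanged.
 */
function dropRedundantFilters(
  filters: Filters,
  allValues: Record<string, FilterValue[]>
): Filters {
  const result: Filters = {};
  for (const [facet, { is }] of Object.entries(filters)) {
    const possible = allValues[facet];
    const selected = new Set(is);
    const coversAll =
      possible !== undefined &&
      possible.length > 0 &&
      possible.every((value) => selected.has(value));
    if (!coversAll) {
      result[facet] = { is: [...is] };
    }
  }
  return result;
}
```

With such a step in place, a selection of every organism type or every file format would drop out of the request, leaving only the filters that actually narrow the result.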

Together, these two issues defeat Azul's detection that a manifest for an entire dataset is being requested, causing it to exclude orphans from that manifest.

For example, the manifest request currently made by the Data Browser is

https://service.anvil.gi.ucsc.edu/fetch/manifest/files?catalog=anvil&filters={"datasets.title":{"is":["ANVIL_1000G_2019_Dev"]},"donors.organism_type":{"is":[null]},"files.file_format":{"is":[".md5",".tbi",".vcf.gz",".crai",".cram",".txt"]}}&format=verbatim.pfb

To include orphans, the request must filter by datasets.dataset_id alone:

https://service.anvil.gi.ucsc.edu/fetch/manifest/files?catalog=anvil&filters={"datasets.dataset_id":{"is":["677dd55c-3fa3-4b07-8c98-985d94d7577e"]}}&format=verbatim.pfb
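A minimal sketch of how a client might build that whole-dataset request. The endpoint, query parameter names, and filter shape are copied from the URLs in this issue; the function name `wholeDatasetManifestUrl` and the constant `MANIFEST_URL` are hypothetical. Note that `URLSearchParams` percent-encodes the JSON, so the resulting string is the encoded form of the URL shown above rather than a character-for-character match.

```typescript
// Endpoint copied from the example requests in this issue.
const MANIFEST_URL = "https://service.anvil.gi.ucsc.edu/fetch/manifest/files";

/**
 * Build a verbatim PFB manifest URL whose sole filter is
 * datasets.dataset_id, the condition under which Azul includes orphans.
 * (Hypothetical helper, not part of the Data Browser codebase.)
 */
function wholeDatasetManifestUrl(datasetId: string, catalog = "anvil"): string {
  // Only one filter: any additional facet would defeat orphan inclusion.
  const filters = { "datasets.dataset_id": { is: [datasetId] } };
  const params = new URLSearchParams({
    catalog,
    filters: JSON.stringify(filters),
    format: "verbatim.pfb",
  });
  return `${MANIFEST_URL}?${params.toString()}`;
}
```

Calling `wholeDatasetManifestUrl("677dd55c-3fa3-4b07-8c98-985d94d7577e")` yields a request equivalent to the one above.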

NoopDog commented 4 days ago

Thanks, will try to get this out this week. @hannes-ucsc @bvizzier-ucsc. This is assigned and in progress.