Azul only includes orphans in a verbatim manifest when the sole filter is datasets.dataset_id. Currently, the Data Browser filters by datasets.title. The dataset title is an unreliable filter because it is not guaranteed to be unique.
The Data Browser also unnecessarily specifies filters for donor and organism types even if the user selects all possible types, in which case the filters are redundant. Filtering by every possible value of a facet is equivalent to not filtering by that facet at all.
Together, these two issues prevent Azul from detecting that a manifest for an entire dataset is being requested, which causes it to exclude orphans from that manifest.
For example, the manifest request currently made by the Data Browser is
https://service.anvil.gi.ucsc.edu/fetch/manifest/files?catalog=anvil&filters={"datasets.title":{"is":["ANVIL_1000G_2019_Dev"]},"donors.organism_type":{"is":[null]},"files.file_format":{"is":[".md5",".tbi",".vcf.gz",".crai",".cram",".txt"]}}&format=verbatim.pfb
To include orphans, the request must filter by datasets.dataset_id alone
https://service.anvil.gi.ucsc.edu/fetch/manifest/files?catalog=anvil&filters={"datasets.dataset_id":{"is":["677dd55c-3fa3-4b07-8c98-985d94d7577e"]}}&format=verbatim.pfb
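One way the client could reduce its filters to that minimal form is sketched below in Python. This is only an illustration, not the Data Browser's actual code: the helper names `drop_redundant_filters` and `manifest_url` are hypothetical, and so is the `all_values` mapping, which assumes the client knows every possible value of each facet for the dataset. The sketch also assumes the title filter has already been replaced with the dataset ID.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the example requests above.
MANIFEST_URL = "https://service.anvil.gi.ucsc.edu/fetch/manifest/files"


def drop_redundant_filters(filters, all_values):
    """Drop every facet whose selection covers all of its possible values.

    `all_values` (hypothetical input) maps a facet name to the complete list
    of values that facet can take. Facets missing from it are always kept.
    """
    kept = {}
    for facet, condition in filters.items():
        selected = condition.get("is", [])
        possible = all_values.get(facet)
        if possible is not None and all(v in selected for v in possible):
            continue  # selecting every value is the same as not filtering
        kept[facet] = condition
    return kept


def manifest_url(filters, catalog="anvil", fmt="verbatim.pfb"):
    """Build a manifest request URL with the filters encoded as compact JSON."""
    query = urlencode({
        "catalog": catalog,
        "filters": json.dumps(filters, separators=(",", ":")),
        "format": fmt,
    })
    return f"{MANIFEST_URL}?{query}"


# Assumed example: the user selected every organism type and file format.
all_formats = [".md5", ".tbi", ".vcf.gz", ".crai", ".cram", ".txt"]
filters = {
    "datasets.dataset_id": {"is": ["677dd55c-3fa3-4b07-8c98-985d94d7577e"]},
    "donors.organism_type": {"is": [None]},
    "files.file_format": {"is": all_formats},
}
all_values = {
    "donors.organism_type": [None],
    "files.file_format": all_formats,
}

minimal = drop_redundant_filters(filters, all_values)
url = manifest_url(minimal)
```

With these inputs, only the `datasets.dataset_id` filter remains, so the resulting request matches the shape Azul needs in order to include orphans. Unlike the example URLs above, `urlencode` percent-encodes the JSON.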