Closed hannes-ucsc closed 12 months ago
One of the affected biosamples from the anvil-it catalog.
{
"entryId": "1b5294b0-ca95-402c-8b75-03e3aea7b66c",
"sources": [
{
"sourceSpec": "tdr:datarepo-dev-43738c90:snapshot/ANVIL_1000G_2019_Dev_20230302_ANV5_202303032342:/2",
"sourceId": "cc1c98a4-bfc4-45f2-b8dc-e920e5ca634d"
}
],
"bundles": [
{
"bundleUuid": "1b5294b0-ca95-a02c-8b75-03e3aea7b66c",
"bundleVersion": "2022-06-01T00:00:00.000000Z"
}
],
"activities": [
{
"activity_type": [
"Checksum",
"Indexing",
"Unknown"
],
"assay_type": [
null
],
"data_modality": [
null
]
}
],
"biosamples": [
{
"document_id": "1b5294b0-ca95-402c-8b75-03e3aea7b66c",
"source_datarepo_row_ids": [
"sample:e343379d-7eff-4df6-a4e1-4b3418f82008"
],
"biosample_id": "f3c8c3d5-ebab-71fe-fd58-9f69923d123b",
"anatomical_site": null,
"apriori_cell_type": [
null
],
"biosample_type": null,
"disease": null,
"donor_age_at_collection_unit": null,
"donor_age_at_collection": {
"gte": null,
"lte": null
},
"accessible": true
}
],
"datasets": [
{
"dataset_id": [
"385290c3-dff5-fb6d-2501-fa0ba3ad1c35"
],
"title": [
"ANVIL_1000G_2019_Dev"
]
}
],
"diagnoses": [],
"donors": [
{
"organism_type": [
null
],
"phenotypic_sex": [
null
],
"reported_ethnicity": [
null
],
"genetic_ancestry": [
null
]
}
],
"files": [
{
"data_modality": [
null
],
"file_format": [
".md5"
],
"reference_assembly": [
null
],
"is_supplementary": [
false,
true
],
"count": 2
},
{
"data_modality": [
null
],
"file_format": [
".cram"
],
"reference_assembly": [
null
],
"is_supplementary": [
false
],
"count": 1
},
{
"data_modality": [
null
],
"file_format": [
".crai"
],
"reference_assembly": [
null
],
"is_supplementary": [
false
],
"count": 1
}
]
}
Spike to diagnose.
Here is the complete structure of the bundle shown above. Non-supplementary files are in blue and the supplementary file is in red.
I suspect this is a bug in the snapshot because the two leaf files are the same format and derived from the same activity type, but one is marked as supplementary while the other is not.
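To make the symptom easier to spot in other hits, here is a minimal sketch that lists the file formats whose aggregated `is_supplementary` values disagree. The helper name `mixed_supplementary_formats` is hypothetical and not part of Azul; the sample hit only reproduces the `files` portion of the response shown above.

```python
def mixed_supplementary_formats(hit: dict) -> list:
    """
    Return the file formats of every file group in the hit whose
    aggregated `is_supplementary` list contains both True and False.
    """
    mixed = []
    for group in hit.get('files', []):
        flags = group.get('is_supplementary')
        # A scalar flag, or a list with one distinct value, is consistent.
        if isinstance(flags, list) and len(set(flags)) > 1:
            mixed.extend(group.get('file_format', []))
    return mixed


# Only the `files` section of the hit above, reduced to the relevant fields.
hit = {
    'files': [
        {'file_format': ['.md5'], 'is_supplementary': [False, True], 'count': 2},
        {'file_format': ['.cram'], 'is_supplementary': [False], 'count': 1},
        {'file_format': ['.crai'], 'is_supplementary': [False], 'count': 1},
    ]
}

print(mixed_supplementary_formats(hit))  # ['.md5']
```

For the hit above, only the `.md5` group is reported, which matches the two leaf files that differ in their supplementary flag.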
Suggested workaround:
Subject: [PATCH] fix
---
Index: test/integration_test.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/integration_test.py b/test/integration_test.py
--- a/test/integration_test.py (revision c6d873bfe5ba641359a0598dca1e5c0dea66a2cc)
+++ b/test/integration_test.py (date 1684707202458)
@@ -966,7 +966,7 @@
 for file in hit['files']:
     is_supplementary = file['is_supplementary']
     if isinstance(is_supplementary, list):
-        is_supplementary = one(is_supplementary)
+        is_supplementary = all(is_supplementary)
     if is_supplementary:
         bundle_fqid['entity_type'] = BundleEntityType.supplementary.value
         break
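For context on why the patch helps, here is a small sketch of how the two calls treat the mixed value `[false, true]` from the hit above. It assumes that `one` in the test behaves like `more_itertools.one`, which raises `ValueError` unless the iterable has exactly one item; the local `one` below is a stand-in written for illustration, not the function the test imports.

```python
def one(values):
    """
    Illustrative stand-in for `more_itertools.one`: return the only item
    of the iterable, or raise ValueError if there are zero or several.
    """
    items = list(values)
    if len(items) != 1:
        raise ValueError(f'expected exactly one item, got {len(items)}')
    return items[0]


# The aggregated flags of the `.md5` file group in the affected hit.
flags = [False, True]

try:
    one(flags)
    one_result = 'ok'
except ValueError:
    # Before the patch, the mixed list makes the test fail here.
    one_result = 'ValueError'

# After the patch, the mixed list is simply treated as non-supplementary.
all_result = all(flags)

print(one_result, all_result)  # ValueError False
```

Note that `all` of an empty list is `True`, so the patched line would classify a file group with no flags at all as supplementary. That case does not occur in the hit above, but it is one reason to treat the patch as a stopgap only.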
Broad confirmed that this is a bug in the snapshot.
https://ucsc-gi.slack.com/archives/C03TPJS54DC/p1684805634413329
No demo, passing IT (on PR #5184 for #5015) suffices.
The PR is just a workaround, we're still waiting for a fixed snapshot.
https://ucsc-gi.slack.com/archives/C03TPJS54DC/p1684805634413329
Furthermore, the workaround in PR #5231 is incomplete, so fully working around the bug would require even more work.
ETA for the fixed snapshot is "early next week".
Snapshot has arrived and been verified to address the issue. Assignee to file a PR reverting the workaround and incorporating the snapshot.
Workaround has been removed.
https://gitlab.prod.anvil.gi.ucsc.edu/ucsc/azul/-/jobs/4281
for branch https://github.com/DataBiosphere/azul/tree/issues/hannes-ucsc/5015-anvilprod
and on commit https://github.com/DataBiosphere/azul/commit/4fe493e423aabc5859e4dff9c3019483ef3ec31d