dsotirho-ucsc closed 9 months ago
Checked all four stratification values in this snapshot.
Query:
-- Donor genus_species
SELECT ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.ontology_label') FROM UNNEST(JSON_EXTRACT_ARRAY(content, '$.genus_species')) AS x) AS ontology_label,
ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.text') FROM UNNEST(JSON_EXTRACT_ARRAY(content, '$.genus_species')) AS x) AS text,
ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.ontology') FROM UNNEST(JSON_EXTRACT_ARRAY(content, '$.genus_species')) AS x) AS ontology,
FROM `datarepo-14448a21.lungmap_prod_6135382f487d4adb9cf84d6634125b68__20230207_20230207_lm3.specimen_from_organism`
Result: OK, 148 rows, all with {"ontology_label": "Homo sapiens", "text": "Homo sapiens", "ontology": "NCBITaxon:9606"}
Query:
-- Donor development_stage
SELECT JSON_VALUE(content, '$.development_stage.ontology_label') AS ontology_label,
JSON_VALUE(content, '$.development_stage.text') AS text,
JSON_VALUE(content, '$.development_stage.ontology') AS ontology
FROM `datarepo-14448a21.lungmap_prod_6135382f487d4adb9cf84d6634125b68__20230207_20230207_lm3.donor_organism`
Result: NOT OK, empty strings found: 104 rows, all with {"text": ""}
Query:
-- Specimen_from_organism organ
SELECT JSON_VALUE(content, '$.organ.ontology_label') AS ontology_label,
JSON_VALUE(content, '$.organ.text') AS text,
JSON_VALUE(content, '$.organ.ontology') AS ontology
FROM `datarepo-14448a21.lungmap_prod_6135382f487d4adb9cf84d6634125b68__20230207_20230207_lm3.specimen_from_organism`
Result: NOT OK, empty strings found: 146 rows with {"ontology_label": "Lung", "text": "Lung", "ontology": "UBERON:0002048"}, 2 rows with {"text": ""}
Query:
-- Cell_line model_organ
SELECT content
FROM `datarepo-14448a21.lungmap_prod_6135382f487d4adb9cf84d6634125b68__20230207_20230207_lm3.cell_line`
Result: OK (no rows)
Query:
-- Organoid model_organ
SELECT content
FROM `datarepo-14448a21.lungmap_prod_6135382f487d4adb9cf84d6634125b68__20230207_20230207_lm3.organoid`
Result: OK (no rows)
Query:
-- Library_preparation_protocol library_construction_method
SELECT JSON_VALUE(content, '$.library_construction_method.ontology_label') AS ontology_label,
JSON_VALUE(content, '$.library_construction_method.text') AS text,
JSON_VALUE(content, '$.library_construction_method.ontology') AS ontology
FROM `datarepo-14448a21.lungmap_prod_6135382f487d4adb9cf84d6634125b68__20230207_20230207_lm3.library_preparation_protocol`
Result: OK, 10 rows with varying values, e.g. {"ontology_label": "10X sequencing", "text": "10x 3' v2 and v3 sequencing", "ontology": "EFO:0008995"} and {"ontology_label": "10X 3' v2 sequencing", "text": "10X 3' v2 sequencing", "ontology": "EFO:0009899"}
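For anyone repeating this check by hand, the tallying done by eye above could be sketched offline in Python. This is only a rough sketch, assuming the `content` column of a table has been exported as a list of JSON strings. The function name `tally_ontology_values` and the sample rows are hypothetical and not part of the queries above; the sample values only mirror the shapes reported in the results.

```python
import json
from collections import Counter

# The three keys inspected by the queries above.
FIELDS = ("ontology_label", "text", "ontology")


def tally_ontology_values(contents, property_name):
    """Count distinct (ontology_label, text, ontology) combinations for one
    property, and the number of rows containing at least one empty string.

    `contents` is an iterable of JSON strings (hypothetical export of the
    `content` column); `property_name` is e.g. "organ" or "genus_species".
    """
    combos = Counter()
    rows_with_empty = 0
    for raw in contents:
        value = json.loads(raw).get(property_name)
        # genus_species holds an array of ontology objects, the other
        # properties hold a single object, so normalize to a list.
        items = value if isinstance(value, list) else [value]
        row_has_empty = False
        for item in items:
            item = item or {}  # tolerate a missing or null property
            combo = tuple(item.get(field) for field in FIELDS)
            combos[combo] += 1
            if "" in combo:
                row_has_empty = True
        rows_with_empty += row_has_empty
    return combos, rows_with_empty


# Illustrative rows only, shaped like the organ results above.
sample = [
    '{"organ": {"ontology_label": "Lung", "text": "Lung", "ontology": "UBERON:0002048"}}',
    '{"organ": {"text": ""}}',
]
combos, empty_rows = tally_ontology_values(sample, "organ")
print(combos, empty_rows)
```

A missing key shows up as `None` in the tuple and is not counted as an empty string, which keeps "absent" and "present but empty" apart, as in the results above where the bad rows carry only an empty `text`.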
Spike to do the same check in the other new snapshot in lm3
No issues found with snapshot datarepo-d139f96d.lungmap_prod_1bdcecde16be420888f478cd2133d11d__20220308_20230207_lm3
Query:
-- Donor genus_species
SELECT ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.ontology_label') FROM UNNEST(JSON_EXTRACT_ARRAY(content, '$.genus_species')) AS x) AS ontology_label,
ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.text') FROM UNNEST(JSON_EXTRACT_ARRAY(content, '$.genus_species')) AS x) AS text,
ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.ontology') FROM UNNEST(JSON_EXTRACT_ARRAY(content, '$.genus_species')) AS x) AS ontology,
FROM `datarepo-d139f96d.lungmap_prod_1bdcecde16be420888f478cd2133d11d__20220308_20230207_lm3.specimen_from_organism`
Result: OK, 16 rows, no empty string values
[{
"ontology_label": ["Mus musculus"],
"text": ["Mus musculus"],
"ontology": ["NCBITaxon:10090"]
}, {
"ontology_label": ["Mus musculus"],
"text": ["Mus musculus"],
"ontology": ["NCBITaxon:10090"]
}, {
"ontology_label": ["Mus musculus"],
"text": ["Mus musculus"],
"ontology": ["NCBITaxon:10090"]
}, {
…
Query:
-- Donor development_stage
SELECT JSON_VALUE(content, '$.development_stage.ontology_label') AS ontology_label,
JSON_VALUE(content, '$.development_stage.text') AS text,
JSON_VALUE(content, '$.development_stage.ontology') AS ontology
FROM `datarepo-d139f96d.lungmap_prod_1bdcecde16be420888f478cd2133d11d__20220308_20230207_lm3.donor_organism`
Result: OK, 16 rows, no empty string values
[{
"ontology_label": "mouse postnatal",
"text": "mouse postnatal",
"ontology": "EFO:0004390"
}, {
"ontology_label": "mouse postnatal",
"text": "mouse postnatal",
"ontology": "EFO:0004390"
}, {
"ontology_label": "mouse postnatal",
"text": "mouse postnatal",
"ontology": "EFO:0004390"
}, {
…
Query:
-- Specimen_from_organism organ
SELECT JSON_VALUE(content, '$.organ.ontology_label') AS ontology_label,
JSON_VALUE(content, '$.organ.text') AS text,
JSON_VALUE(content, '$.organ.ontology') AS ontology
FROM `datarepo-d139f96d.lungmap_prod_1bdcecde16be420888f478cd2133d11d__20220308_20230207_lm3.specimen_from_organism`
Result: OK, 16 rows, no empty string values
[{
"ontology_label": "pair of lungs",
"text": "pair of lungs",
"ontology": "UBERON:0000170"
}, {
"ontology_label": "pair of lungs",
"text": "pair of lungs",
"ontology": "UBERON:0000170"
}, {
"ontology_label": "pair of lungs",
"text": "pair of lungs",
"ontology": "UBERON:0000170"
}, {
…
Query:
-- Cell_line model_organ
SELECT content
FROM `datarepo-d139f96d.lungmap_prod_1bdcecde16be420888f478cd2133d11d__20220308_20230207_lm3.cell_line`
Result: OK (no rows)
Query:
-- Organoid model_organ
SELECT content
FROM `datarepo-d139f96d.lungmap_prod_1bdcecde16be420888f478cd2133d11d__20220308_20230207_lm3.organoid`
Result: OK (no rows)
Query:
-- Library_preparation_protocol library_construction_method
SELECT JSON_VALUE(content, '$.library_construction_method.ontology_label') AS ontology_label,
JSON_VALUE(content, '$.library_construction_method.text') AS text,
JSON_VALUE(content, '$.library_construction_method.ontology') AS ontology
FROM `datarepo-d139f96d.lungmap_prod_1bdcecde16be420888f478cd2133d11d__20220308_20230207_lm3.library_preparation_protocol`
Result: OK, 1 row, no empty string values
[{
"ontology_label": "Drop-seq",
"text": "Drop-seq",
"ontology": "EFO:0008722"
}]
Still waiting for LungMAP people to respond to our question on Slack.
The question was answered and several updated snapshots to lm3 have been released: 31550585 3069ed8a 3e425d4f 33636421 e93cc9dd e43b5aff b67e28ae 15c60004
One of these apparently addresses the issue, though I don't have time to figure out which one.
The following resolves the issue for donor.development_stage:
However, tests with the lm3 catalog caused failure due to other fields (sample.organ) as well, so something like this is probably needed instead: