Closed brianraymor closed 9 months ago
@brianraymor (cc: @atarashansky @dsadgat @joyceyan) - Will the schema CLI fail validation if it encounters values like the following? The list is not exhaustive, but it gives examples of values that don't make sense:
unknown,HANCESTRO:0008
HANCESTRO:0008,unknown,HANCESTRO:0320
unknown,na
I ask because it affects the implementation in the gene expression application.
Note: This is a placeholder epic for Dev to add child issues following an assessment of required changes.
See #single-cell-data-wrangling for discussion with @norbid about his preferences for how multiple self reported ethnicity values may be surfaced in the CELLxGENE Discover UX.
The changes to the cellxgene-schema CLI are tracked separately in "cellxgene-schema must validate self_reported_ethnicity".
Design
See schema 4
self_reported_ethnicity_ontology_term_id
str categories. If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, the value MUST be formatted as one or more comma-separated (with no leading or trailing spaces) HANCESTRO terms in ascending lexical order or "unknown" if unavailable.
For example, if the terms are "HANCESTRO:0014" and "HANCESTRO:0005" then the value of self_reported_ethnicity_ontology_term_id MUST be "HANCESTRO:0005,HANCESTRO:0014".
The following terms MUST NOT be used:
"HANCESTRO:0002" for regions and its children
"HANCESTRO:0003" for country
"HANCESTRO:0004" for ancestry category
"HANCESTRO:0018" for uncategorised population
"HANCESTRO:0290" for genetically isolated population
"HANCESTRO:0304" for ancestry status and its children
"HANCESTRO:0323" for Finnish founder
"HANCESTRO:0324" for Dutch founder
"HANCESTRO:0551" for genetically homogenous Irish
"HANCESTRO:0554" for Silk Road founder
"HANCESTRO:0555" for Arab Israeli founder
"HANCESTRO:0557" for Costa Rican founder
"HANCESTRO:0558" for French Canadian founder
"HANCESTRO:0559" for Italian founder
"HANCESTRO:0560" for Northern Finnish founder
"HANCESTRO:0561" for Romanian founder
"HANCESTRO:0564" for Vis founder
"HANCESTRO:0565" for Split founder
"HANCESTRO:0566" for undefined ancestry population
"GEO:000000374" for continent and its children:
"HANCESTRO:0029" for Africa
"HANCESTRO:0030" for Asia
"HANCESTRO:0031" for Europe
"HANCESTRO:0032" for Oceania
"HANCESTRO:0033" for Latin America and the Caribbean
"HANCESTRO:0034" for Northern America
Otherwise, for all other organisms the str value MUST be "na".
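The rules above for self_reported_ethnicity_ontology_term_id can be sketched as a small check. This is only an illustration, not the cellxgene-schema CLI's actual code: the function name, the constants, and the four-digit HANCESTRO pattern are assumptions, the forbidden set contains only the IDs listed explicitly above (children of "and its children" terms that are not listed are not expanded), and the sketch does not confirm that a term exists in the ontology.

```python
from __future__ import annotations

import re

HUMAN = "NCBITaxon:9606"

# Assumption: HANCESTRO IDs are "HANCESTRO:" followed by four digits,
# as in every example in this issue.
TERM_PATTERN = re.compile(r"HANCESTRO:\d{4}")

# Only the IDs listed explicitly in the Design section.
FORBIDDEN_TERMS = frozenset({
    "HANCESTRO:0002", "HANCESTRO:0003", "HANCESTRO:0004", "HANCESTRO:0018",
    "HANCESTRO:0290", "HANCESTRO:0304", "HANCESTRO:0323", "HANCESTRO:0324",
    "HANCESTRO:0551", "HANCESTRO:0554", "HANCESTRO:0555", "HANCESTRO:0557",
    "HANCESTRO:0558", "HANCESTRO:0559", "HANCESTRO:0560", "HANCESTRO:0561",
    "HANCESTRO:0564", "HANCESTRO:0565", "HANCESTRO:0566", "GEO:000000374",
    "HANCESTRO:0029", "HANCESTRO:0030", "HANCESTRO:0031", "HANCESTRO:0032",
    "HANCESTRO:0033", "HANCESTRO:0034",
})


def validate_ethnicity_term_id(value: str, organism: str) -> list[str]:
    """Return a list of error messages; an empty list means the value is valid."""
    if organism != HUMAN:
        # All other organisms: the value MUST be "na".
        return [] if value == "na" else ['value MUST be "na" for non-human organisms']
    if value == "unknown":
        return []

    errors = []
    # Splitting on a bare comma means any surrounding space stays attached
    # to the term and fails the pattern check below.
    terms = value.split(",")
    for term in terms:
        if not TERM_PATTERN.fullmatch(term):
            errors.append(f"{term!r} is not a HANCESTRO term")
        elif term in FORBIDDEN_TERMS:
            errors.append(f"{term!r} MUST NOT be used")
    if terms != sorted(terms):
        errors.append("terms are not in ascending lexical order")
    return errors


if __name__ == "__main__":
    # The three values from the question at the top of this issue.
    for example in ("unknown,HANCESTRO:0008",
                    "HANCESTRO:0008,unknown,HANCESTRO:0320",
                    "unknown,na"):
        print(example, "->", validate_ethnicity_term_id(example, HUMAN))
```

Under these assumptions, "unknown" and "na" are accepted only as the entire value, so each of the three mixed values from the question is reported as invalid.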
self_reported_ethnicity
str categories. This MUST be "na" or "unknown" if set in self_reported_ethnicity_ontology_term_id; otherwise, this MUST be one or more comma-separated (with no leading or trailing spaces) human-readable names for the terms in self_reported_ethnicity_ontology_term_id in the same order.
For example, if the value of self_reported_ethnicity_ontology_term_id is "HANCESTRO:0005,HANCESTRO:0014" then the value of self_reported_ethnicity is "European,Hispanic or Latin American".
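The mapping from term IDs to labels described above can be sketched as follows. The function name and the two-entry lookup table are hypothetical. The table holds only the two labels given in the example above, whereas a real implementation would read labels from the HANCESTRO ontology.

```python
from __future__ import annotations

# Hypothetical lookup table holding only the two labels from the example above.
LABELS = {
    "HANCESTRO:0005": "European",
    "HANCESTRO:0014": "Hispanic or Latin American",
}


def ethnicity_label(term_id_value: str, labels: dict[str, str]) -> str:
    """Derive self_reported_ethnicity from self_reported_ethnicity_ontology_term_id."""
    # "na" and "unknown" are carried over unchanged.
    if term_id_value in ("na", "unknown"):
        return term_id_value
    # Otherwise join the human-readable names in the same order as the IDs,
    # comma-separated with no surrounding spaces.
    return ",".join(labels[term] for term in term_id_value.split(","))


if __name__ == "__main__":
    print(ethnicity_label("HANCESTRO:0005,HANCESTRO:0014", LABELS))
```

In this sketch, an ID that is missing from the lookup table raises KeyError instead of producing a partial label.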
Samples
Sample normalized terms
Sample labels for terms
Data Platform
Data Portal API changes
There should be no required changes to the current implementation. Based on the dataset samples above, the response value for self_reported_ethnicity would be:
Discover API changes
There should be no required changes to the current implementation. Based on the dataset samples above, the response value for self_reported_ethnicity would be:
Discover UX filter changes
Refined into "Self-Reported Ethnicity filter in Collections and Datasets must be updated".
Data Viz
Census