Open brianraymor opened 1 week ago
@brianraymor AnnData doesn't seem to support multiple data types in a single column. What do you think of changing the schema so that when the organism is not Homo sapiens, we require the value to be float('nan') instead of the string "na"?
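The concern can be illustrated at the pandas level, since AnnData stores obs as a pandas DataFrame. The snippet below is only a sketch of the dtype behavior; it does not exercise AnnData itself, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Mixing the string "na" with floats makes pandas fall back to object dtype.
mixed = pd.Series([0.25, "na", 0.75])

# Using NaN as the marker instead keeps the column numeric.
numeric = pd.Series([0.25, np.nan, 0.75])

print(mixed.dtype)
print(numeric.dtype)
```

With NaN as the marker, the column stays a single float type, which is the property the proposal above is after.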
Changelog
genetic_ancestry_African
genetic_ancestry_East_Asian
genetic_ancestry_European
genetic_ancestry_Indigenous_American
genetic_ancestry_Oceanian
genetic_ancestry_South_Asian
Design
If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then for each observation, either all values of the following fields MUST be float("nan") or the sum of their values MUST be 1.0:
genetic_ancestry_African
genetic_ancestry_East_Asian
genetic_ancestry_European
genetic_ancestry_Indigenous_American
genetic_ancestry_Oceanian
genetic_ancestry_South_Asian
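The all-NaN-or-sum-to-1.0 rule could be checked per observation roughly as follows. This is a minimal sketch, not the validator's actual implementation: the function name, the dict-based row, and the floating-point tolerance are all assumptions made for illustration (the rule as written only says the sum MUST be 1.0).

```python
import math

ANCESTRY_FIELDS = [
    "genetic_ancestry_African",
    "genetic_ancestry_East_Asian",
    "genetic_ancestry_European",
    "genetic_ancestry_Indigenous_American",
    "genetic_ancestry_Oceanian",
    "genetic_ancestry_South_Asian",
]


def ancestry_row_is_valid(row, tolerance=1e-6):
    """Return True if a Homo sapiens observation is all-NaN or sums to 1.0.

    `row` is a hypothetical mapping from field name to float.
    `tolerance` is an assumed allowance for floating-point rounding.
    """
    values = [row[field] for field in ANCESTRY_FIELDS]
    nan_flags = [math.isnan(v) for v in values]
    if all(nan_flags):
        return True
    if any(nan_flags):
        # A partially filled row can never sum to 1.0, because NaN propagates.
        return False
    return math.isclose(sum(values), 1.0, rel_tol=0.0, abs_tol=tolerance)
```

A row that mixes NaN with numeric values is rejected here, since it satisfies neither branch of the rule.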
genetic_ancestry_African
str or float. All observations with the same donor_id MUST contain the same value.
If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be "na".
If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0010" for African, expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.
genetic_ancestry_East_Asian
str or float. All observations with the same donor_id MUST contain the same value.
If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be "na".
If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0009" for East Asian, expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.
genetic_ancestry_European
str or float. All observations with the same donor_id MUST contain the same value.
If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be "na".
If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0005" for European, expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.
genetic_ancestry_Indigenous_American
str or float. All observations with the same donor_id MUST contain the same value.
If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be "na".
If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0013" for Indigenous American, expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.
genetic_ancestry_Oceanian
str or float. All observations with the same donor_id MUST contain the same value.
If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be "na".
If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0017" for Oceanian, expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.
genetic_ancestry_South_Asian
str or float. All observations with the same donor_id MUST contain the same value.
If organism_ontology_term_id is NOT "NCBITaxon:9606" for Homo sapiens, then the value MUST be "na".
If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, then the value MUST be float("nan") if unavailable; otherwise, the value MUST be the genetic ancestry percentage of "HANCESTRO:0006" for South Asian, expressed as a float greater than or equal to 0.0 and less than or equal to 1.0.
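
The per-field rules above share the same shape, so one check can cover all six fields. The following is a rough sketch under stated assumptions: both function names and the list-of-dicts representation of observations are hypothetical, and the "na" requirement for non-human organisms follows the Design text as written rather than the float('nan') alternative raised in the comment.

```python
import math

HUMAN = "NCBITaxon:9606"
_NAN = object()  # sentinel so that NaN values compare equal to each other


def ancestry_value_is_valid(value, organism_ontology_term_id):
    """Check one genetic_ancestry_* value against the per-field rules."""
    if organism_ontology_term_id != HUMAN:
        return isinstance(value, str) and value == "na"
    if not isinstance(value, float):
        return False
    # NaN means "unavailable"; otherwise the value must lie in [0.0, 1.0].
    return math.isnan(value) or 0.0 <= value <= 1.0


def donors_with_inconsistent_values(observations, field):
    """Return the sorted donor_ids whose observations disagree on `field`."""
    first_seen = {}
    inconsistent = set()
    for obs in observations:
        value = obs[field]
        key = _NAN if isinstance(value, float) and math.isnan(value) else value
        donor = obs["donor_id"]
        if donor in first_seen and first_seen[donor] != key:
            inconsistent.add(donor)
        first_seen.setdefault(donor, key)
    return sorted(inconsistent)
```

NaN is normalized to a sentinel before comparison because float("nan") != float("nan") in Python, which would otherwise flag every donor with unavailable values as inconsistent.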