atarashansky closed this issue 6 months ago
Let's add another mapping table to `census["info"]` with the following schema specification:
`census_obj["census_info"]["organisms"]` – `SOMADataFrame`
Information about organisms whose cells are included in the Census MUST be included in a table modeled as a `SOMADataFrame`. Each row MUST correspond to an individual organism with the following columns:
| Column | Encoding | Description |
|---|---|---|
| organism_ontology_term_id | string | As defined in the CELLxGENE dataset schema. |
| organism_label | string | Human-readable label as given by the ontology. |
| organism | string | Machine-friendly label used to name the SOMA Experiments; see the Census Data section below. |
An example of this `SOMADataFrame` is shown below:
| organism_ontology_term_id | organism_label | organism |
|---|---|---|
| NCBITaxon:9606 | Homo sapiens | homo_sapiens |
| NCBITaxon:10090 | Mus musculus | mus_musculus |
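As a rough illustration of the schema above, the sketch below models the example rows as plain Python dictionaries and checks that every row carries the three required string columns. The helper name `validate_organism_rows` is hypothetical, and a real Census would store this table as a SOMADataFrame, not as a list of dicts.

```python
# Column names taken from the proposed schema table.
REQUIRED_COLUMNS = ("organism_ontology_term_id", "organism_label", "organism")

# Example rows copied from the example table; hard-coded for illustration only.
ORGANISM_ROWS = [
    {
        "organism_ontology_term_id": "NCBITaxon:9606",
        "organism_label": "Homo sapiens",
        "organism": "homo_sapiens",
    },
    {
        "organism_ontology_term_id": "NCBITaxon:10090",
        "organism_label": "Mus musculus",
        "organism": "mus_musculus",
    },
]


def validate_organism_rows(rows):
    """Hypothetical helper: raise ValueError if any row lacks a required string column."""
    for index, row in enumerate(rows):
        for column in REQUIRED_COLUMNS:
            if not isinstance(row.get(column), str):
                raise ValueError(f"row {index}: column {column!r} must be a string")


validate_organism_rows(ORGANISM_ROWS)
```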
Description
It would be convenient if Census contained the organism ontology term ID in the corresponding organism's census object.
Context
I am building the WMG snapshot from Census and need to maintain a mapping between Census organism keys and their ontology term IDs (the WMG snapshot requires the term IDs, not the labels):
{'homo_sapiens': 'NCBITaxon:9606', 'mus_musculus': 'NCBITaxon:10090'}
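With the proposed organisms table in place, this mapping would not need to be maintained by hand. A minimal sketch, assuming the table has already been read into a list of row dictionaries; the helper `organism_key_to_term_id` is hypothetical, and the rows are hard-coded here instead of being read from a Census:

```python
def organism_key_to_term_id(rows):
    """Hypothetical helper: map each organism key to its ontology term ID."""
    return {row["organism"]: row["organism_ontology_term_id"] for row in rows}


# Rows as they would appear in the proposed table (assumed, hard-coded for illustration).
rows = [
    {
        "organism_ontology_term_id": "NCBITaxon:9606",
        "organism_label": "Homo sapiens",
        "organism": "homo_sapiens",
    },
    {
        "organism_ontology_term_id": "NCBITaxon:10090",
        "organism_label": "Mus musculus",
        "organism": "mus_musculus",
    },
]

mapping = organism_key_to_term_id(rows)
print(mapping)
# {'homo_sapiens': 'NCBITaxon:9606', 'mus_musculus': 'NCBITaxon:10090'}
```

Because the mapping is derived from the table, a newly supported organism would show up as a new row without any change to the consuming code.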
Impact
It is inconvenient to need to maintain a separate mapping table, especially if we add support for more organisms.