chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

Evolve development_stage_ontology_term_id to support multiple species #1033

Open brianraymor opened 1 month ago

brianraymor commented 1 month ago

Per the November 13, 2024 call with @ambrosejcarr, @jahilton, @BAevermann, and @SESDNA:

  1. The CELLxGENE schema will continue to use taxon-specific development stage ontologies when available; otherwise, it will default to UBERON.
  2. STRONGLY RECOMMENDED development stage terms will be removed from the schema.
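
A minimal Python sketch of decision 1, assuming a hand-written lookup from organism taxon to development stage ontology prefix. The mapping entries and the names `TAXON_TO_DV_PREFIX`, `expected_stage_prefix`, and `stage_term_matches_organism` are illustrative only; they are not taken from the schema or the validator.

```python
# Illustrative subset only (an assumption for this sketch): a real table
# would cover every species with a development stage ontology.
TAXON_TO_DV_PREFIX = {
    "NCBITaxon:9606": "HsapDv",   # Homo sapiens
    "NCBITaxon:10116": "RnorDv",  # Rattus norvegicus
    "NCBITaxon:9544": "MmulDv",   # Macaca mulatta
}
FALLBACK_PREFIX = "UBERON"


def expected_stage_prefix(organism_term_id: str) -> str:
    """Return the taxon-specific prefix when one is known, else UBERON."""
    return TAXON_TO_DV_PREFIX.get(organism_term_id, FALLBACK_PREFIX)


def stage_term_matches_organism(organism_term_id: str, stage_term_id: str) -> bool:
    """Check that a 'PREFIX:ID' stage term uses the prefix expected for the organism."""
    prefix, sep, local_id = stage_term_id.partition(":")
    if not sep or not local_id:
        return False  # not a well-formed 'PREFIX:ID' term
    return prefix == expected_stage_prefix(organism_term_id)
```

This only checks the ontology prefix. It says nothing about whether the term is specific enough.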

Available from developmental-stage-ontologies

| prefix | namespace | format |
| --- | --- | --- |
| acardv (AcarDv) | lizard | owl |
| btaudv (BtauDv) | cow | owl |
| cfamdv (CfamDv) | dog | owl |
| cpordv (CporDv) | cavy (Caviidae) | owl |
| danadv (DanaDv) | Drosophila ananassae | obo |
| dmojdv (DmojDv) | Drosophila mojavensis | obo |
| dpsedv (DpseDv) | Drosophila pseudoobscura | obo |
| dsimdv (DsimDv) | Drosophila simulans | owl |
| dvirdv (DvirDv) | Drosophila virilis | obo |
| dyakdv (DyakDv) | Drosophila yakuba | obo |
| ecabdv (EcabDv) | Horse | owl |
| eeurdv (EeurDv) | Hedgehog | obo |
| fcatdv (FcatDv) | Cat | owl |
| ggaldv (GgalDv) | Chicken | owl |
| ggordv (GgorDv) | Gorilla | owl |
| mdomdv (MdomDv) | Opossum | owl |
| metadv | - | - |
| mmuldv (MmulDv) | Rhesus Macaque | owl |
| oanadv (OanaDv) | Platypus | owl |
| oaridv (OariDv) | Sheep | owl |
| ocundv (OcunDv) | Rabbit | owl |
| olatdv (OlatDv) | Medaka, adapted from MFO by Thorsten Henrich | owl |
| pdumdv (PdumDv) | Platynereis | owl |
| ppandv (PpanDv) | Bonobo | owl |
| ppygdv (PpygDv) | Orangutan | obo |
| ptrodv (PtroDv) | Chimpanzee | owl |
| rnordv (RnorDv) | Rat | owl |
| ssaldv (SsalDv) | Atlantic Salmon | obo |
| sscrdv (SscrDv) | Pig | owl |
| tnigdv (TnigDv) | Pufferfish | obo, not in release |

BAevermann commented 1 month ago

For cases where there is a species-specific development stage ontology, why not consider its usage as "REQUIRED"? I am specifically thinking about human and mouse, as the terms in UBERON are a clear downgrade compared to the curation currently available.

brianraymor commented 1 month ago

We depend on the kindness of curators to define the most accurate development stage terms. For example, the schema only requires:

If organism_ontology_term_id is "NCBITaxon:9606" for Homo sapiens, this MUST be the most accurate descendant of HsapDv:0000001 for life cycle with the following STRONGLY RECOMMENDED: ... followed by a list of HsapDv terms.

There's nothing preventing a submitter from selecting a high-level HsapDv term such as embryonic stage.
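
To illustrate why a descendant-only rule does not block high-level terms, here is a rough sketch over a toy parent map. The hierarchy is hand-written for this example and is not loaded from HsapDv; `EMBRYONIC` is a hypothetical placeholder ID standing in for the embryonic stage term, and `is_descendant` is an illustrative helper, not the validator's code. Real ontologies are graphs with multiple parents, which this single-parent map ignores.

```python
LIFE_CYCLE = "HsapDv:0000001"
# Hypothetical placeholder ID for the high-level "embryonic stage" term.
EMBRYONIC = "HsapDv:EMBRYONIC_PLACEHOLDER"

# Toy child -> parent map (an assumption for this sketch).
PARENT = {
    EMBRYONIC: LIFE_CYCLE,
    "HsapDv:0000003": EMBRYONIC,  # example Carnegie stage term from this thread
}


def is_descendant(term: str, ancestor: str, parent=PARENT) -> bool:
    """Walk up the parent chain; a term is not its own descendant."""
    current = parent.get(term)
    while current is not None:
        if current == ancestor:
            return True
        current = parent.get(current)
    return False
```

Under this rule, `is_descendant("HsapDv:0000003", LIFE_CYCLE)` and `is_descendant(EMBRYONIC, LIFE_CYCLE)` both return `True`, so the specific term and the high-level term pass equally.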

Further, the development stage ontologies duplicate the UBERON high-level hierarchical terms for stages such as blastula stage. For example, HsapDv vs UBERON.

The schema could certainly define tables per species with REQUIRED and STRONGLY RECOMMENDED UBERON and species-specific ontology terms.

| For | Use |
| --- | --- |
| UBERON stage | A term from the set of Carnegie stages 1-23 (up to 8 weeks after conception; e.g. HsapDv:0000003) |
| UBERON stage | A term from the set of 9 to 38 week post-fertilization human stages (9 weeks after conception and before birth; e.g. HsapDv:0000046) |

If @jahilton and @jychien believe that we could strengthen the requirements for development stages to block high-level stages, then that's another possibility: "MUST use a term from the set of Carnegie stages 1-23".
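
One way such a stricter rule might look, as a sketch only: validate against explicit allowed sets instead of "any descendant of life cycle". The sets below contain just the two example terms quoted in this thread; a real rule would list every term in each set. `ALLOWED_PRENATAL_SETS` and `validate_stage` are hypothetical names.

```python
from typing import FrozenSet, List, Mapping

# Only the example terms mentioned in this thread; the full sets are
# deliberately not filled in here.
ALLOWED_PRENATAL_SETS = {
    "Carnegie stages 1-23": frozenset({"HsapDv:0000003"}),
    "9 to 38 week post-fertilization stages": frozenset({"HsapDv:0000046"}),
}


def validate_stage(term_id: str, allowed: Mapping[str, FrozenSet[str]]) -> List[str]:
    """Return a list of error messages; an empty list means the term is accepted."""
    if any(term_id in terms for terms in allowed.values()):
        return []
    names = ", ".join(sorted(allowed))
    return [f"{term_id} MUST be a term from one of: {names}"]
```

With these sets, `validate_stage("HsapDv:0000003", ALLOWED_PRENATAL_SETS)` returns an empty list, while a high-level term such as `HsapDv:0000001` yields one error.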

Currently, we're in the middle of the multiple species and relaxed schema experiment, but if multiple species begin to surface in the CELLxGENE Discover UX, then I'd expect that @niknak33 and @hthomas-czi may prefer to simplify the Development Stages UX Filter to be species-neutral and rely more on the UBERON terms. The current design was based on constraints that are no longer valid.

jahilton commented 1 month ago

I would support requiring the species-specific Dv ontology "For cases where species specific development stages ontologies...exist", as we currently do for human and mouse. I don't see any reason to allow an UBERON term in those cases.