chanzuckerberg / single-cell-curation

Code and documentation for the curation of cellxgene datasets
MIT License

cellxgene-schema must download and process pinned ontologies #254

Closed: brianraymor closed this issue 2 years ago

brianraymor commented 2 years ago

Note: HsapDv and MmusDv releases are neither documented nor tagged in their repositories.

Their downloads include release metadata:

```xml
<owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/mmusdv/releases/2020-03-10/mmusdv.owl"/>
```

Both match the current pinned versions in the schema, so these ontologies do not need to be refreshed.
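The check described above (reading the release date from a download's `owl:versionIRI` and comparing it with the pinned release) can be sketched in Python. This is a minimal, hypothetical helper, not part of cellxgene-schema, and it assumes the versionIRI follows the `/releases/YYYY-MM-DD/` layout seen in the MmusDv header. A regular expression is used instead of a full RDF/XML parser only to keep the sketch short.

```python
import re
from typing import Optional

# Assumption: the versionIRI embeds the release date as .../releases/YYYY-MM-DD/...,
# as in the MmusDv example above.
VERSION_IRI_RE = re.compile(
    r'<owl:versionIRI\s+rdf:resource="[^"]*/releases/(\d{4}-\d{2}-\d{2})/[^"]*"\s*/>'
)


def release_date_from_owl(owl_text: str) -> Optional[str]:
    """Return the release date from the first owl:versionIRI, or None if absent."""
    match = VERSION_IRI_RE.search(owl_text)
    return match.group(1) if match else None


def matches_pin(owl_text: str, pinned_release: str) -> bool:
    """True if the downloaded file's versionIRI date equals the pinned release date."""
    return release_date_from_owl(owl_text) == pinned_release


header = (
    '<owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/'
    'mmusdv/releases/2020-03-10/mmusdv.owl"/>'
)
print(release_date_from_owl(header))      # 2020-03-10
print(matches_pin(header, "2020-03-10"))  # True
```

A file whose header lacks a versionIRI yields `None`, so `matches_pin` fails closed rather than silently accepting an unversioned download.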


Required Ontologies

The following ontology dependencies are pinned for this version of the schema.

| Ontology | OBO Prefix | Release | Download |
|---|---|---|---|
| Cell Ontology | CL | 2022-06-18 | cl.owl |
| Experimental Factor Ontology | EFO | 2022-07-18 (EFO 3.44.0) | efo.owl |
| Human Ancestry Ontology | HANCESTRO | 2022-07-18 (2.6) | hancestro.owl |
| Human Developmental Stages | HsapDv | 2020-03-10 | hsapdv.owl |
| Mondo Disease Ontology | MONDO | 2022-07-01 | mondo.owl |
| Mouse Developmental Stages | MmusDv | 2020-03-10 | mmusdv.owl |
| NCBI organismal classification | NCBITaxon | 2022-06-28 | ncbitaxon.owl |
| Phenotype And Trait Ontology | PATO | 2022-07-21 | pato.owl |
| Uberon multi-species anatomy ontology | UBERON | 2022-06-30 | uberon.owl |
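One way to make the pinned releases listed above drive the downloads is a small manifest from which versioned URLs are built. The sketch below is hypothetical: it assumes each ontology publishes a versioned PURL of the form `http://purl.obolibrary.org/obo/<prefix>/releases/<date>/<prefix>.owl`, mirroring the MmusDv versionIRI quoted earlier. EFO and HANCESTRO are left out because their release URLs may not follow that layout. It only builds URLs and does not download anything.

```python
# Hypothetical manifest of pinned releases (dates copied from the list above).
# Only ontologies assumed to follow the OBO versioned-PURL layout are included.
PINNED_OBO_RELEASES = {
    "CL": "2022-06-18",
    "HsapDv": "2020-03-10",
    "MONDO": "2022-07-01",
    "MmusDv": "2020-03-10",
    "NCBITaxon": "2022-06-28",
    "PATO": "2022-07-21",
    "UBERON": "2022-06-30",
}

PURL_BASE = "http://purl.obolibrary.org/obo"


def versioned_owl_url(prefix: str) -> str:
    """Build the assumed versioned PURL for a pinned ontology; KeyError if not pinned."""
    release = PINNED_OBO_RELEASES[prefix]
    name = prefix.lower()  # OBO PURLs use the lowercase prefix, e.g. mmusdv
    return f"{PURL_BASE}/{name}/releases/{release}/{name}.owl"


for prefix in PINNED_OBO_RELEASES:
    print(prefix, versioned_owl_url(prefix))
```

Raising `KeyError` for an unpinned prefix is deliberate in this sketch: a schema version should never fall back to an unpinned "latest" release.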
jychien commented 2 years ago
I've tested the following cases, and everything looks good to me!

| Test case | Expected result | Obtained result | Test file | Needs attention |
|---|---|---|---|---|
| Ontology terms that used to be valid in cellxgene but are no longer valid in the pinned ontologies | Fail validation | Failed validation. Logging: `ERROR: 'CL:0008029' in 'cell_type_ontology_term_id' is a deprecated term id of 'CL'.` | adata_ontologies_exp.h5ad | |
| Ontology terms with updated term names (CL:0000653, podocyte) | Pass validation; run `--add-labels` | Passed validation; the term name was the updated name "podocyte" | adata_ontologies_updated.h5ad, adata_ontologies_updated_labeled.h5ad | |
| Ontology terms that are new in the pinned ontologies (EFO:0030062, Slide-seqV2) | Pass validation | Passed validation | adata_ontologies_new.h5ad | |
| Disease term that is new in MONDO (MONDO:0800029) | Pass validation | Passed validation | adata_ontologies_mondo_new.h5ad | |
| Disease term that is deprecated (MONDO:0008345) | Fail validation | Failed validation. Logging: `ERROR: 'MONDO:0008345' in 'disease_ontology_term_id' is a deprecated term id of 'MONDO'. Only 'PATO:0000461' is allowed for 'PATO' term ids.` | adata_ontologies_mondo_exp.h5ad | |
| Term that is new in UBERON (UBERON:8410081) | Pass validation | Passed validation | adata_ontologies_uberon.h5ad | |
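The behaviors exercised in these test cases (rejecting deprecated term IDs and looking up a term's current label) can be illustrated with a small, hypothetical Python sketch. It is not the cellxgene-schema implementation: the `DEPRECATED` and `LABELS` tables are made-up stand-ins for data a validator would derive from the pinned ontology files, and only the first sentence of the logged error format is reproduced.

```python
from typing import Dict, List, Set

# Hypothetical stand-ins for data derived from the pinned ontologies
# (e.g. terms marked owl:deprecated, and current rdfs:label values).
DEPRECATED: Dict[str, Set[str]] = {
    "CL": {"CL:0008029"},
    "MONDO": {"MONDO:0008345"},
}
LABELS: Dict[str, str] = {
    "CL:0000653": "podocyte",
    "EFO:0030062": "Slide-seqV2",
}


def deprecated_errors(column: str, term_ids: List[str]) -> List[str]:
    """Return one error message per deprecated term, worded like the logs above."""
    errors = []
    for term_id in term_ids:
        ontology = term_id.split(":", 1)[0]  # "CL:0008029" -> "CL"
        if term_id in DEPRECATED.get(ontology, set()):
            errors.append(
                f"ERROR: '{term_id}' in '{column}' is a deprecated term id of '{ontology}'."
            )
    return errors


def add_labels(term_ids: List[str]) -> List[str]:
    """Map each term id to its label in the pinned ontology (illustrative only)."""
    return [LABELS[term_id] for term_id in term_ids]


print(deprecated_errors("cell_type_ontology_term_id", ["CL:0008029", "CL:0000653"]))
print(add_labels(["CL:0000653"]))  # ['podocyte']
```

Because labels are looked up from the pinned release rather than stored with the dataset, re-running labeling after an ontology bump picks up renamed terms, which matches the "podocyte" result above.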