chanzuckerberg / cellxgene-ontology-guide

Review file formats and ontology variants (can JSON replace OWL) #203

Open brianraymor opened 2 months ago

brianraymor commented 2 months ago

Questions to answer

Also see cell-science-platform.

Is it possible for COG to eliminate its dependency on owlready2 and transition to JSON?

Note: Early prototyping with cl-simple.json has demonstrated positive results in reproducing COG responses. Will add examples later.
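As a rough illustration of what JSON-only processing could look like, the sketch below reads an OBO Graphs JSON document and computes a term's ancestors with the standard library alone. It assumes the usual OBO Graphs layout (a top-level `graphs` list whose entries hold `nodes` and `edges`, with subclass edges carrying `pred` equal to `is_a`). The function names and the `EX:` identifiers are hypothetical, and this is not the COG implementation.

```python
import json
from collections import defaultdict


def load_graph(path):
    """Load the first graph from an OBO Graphs JSON file (e.g. cl-simple.json)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["graphs"][0]


def build_parent_index(graph):
    """Map each term ID to the set of its direct is_a parents."""
    parents = defaultdict(set)
    for edge in graph.get("edges", []):
        if edge["pred"] == "is_a":
            parents[edge["sub"]].add(edge["obj"])
    return parents


def ancestors(term, parents):
    """Return every term reachable from `term` by following is_a edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Tiny in-memory graph with made-up IDs, standing in for a real ontology file.
EXAMPLE_GRAPH = {
    "nodes": [
        {"id": "EX:1", "lbl": "cell", "type": "CLASS"},
        {"id": "EX:2", "lbl": "blood cell", "type": "CLASS"},
        {"id": "EX:3", "lbl": "T cell", "type": "CLASS"},
    ],
    "edges": [
        {"sub": "EX:2", "pred": "is_a", "obj": "EX:1"},
        {"sub": "EX:3", "pred": "is_a", "obj": "EX:2"},
    ],
}

example_parents = build_parent_index(EXAMPLE_GRAPH)
example_ancestors = ancestors("EX:3", example_parents)
```

With a real file, `load_graph("cl-simple.json")` would take the place of `EXAMPLE_GRAPH`; term IDs in published OBO Graphs files are typically full IRIs rather than CURIEs, so callers may need to convert between the two.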

Best practices

Tool developers developing tools that use the ontology (and do not need reasoners), such as database curation tools, web-browsers and similar, should typically use OBO graphs JSON and avoid using OBO format or any of the OWL focussed serialisations (Functional, Manchester or RDF/XML). OWL-focussed serialisations contain a huge deal of axiomatic content that make no sense to most users, and can lead to a variety of mistakes. We have seen it many times that software developers try to interpret OWL axioms somehow to extract relations. Do not do that! Work with the ontologies to ensure they provide the relationships you need in the appropriate form.

Also see developer-friendly JSON exchange format for ontologies

Current state of JSON support in required ontologies

| Ontology | JSON |
| --- | --- |
| Cell Ontology | Y |
| Experimental Factor Ontology | Y |
| Human Ancestry Ontology | Y |
| Human Developmental Stages | N |
| Mondo Disease Ontology | Y |
| Mouse Developmental Stages | N |
| NCBI organismal classification | Y |
| Phenotype And Trait Ontology | Y |
| Uberon multi-species anatomy ontology | Y |

robot convert

JSON for the ontologies marked N above could potentially be generated with robot convert:

In the following example we convert an input ontology to OBOGraphs JSON, explicitly specifying the target format with --format:

robot convert -i ro-base.owl --format json -o results/ro-base.json
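If the conversion were scripted for the two ontologies without published JSON, a thin wrapper around the command above might look like the following sketch. It assumes `robot` is installed and on the `PATH`; the input and output file names (`hsapdv.owl`, `mmusdv.owl`, and their `.json` counterparts) are hypothetical placeholders, and only the flags shown in the quoted example are used.

```python
import subprocess


def robot_convert_command(input_path, output_path):
    """Build the argument list for `robot convert` targeting OBO Graphs JSON."""
    return [
        "robot", "convert",
        "-i", input_path,
        "--format", "json",
        "-o", output_path,
    ]


def convert_to_json(input_path, output_path):
    """Run the conversion; raises CalledProcessError if robot exits non-zero."""
    subprocess.run(robot_convert_command(input_path, output_path), check=True)


# Hypothetical local file names for the ontologies marked N in the table.
MISSING_JSON = {
    "hsapdv.owl": "hsapdv.json",
    "mmusdv.owl": "mmusdv.json",
}

# Commands are only built here; call convert_to_json() to actually run robot.
commands = [robot_convert_command(src, dst) for src, dst in MISSING_JSON.items()]
```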

Can a less complex variant of an ontology be specified?

Release Artifacts Variants

* Simple: A version of the ontology that contains only a subset of the ontology (only the direct relations, see docs). The simple variant should be used by most users that build tools that use the ontology, especially when serialised as OBO graphs json. This variant should probably be avoided by power-users working with reasoners, as many of the axioms that drive reasoning are missing.

Can an ontology be partially processed?

We could partially consume or process ontologies such as EFO or PATO, where only a subset of terms is important to CELLxGENE. Given a preferred root, we would parse only the terms beneath it, for example all terms under "experimental process" in EFO.

This would reduce processing time and library size.
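One way to picture the partial processing is to prune an OBO Graphs JSON graph down to a chosen root and its descendants. The sketch below assumes the `nodes`/`edges` layout with `is_a` predicates and follows only `is_a` edges. The function names and the `EX:` identifiers are hypothetical stand-ins, so the real ID of the preferred root would have to be looked up in the ontology.

```python
from collections import defaultdict


def subtree_ids(graph, root):
    """Return `root` plus every term that descends from it via is_a edges."""
    children = defaultdict(set)
    for edge in graph.get("edges", []):
        if edge["pred"] == "is_a":
            children[edge["obj"]].add(edge["sub"])
    keep = {root}
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in keep:
                keep.add(child)
                stack.append(child)
    return keep


def prune_graph(graph, root):
    """Keep only nodes under `root` and the edges that connect them."""
    keep = subtree_ids(graph, root)
    return {
        "nodes": [n for n in graph.get("nodes", []) if n["id"] in keep],
        "edges": [
            e for e in graph.get("edges", [])
            if e["sub"] in keep and e["obj"] in keep
        ],
    }


# Made-up IDs; "EX:process" plays the role of the preferred root.
EXAMPLE_GRAPH = {
    "nodes": [
        {"id": "EX:root", "lbl": "top level term"},
        {"id": "EX:process", "lbl": "experimental process"},
        {"id": "EX:assay", "lbl": "some assay"},
        {"id": "EX:other", "lbl": "unrelated term"},
    ],
    "edges": [
        {"sub": "EX:process", "pred": "is_a", "obj": "EX:root"},
        {"sub": "EX:assay", "pred": "is_a", "obj": "EX:process"},
        {"sub": "EX:other", "pred": "is_a", "obj": "EX:root"},
    ],
}

pruned = prune_graph(EXAMPLE_GRAPH, "EX:process")
```

The pruned graph could then be written back out with `json.dump`, so that only the relevant slice of a large ontology is shipped or loaded.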