cdd / bioassay-template

Dissect the CLO and include new terms #101

Closed: aclarkxyz closed this issue 6 years ago

aclarkxyz commented 6 years ago

This project currently uses a subset of CLO (Cell Line Ontology) terms that were selected for inclusion in the BAO. The full CLO contains considerably more terms, and the missing ones have become a problem when importing more recent data.

Duplicate triples are not a problem in an ontology, so it's OK to load a larger set of content alongside the existing subset: the overlap does not cause a conflict. So we would like to have the latest from the CLO GitHub project:

https://github.com/CLO-ontology/CLO/tree/master/src/ontology

Either the clo.owl or the clo_merged.owl file will have the content (whichever is more convenient to work with). That file has a lot of content that we do not want to import, so what's needed is a script that takes CLO_0000001 ("cell line cell") as the root node and extracts everything that is descended from it. The newly extracted tree should consist of RDF triples that define {uri, label, description} for each term and use {subClassOf} to define the tree. (Everything else can be discarded.)
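A minimal sketch of the extraction step described above, written with plain Java collections rather than Jena so that it stands alone. The `CloSubtree` class, its `Triple` holder, and the method names are hypothetical and are not taken from ModelSchema.java. It also assumes that the description is stored under the OBO definition property IAO_0000115; the predicate actually used for descriptions in this project may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: keeps only the CLO_0000001 branch of a triple list.
final class CloSubtree {
    static final String SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf";
    static final String LABEL = "http://www.w3.org/2000/01/rdf-schema#label";
    // Assumption: descriptions live under the OBO "definition" annotation.
    static final String DEFINITION = "http://purl.obolibrary.org/obo/IAO_0000115";
    static final String ROOT = "http://purl.obolibrary.org/obo/CLO_0000001";

    // Simple stand-in for an RDF statement; object may be a URI or literal text.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Returns the root plus every class reachable from it via subClassOf,
    // walking from parent to child breadth-first.
    static Set<String> descendants(List<Triple> triples, String root) {
        Map<String, List<String>> children = new HashMap<>();
        for (Triple t : triples) {
            if (SUBCLASS_OF.equals(t.predicate)) {
                children.computeIfAbsent(t.object, k -> new ArrayList<>()).add(t.subject);
            }
        }
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            String parent = queue.remove();
            for (String child : children.getOrDefault(parent, Collections.<String>emptyList())) {
                if (seen.add(child)) {
                    queue.add(child); // first visit only, so cycles cannot loop forever
                }
            }
        }
        return seen;
    }

    // Keeps label/definition triples for retained classes, plus subClassOf
    // edges whose two ends are both retained. Everything else is discarded.
    static List<Triple> extract(List<Triple> triples, String root) {
        Set<String> keep = descendants(triples, root);
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if (!keep.contains(t.subject)) {
                continue;
            }
            boolean isAnnotation = LABEL.equals(t.predicate) || DEFINITION.equals(t.predicate);
            boolean isTreeEdge = SUBCLASS_OF.equals(t.predicate) && keep.contains(t.object);
            if (isAnnotation || isTreeEdge) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Requiring both ends of a subClassOf edge to be in the retained set drops the root's link to its own parent, along with any subClassOf statements that point at anonymous restriction nodes, so the output is a self-contained tree rooted at CLO_0000001. In a real script the input triples would come from the parsed OWL file and the result would be written out as Turtle.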

The subset can be built easily using code borrowed from ModelSchema.java, and written out using the Jena libraries. The output should be something like CLO_cells.ttl, which can be copied into ${BAX}/data/ontology. This should cause the template editor to discover a significantly larger number of cells.

kodecharlie commented 6 years ago

Fixed, merged to master.