althonos / pronto

A Python frontend to (Open Biomedical) Ontologies.
https://pronto.readthedocs.io
MIT License

It should be possible to filter out relationships based on GCIs #163

Open cmungall opened 2 years ago

cmungall commented 2 years ago

This sounds quite obscure, but it is in fact extremely important for avoiding erroneous propagation.

Consider:

[Term]
id: UBERON:0018140
name: mammary lobe
is_a: UBERON:0009912 ! anatomical lobe
relationship: part_of UBERON:0000310 {gci_relation="part_of", gci_filler="NCBITaxon:9606"} ! breast
relationship: part_of UBERON:0001911 ! mammary gland

This means that, in the context of humans, the mammary lobe is part of the breast (in mice, for example, mammary glands are found in other locations).
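To make the structure of such a clause concrete, here is a minimal sketch that splits a `relationship:` line into relation, target, trailing `{...}` qualifiers and `!` comment. It is not pronto's or fastobo's API; the `Relationship` class and `parse_relationship` function are hypothetical, and the regexes assume a simplified syntax (double-quoted qualifier values with no escaped quotes or braces).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical, simplified matcher for a "relationship:" clause:
# relation, target ID, optional {...} qualifier block, optional "! comment".
RELATIONSHIP_RE = re.compile(
    r'^relationship:\s+(?P<rel>\S+)\s+(?P<target>\S+)'
    r'(?:\s+\{(?P<quals>[^}]*)\})?'
    r'(?:\s+!\s*(?P<comment>.*))?$'
)
# key="value" pairs inside the qualifier block (no escaped quotes assumed).
QUALIFIER_RE = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class Relationship:
    relation: str
    target: str
    qualifiers: Dict[str, str] = field(default_factory=dict)
    comment: Optional[str] = None

    @property
    def is_gci(self) -> bool:
        # A clause is context-dependent if it carries either GCI qualifier.
        return "gci_relation" in self.qualifiers or "gci_filler" in self.qualifiers


def parse_relationship(line: str) -> Relationship:
    m = RELATIONSHIP_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a relationship clause: {line!r}")
    quals = dict(QUALIFIER_RE.findall(m.group("quals") or ""))
    return Relationship(m.group("rel"), m.group("target"), quals, m.group("comment"))


gci = parse_relationship(
    'relationship: part_of UBERON:0000310 '
    '{gci_relation="part_of", gci_filler="NCBITaxon:9606"} ! breast'
)
plain = parse_relationship("relationship: part_of UBERON:0001911 ! mammary gland")
print(gci.is_gci, plain.is_gci)  # prints: True False
```

The point is that the qualifier block is the only thing distinguishing the two `part_of` clauses; a reader that drops it sees two unconditional relationships.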

The OWL translation is described here: https://owlcollab.github.io/oboformat/doc/obo-syntax.html#5.2.2
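As a rough illustration of that translation, the sketch below emits OWL functional syntax for a relationship clause, based on our reading of the linked section: a plain clause becomes `X SubClassOf (R some Y)`, while a GCI-qualified one becomes `(X and (R2 some F)) SubClassOf (R some Y)`. The function name is hypothetical, and CURIEs and relation labels stand in for full IRIs for readability; check the spec for the exact rule.

```python
from typing import Optional


def to_owl_functional(subject: str, relation: str, target: str,
                      gci_relation: Optional[str] = None,
                      gci_filler: Optional[str] = None) -> str:
    """Hypothetical helper: translate `relationship: R Y {...}` on term X."""
    superclass = f"ObjectSomeValuesFrom({relation} {target})"
    if gci_relation is None and gci_filler is None:
        # Unconditional: X SubClassOf (R some Y)
        return f"SubClassOf({subject} {superclass})"
    if gci_relation is None or gci_filler is None:
        raise ValueError("gci_relation and gci_filler must be given together")
    # GCI: the left-hand side is restricted to the context (R2 some F)
    subclass = (f"ObjectIntersectionOf({subject} "
                f"ObjectSomeValuesFrom({gci_relation} {gci_filler}))")
    return f"SubClassOf({subclass} {superclass})"


print(to_owl_functional("UBERON:0018140", "part_of", "UBERON:0000310",
                        "part_of", "NCBITaxon:9606"))
print(to_owl_functional("UBERON:0018140", "part_of", "UBERON:0001911"))
```

Note that the GCI axiom says nothing about mammary lobes in general, only about those that are part of a human; that is exactly what is lost if the qualifiers are ignored.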

In OWL the meaning is very clear, but in OBO there is a lot of room for mistranslation when people ignore this field.

It is important that applications are able to filter these relationships, as naive traversal/propagation should not be used when the context is not satisfied.
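A minimal sketch of what context-aware filtering could look like during traversal, assuming the GCI qualifiers were available: each `part_of` edge carries an optional taxon context, and an edge is followed only when no context is required or the requested context matches. The graph layout and `part_of_closure` are hypothetical, not pronto's API. As a simplification, the match is exact, so taxon subsumption (e.g. a query for a subclass of `NCBITaxon:9606`) and the value of `gci_relation` are not considered.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical edge: (target term, GCI filler or None if unconditional).
Edge = Tuple[str, Optional[str]]


def part_of_closure(graph: Dict[str, List[Edge]], start: str,
                    context: Optional[str] = None) -> Set[str]:
    """Terms reachable from `start` via part_of, following a GCI-qualified
    edge only when its filler equals `context` (exact match only)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target, gci_filler in graph.get(node, []):
            if gci_filler is not None and gci_filler != context:
                continue  # context not satisfied: do not propagate
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


graph: Dict[str, List[Edge]] = {
    "UBERON:0018140": [                          # mammary lobe
        ("UBERON:0000310", "NCBITaxon:9606"),    # breast, only in humans
        ("UBERON:0001911", None),                # mammary gland, always
    ],
}

print(part_of_closure(graph, "UBERON:0018140"))
# prints: {'UBERON:0001911'}
print(sorted(part_of_closure(graph, "UBERON:0018140", "NCBITaxon:9606")))
# prints: ['UBERON:0000310', 'UBERON:0001911']
```

Without a context, the naive answer "mammary lobe is part of breast" is never produced; it only appears when the caller explicitly asks about humans.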

OBO Basic (https://owlcollab.github.io/oboformat/doc/obo-syntax.html#6.2) explicitly forbids GCIs because of the potential for misuse.

pronto is very useful for working with non-basic ontologies, but I don't see a way to filter on this. Is there one?

althonos commented 2 years ago

Hi Chris,

The problem with GCIs is that, at the moment, they end up in OBO qualifiers, which are parsed by the fastobo Rust parser but dropped when converting from the Rust data structures to the Python data structures (to be fair, the spec says it's fine to ignore qualifiers, so...). I'll first have to find some time to update fastobo-py so that it exposes the qualifiers and comments of clause lines in the Python data structures; only then can this work.