althonos / pronto

A Python frontend to (Open Biomedical) Ontologies.
https://pronto.readthedocs.io
MIT License

Getting Relationships #174

Open krishanudb opened 2 years ago

krishanudb commented 2 years ago

Hi. I have used pronto for the last few months. First of all, kudos to the coders, it's great work.

Coming to the issue I am facing: I initially thought that pronto could extract all the relationships in an OWL file, no matter how convoluted (read: reified) the relations are. Recently, though, I realized that there are some limitations.

For example, while parsing the Disease Ontology OWL file, I saw that some relations were extracted while others were not. Take the following class:

<!-- http://purl.obolibrary.org/obo/DOID_11573 -->

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_11573">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/DOID_0050338"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/IDO_0000664"/>
                <owl:someValuesFrom>
                    <owl:Class>
                        <owl:unionOf rdf:parseType="Collection">
                            <rdf:Description rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_1637"/>
                            <rdf:Description rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_1639"/>
                        </owl:unionOf>
                    </owl:Class>
                </owl:someValuesFrom>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002451"/>
                <owl:someValuesFrom>
                    <owl:Class>
                        <owl:unionOf rdf:parseType="Collection">
                            <rdf:Description rdf:about="http://purl.obolibrary.org/obo/TRANS_0000006"/>
                            <rdf:Description rdf:about="http://purl.obolibrary.org/obo/TRANS_0000012"/>
                        </owl:unionOf>
                    </owl:Class>
                </owl:someValuesFrom>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002452"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/SYMP_0000458"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002452"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/SYMP_0000570"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002452"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/SYMP_0019145"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A primary bacterial infectious disease that results_in infection, has_material_basis_in Listeria monocytogenes, which is transmitted_by ingestion of contaminated food or raw milk or transmitted_by congenital method. Ingestion of Listeria by pregnant women has_symptom nausea, has_symptom vomiting, has_symptom diarrhea, has_symptom fever, has_symptom malaise, has_symptom back pain, and has_symptom headache. Maternal infection with Listeria can result in chorioamnionitis, premature labor, spontaneous abortion, or stillbirth.</obo:IAO_0000115>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ICD10CM:A32</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ICD9CM:027.0</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">MESH:D008088</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">NCI:C82994</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">SNOMEDCT_US_2021_03_01:186315001</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">UMLS_CUI:C0023860</oboInOwl:hasDbXref>
        <oboInOwl:hasExactSynonym xml:lang="en">Infection by Listeria monocytogenes</oboInOwl:hasExactSynonym>
        <oboInOwl:hasExactSynonym xml:lang="en">Listeria infection</oboInOwl:hasExactSynonym>
        <oboInOwl:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">disease_ontology</oboInOwl:hasOBONamespace>
        <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DOID:11573</oboInOwl:id>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/doid#NCIthesaurus"/>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/doid#gram-positive_bacterial_infectious_disease"/>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/doid#zoonotic_infectious_disease"/>
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">listeriosis</rdfs:label>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/DOID_11573"/>
        <owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000115"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A primary bacterial infectious disease that results_in infection, has_material_basis_in Listeria monocytogenes, which is transmitted_by ingestion of contaminated food or raw milk or transmitted_by congenital method. Ingestion of Listeria by pregnant women has_symptom nausea, has_symptom vomiting, has_symptom diarrhea, has_symptom fever, has_symptom malaise, has_symptom back pain, and has_symptom headache. Maternal infection with Listeria can result in chorioamnionitis, premature labor, spontaneous abortion, or stillbirth.</owl:annotatedTarget>
        <dc:type rdf:resource="http://purl.obolibrary.org/obo/ECO_0007640"/>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">url:http://www.nlm.nih.gov/medlineplus/ency/article/001380.htm</oboInOwl:hasDbXref>
    </owl:Axiom>

pronto was not able to extract these relations, among others:

IDO_0000664 NCBITaxon_1637
IDO_0000664 NCBITaxon_1639

but pronto was able to extract these relations:

RO_0002452 SYMP_0000458
RO_0002452 SYMP_0000570
RO_0002452 SYMP_0019145

Can anyone point me to methods or strategies I could use to extract all of these relations? Thanks, Krishanu

cmungall commented 2 years ago

I think it's expected behavior that some axioms that do not conform to the OBO Profile of OWL are not translated.

For example, this axiom:

transmitted by some (vehicle-borne ingestion transmission or congenital transmission)

involves a union on the RHS. This just doesn't fit into the data model. While you can imagine pronto translating this into two relationships, that would actually be dangerous and wrong: not all listerioses have a congenital origin.
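To see why splitting the union is unsound rather than merely lossy, here is a tiny countermodel in plain Python. Sets stand in for an OWL interpretation, and the individual names (`case1`, `event1`) and helper functions are hypothetical. A single listeriosis case transmitted only by vehicle-borne ingestion satisfies the original axiom, but not the "congenital" relationship that a naive split would assert, so the split is not entailed by that axiom.

```python
# Toy interpretation: extensions of classes and of the transmitted_by property.
# All names are hypothetical stand-ins for the DOID/TRANS terms in the issue.
listeriosis: set[str] = {"case1"}
vehicle_borne: set[str] = {"event1"}
congenital: set[str] = set()
transmitted_by: set[tuple[str, str]] = {("case1", "event1")}


def some_values_from(subject: str, prop: set[tuple[str, str]], filler: set[str]) -> bool:
    """True if `subject` has at least one `prop` successor inside `filler`."""
    return any(s == subject and o in filler for s, o in prop)


def satisfies_subclass_of_some(
    cls: set[str], prop: set[tuple[str, str]], filler: set[str]
) -> bool:
    """Check `cls SubClassOf (prop some filler)` in this interpretation."""
    return all(some_values_from(x, prop, filler) for x in cls)


# Original axiom: transmitted_by some (vehicle_borne or congenital)
union_axiom = satisfies_subclass_of_some(listeriosis, transmitted_by, vehicle_borne | congenital)
# Naive split into two separate relationships
split_vehicle = satisfies_subclass_of_some(listeriosis, transmitted_by, vehicle_borne)
split_congenital = satisfies_subclass_of_some(listeriosis, transmitted_by, congenital)

print(union_axiom, split_vehicle, split_congenital)  # True True False
```

Because this interpretation satisfies the union axiom but falsifies one of the split relationships, emitting both relationships would add a claim the ontology never made.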

What is your use case? What would you hope to do with the axioms once translated into a python datamodel?

Also, when thinking about these kinds of things, it helps to think in terms of "Axioms" when talking about OWL, and "relationships" in an OBO/graph-like object model. Sometimes there is a mapping between these concepts, sometimes not.
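As a rough illustration of where that mapping exists, here is a stdlib-only sketch (not pronto's actual implementation) that walks the `rdfs:subClassOf` restrictions of a class and keeps only the shape that maps cleanly onto a relationship: an `owl:someValuesFrom` whose filler is a named class. Anything else, such as the union fillers in the issue, is reported as unmapped instead of being split. The trimmed snippet and the helper names `short` and `classify_restrictions` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ABOUT = f"{{{NS['rdf']}}}about"
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"

# Trimmed copy of the DOID_11573 class from the issue.
OWL_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/DOID_11573">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/IDO_0000664"/>
        <owl:someValuesFrom>
          <owl:Class>
            <owl:unionOf rdf:parseType="Collection">
              <rdf:Description rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_1637"/>
              <rdf:Description rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_1639"/>
            </owl:unionOf>
          </owl:Class>
        </owl:someValuesFrom>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002452"/>
        <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/SYMP_0000458"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>"""


def short(iri: str) -> str:
    """Turn an OBO PURL into its local id, e.g. RO_0002452."""
    return iri.rsplit("/", 1)[-1]


def classify_restrictions(xml_text: str):
    """Split existential restrictions into mappable relationships and the rest."""
    root = ET.fromstring(xml_text)
    relationships, unmapped = [], []
    for cls in root.findall("owl:Class", NS):  # top-level named classes only
        subject = short(cls.get(RDF_ABOUT))
        for restr in cls.findall("rdfs:subClassOf/owl:Restriction", NS):
            prop = short(restr.find("owl:onProperty", NS).get(RDF_RESOURCE))
            svf = restr.find("owl:someValuesFrom", NS)
            if svf is None:
                continue  # not an existential restriction; out of scope here
            filler = svf.get(RDF_RESOURCE)
            if filler is not None:
                # Named filler: maps onto a single relationship.
                relationships.append((subject, prop, short(filler)))
            else:
                # Anonymous filler (here a union): no single relationship exists.
                members = [
                    short(d.get(RDF_ABOUT))
                    for d in svf.findall("owl:Class/owl:unionOf/rdf:Description", NS)
                ]
                unmapped.append((subject, prop, members))
    return relationships, unmapped


rels, rest = classify_restrictions(OWL_XML)
print(rels)  # [('DOID_11573', 'RO_0002452', 'SYMP_0000458')]
print(rest)  # [('DOID_11573', 'IDO_0000664', ['NCBITaxon_1637', 'NCBITaxon_1639'])]
```

Keeping the unmapped axioms in a separate list, rather than silently dropping them, makes it easy to decide case by case whether to discard them or model them some other way.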

krishanudb commented 2 years ago

Hi. Thanks a lot for your reply; your explanation makes perfect sense. What I actually want is to build a knowledge graph from these axioms/relations, plus other sources, and use it for downstream tasks. However, my data model has no scope for reification, so I wanted to flatten these relations into (simple) triples in some way.

cmungall commented 2 years ago

Just to be clear, the example here doesn't involve reification at all.
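For contrast, the part of the original snippet that does use OWL's reification-style vocabulary is the `owl:Axiom` block: it points at an existing axiom through `owl:annotatedSource`, `owl:annotatedProperty` and `owl:annotatedTarget` and attaches annotations to it. The restrictions, by contrast, are anonymous class expressions nested directly under `rdfs:subClassOf`. A minimal stdlib sketch, using a trimmed copy of that block (the helper name `reified_axioms` is hypothetical), shows that the only reified axiom here is the textual definition (`IAO_0000115`):

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
NS = {"rdf": RDF, "owl": OWL}

# Trimmed copy of the owl:Axiom block from the issue (definition text shortened).
OWL_XML = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Axiom>
    <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/DOID_11573"/>
    <owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000115"/>
    <owl:annotatedTarget>A primary bacterial infectious disease that results_in infection</owl:annotatedTarget>
    <oboInOwl:hasDbXref>url:http://www.nlm.nih.gov/medlineplus/ency/article/001380.htm</oboInOwl:hasDbXref>
  </owl:Axiom>
</rdf:RDF>"""


def reified_axioms(xml_text: str) -> list[tuple[str, str]]:
    """Return (annotatedSource, annotatedProperty) local ids for every owl:Axiom."""
    root = ET.fromstring(xml_text)
    resource = f"{{{RDF}}}resource"
    out = []
    for ax in root.iter(f"{{{OWL}}}Axiom"):
        source = ax.find("owl:annotatedSource", NS).get(resource)
        prop = ax.find("owl:annotatedProperty", NS).get(resource)
        out.append((source.rsplit("/", 1)[-1], prop.rsplit("/", 1)[-1]))
    return out


print(reified_axioms(OWL_XML))  # [('DOID_11573', 'IAO_0000115')]
```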

I think when translating to a KG it makes sense to drop edges with unions. You could turn the blank node into a node in your graph, and then use a reasoner to make is-a links, e.g.:

  1. DOID_11573 transmitted-by uuid1
  2. uuid1 is-a transmission-process
  3. vehicle-borne ingestion transmission is-a uuid1
  4. congenital transmission is-a uuid1
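The four steps above can be sketched as a small function that mints an identifier for the anonymous union class and emits plain triples. This is a hypothetical helper: `union_as_node`, the `urn:uuid:` identifiers and the plain-string predicate names are assumptions, and the common superclass (`transmission-process` in the example) is passed in rather than computed, since inferring it is the reasoner's job.

```python
import uuid


def union_as_node(
    subject: str, predicate: str, members: list[str], common_parent: str
) -> list[tuple[str, str, str]]:
    """Flatten `subject predicate some (m1 or m2 ...)` into plain triples.

    The anonymous union class becomes an explicit node. `common_parent`
    stands in for the superclass a reasoner would infer for that node.
    """
    # Deterministic id derived from the sorted members (uuid5 under the URL
    # namespace is an arbitrary choice), so the same union used in several
    # axioms maps onto the same node across runs.
    key = "|".join(sorted(members))
    node = f"urn:uuid:{uuid.uuid5(uuid.NAMESPACE_URL, key)}"
    triples = [
        (subject, predicate, node),     # 1. DOID_11573 transmitted-by uuid1
        (node, "is-a", common_parent),  # 2. uuid1 is-a transmission-process
    ]
    # 3. and 4. each union member is-a the new node
    triples += [(member, "is-a", node) for member in members]
    return triples


triples = union_as_node(
    "DOID_11573",
    "transmitted-by",
    ["vehicle-borne ingestion transmission", "congenital transmission"],
    "transmission-process",
)
for triple in triples:
    print(triple)
```

Deriving the id from the members rather than using a random UUID keeps the graph stable between rebuilds; if the same union should stay distinct per axiom, the subject and predicate could be folded into the key instead.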

but tbh I think the value of this is fairly minimal

I have been attempting to gather interest from the broader community in a standard way of mapping from OWL to KGs; see https://github.com/cmungall/owlstar. This is pretty much a format-independent generalization of what pronto already does with OWL.

You may also be interested in https://github.com/biolink/kgx

We have made kgx files for all of the OBO ontologies, see https://github.com/Knowledge-Graph-Hub/kg-obo