althonos / pronto

A Python frontend to (Open Biomedical) Ontologies.
https://pronto.readthedocs.io
MIT License

How to get cross-references descriptions? #220

Open CarMoreno opened 3 months ago

CarMoreno commented 3 months ago

Hi there! I'm currently trying to retrieve the cross-reference descriptions of compounds in the ChEBI ontology. For example:

<owl:Class rdf:about="http://purl.obolibrary.org/obo/CHEBI_4508">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/CHEBI_26218"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/CHEBI_48311"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">The potassium salt of diclofenac.</obo:IAO_0000115>
        <chebi:charge rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">0</chebi:charge>
        <chebi:formula rdf:datatype="http://www.w3.org/2001/XMLSchema#string">C14H10Cl2NO2.K</chebi:formula>
        <chebi:inchi rdf:datatype="http://www.w3.org/2001/XMLSchema#string">InChI=1S/C14H11Cl2NO2.K/c15-10-5-3-6-11(16)14(10)17-12-7-2-1-4-9(12)8-13(18)19;/h1-7,17H,8H2,(H,18,19);/q;+1/p-1</chebi:inchi>
        <chebi:inchikey rdf:datatype="http://www.w3.org/2001/XMLSchema#string">KXZOIWWTXOCYKR-UHFFFAOYSA-M</chebi:inchikey>
        <chebi:mass rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">334.243</chebi:mass>
        <chebi:monoisotopicmass rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">332.97257</chebi:monoisotopicmass>
        <chebi:smiles rdf:datatype="http://www.w3.org/2001/XMLSchema#string">O=C([O-])Cc1ccccc1Nc1c(Cl)cccc1Cl.[K+]</chebi:smiles>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Beilstein:6625757</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CAS:15307-81-0</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DrugBank:DB00586</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">KEGG:D00903</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">PMID:1502708</oboInOwl:hasDbXref>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">potassium {2-[(2,6-dichlorophenyl)amino]phenyl}acetate</oboInOwl:hasExactSynonym>
        <oboInOwl:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">chebi_ontology</oboInOwl:hasOBONamespace>
        <oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">2-((2,6-dichlorophenyl)amino)benzeneacetic acid, monopotassium salt</oboInOwl:hasRelatedSynonym>
        <oboInOwl:hasRelatedSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Cataflam</oboInOwl:hasRelatedSynonym>
        <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CHEBI:4508</oboInOwl:id>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/chebi/3_STAR"/>
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">diclofenac potassium</rdfs:label>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/CHEBI_4508"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Beilstein:6625757</owl:annotatedTarget>
        <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Beilstein</oboInOwl:source>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/CHEBI_4508"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CAS:15307-81-0</owl:annotatedTarget>
        <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ChemIDplus</oboInOwl:source>
    </owl:Axiom>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/CHEBI_4508"/>
        <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">PMID:1502708</owl:annotatedTarget>
        <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Europe PMC</oboInOwl:source>
    </owl:Axiom>

I am expecting something like:

frozenset({
     Xref('ChemIDplus', 'CAS:15307-81-0'), 
     Xref('Europe PMC', 'PMID:1502708'), 
     Xref('Beilstein', 'Beilstein:6625757')
})

However, I am getting:

frozenset({
     Xref('CAS:15307-81-0'), 
     Xref('PMID:1502708'), 
     Xref('Beilstein:6625757')
})

It seems like pronto is having trouble understanding the cross-reference structure of the ontology, or I might be doing something wrong. Could you please guide me on how to retrieve the descriptions? Thank you in advance for your help!

althonos commented 3 months ago

Hi @CarMoreno,

The problem here is that what you are trying to retrieve are not descriptions, which would be inside the rdfs:label element of each Axiom, but sources (inside oboInOwl:source elements). Sources are not strictly supported in the OBO format (which is what pronto aims to support): they can only be expressed as OBO line qualifiers, which are optional and ignored by some parsers. I don't really have a solution to provide at the moment.
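
As a possible workaround outside pronto, the oboInOwl:source values could be read straight from the owl:Axiom elements with the standard library. This is only a minimal sketch, not part of pronto's API: the `xref_sources` helper and the `SAMPLE` document are hypothetical, and the sample just wraps two of the axioms from the snippet above in an `rdf:RDF` root with the usual namespace declarations.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used in OBO-flavoured OWL (RDF/XML) files such as ChEBI.
OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OBOINOWL = "http://www.geneontology.org/formats/oboInOwl#"
HAS_DBXREF = OBOINOWL + "hasDbXref"


def xref_sources(xml_text, term_iri):
    """Map each annotated xref of `term_iri` to its oboInOwl:source text.

    Hypothetical helper: xrefs without a matching owl:Axiom are simply absent.
    """
    root = ET.fromstring(xml_text)
    sources = {}
    for axiom in root.iter(f"{{{OWL}}}Axiom"):
        subject = axiom.find(f"{{{OWL}}}annotatedSource")
        prop = axiom.find(f"{{{OWL}}}annotatedProperty")
        target = axiom.find(f"{{{OWL}}}annotatedTarget")
        source = axiom.find(f"{{{OBOINOWL}}}source")
        # Skip axioms that lack any of the four expected children.
        if any(elem is None for elem in (subject, prop, target, source)):
            continue
        # Keep only hasDbXref annotations attached to the requested term.
        if subject.get(f"{{{RDF}}}resource") != term_iri:
            continue
        if prop.get(f"{{{RDF}}}resource") != HAS_DBXREF:
            continue
        sources[target.text] = source.text
    return sources


# Two axioms copied from the ChEBI snippet, wrapped in an assumed root element.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Axiom>
    <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/CHEBI_4508"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CAS:15307-81-0</owl:annotatedTarget>
    <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ChemIDplus</oboInOwl:source>
  </owl:Axiom>
  <owl:Axiom>
    <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/CHEBI_4508"/>
    <owl:annotatedProperty rdf:resource="http://www.geneontology.org/formats/oboInOwl#hasDbXref"/>
    <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">PMID:1502708</owl:annotatedTarget>
    <oboInOwl:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Europe PMC</oboInOwl:source>
  </owl:Axiom>
</rdf:RDF>"""

if __name__ == "__main__":
    print(xref_sources(SAMPLE, "http://purl.obolibrary.org/obo/CHEBI_4508"))
```

The resulting dictionary could then be joined with the identifiers of the `Xref` objects pronto already returns. Xrefs that have no owl:Axiom in the file (such as the KEGG and DrugBank ones in the snippet) would have no entry. For the full ChEBI file, a streaming parser such as `ET.iterparse` is probably a better fit than loading everything with `ET.fromstring`.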