EBISPOT / efo

Github repo for the Experimental Factor Ontology (EFO)
https://www.ebi.ac.uk/efo/

Pronto cannot read EFO 3.67.0 #2264

Open Zethson opened 1 month ago

Zethson commented 1 month ago

Attempting to parse EFO 3.67.0 from http://www.ebi.ac.uk/efo/releases/v3.67.0/efo.owl with Pronto, the de facto standard ontology parser in Python, raises a KeyError.

Note that all past versions including 3.66.0 still work.

---> 50 onto = Ontology("efo.3.67.0.owl")

File ~/PycharmProjects/bionty/bionty/base/_ontology.py:33, in Ontology.__init__(self, handle, import_depth, timeout, threads, prefix)
     31 self._prefix = prefix
     32 warnings.filterwarnings("ignore", category=pronto.warnings.ProntoWarning)
---> 33 super().__init__(
     34     handle=handle, import_depth=import_depth, timeout=timeout, threads=threads
     35 )

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/pronto/ontology.py:283, in Ontology.__init__(self, handle, import_depth, timeout, threads)
    281 for cls in BaseParser.__subclasses__():
    282     if cls.can_parse(typing.cast(str, self.path), buffer):
--> 283         cls(self).parse_from(_handle)  # type: ignore
    284         break
    285 else:

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/pronto/parsers/rdfxml.py:117, in RdfXMLParser.parse_from(self, handle, threads)
    115     self._extract_annotation_property(prop, curies)
    116 for class_ in tree.iterfind(_NS["owl"]["Class"]):
--> 117     self._extract_term(class_, curies)
    118 for axiom in tree.iterfind(_NS["owl"]["Axiom"]):
    119     self._process_axiom(axiom, curies)

File ~/miniconda3/envs/lamindb/lib/python3.11/site-packages/pronto/parsers/rdfxml.py:399, in RdfXMLParser._extract_term(self, elem, curies)
    397         termdata.xrefs.add(Xref(text))
    398     else:
--> 399         termdata.xrefs.add(Xref(attrib[_NS["rdf"]["resource"]]))
    400 except ValueError:
    401     pass

KeyError: '{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource'
Zethson commented 1 week ago

Dear @jamesamcl

I'm sorry for asking you directly here and pinging, but since you assigned yourself to this issue, I was wondering whether you already had a chance to look into it.

Thank you very much!

jamesamcl commented 1 week ago

It seems Pronto is failing on a blank (whitespace-only) string in a hasDbXref property:

    <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0022680">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/CL_0000010"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0001000"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_9606"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0001000"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/UBERON_0000178"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0001000"/>
                <owl:someValuesFrom rdf:resource="http://www.ebi.ac.uk/efo/EFO_0000324"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <obo:IAO_0000115>Human-human somatic cell hybrid cell line, established by PEG-mediated fusion of the B-lymphoblastoid cell line LCL 721.174 with an 8-azaguanine and ouabain-resistant variant of the T-LCL CEM. Subclone of the T1 cell line which has lost both CEM(R)-derived copies of chromosome 6</obo:IAO_0000115>
        <obo:IAO_0000117>Kalpana Panneerselvam</obo:IAO_0000117>
        <oboInOwl:hasDbXref> </oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref>BTO:0003771</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref>CLO:0009242</oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref>RRID:CVCL_2211</oboInOwl:hasDbXref>
        <rdfs:label>T2</rdfs:label>
    </owl:Class>

Pronto should probably just ignore this and move on rather than crashing, but we can also fix it in EFO.
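
Both options can be sketched with the standard library alone. This is a rough illustration, not Pronto's actual code: `read_xref` and `find_blank_xrefs` are hypothetical helpers, and `SAMPLE` is a trimmed copy of the class above. It assumes that a whitespace-only hasDbXref counts as having no text, and it uses `.get()` so that a missing rdf:resource attribute yields None instead of a KeyError.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
OBOINOWL = "http://www.geneontology.org/formats/oboInOwl#"

# Trimmed copy of the EFO_0022680 class, wrapped in an rdf:RDF root.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
    <owl:Class rdf:about="http://www.ebi.ac.uk/efo/EFO_0022680">
        <oboInOwl:hasDbXref> </oboInOwl:hasDbXref>
        <oboInOwl:hasDbXref>BTO:0003771</oboInOwl:hasDbXref>
    </owl:Class>
</rdf:RDF>"""


def read_xref(elem):
    """Return the xref of one hasDbXref element, or None if it is blank."""
    text = (elem.text or "").strip()
    if text:
        return text
    # No usable text: fall back to rdf:resource, but tolerate its absence.
    return elem.attrib.get(f"{{{RDF}}}resource")


def find_blank_xrefs(root):
    """Return the IRIs of owl:Class elements with at least one blank hasDbXref."""
    blank = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        xrefs = cls.findall(f"{{{OBOINOWL}}}hasDbXref")
        if any(read_xref(x) is None for x in xrefs):
            blank.append(cls.get(f"{{{RDF}}}about"))
    return blank


root = ET.fromstring(SAMPLE)
print(find_blank_xrefs(root))
```

To check a full release, the same function could be run on `ET.parse(path).getroot()` for a locally downloaded efo.owl, which would list every class that needs its blank xref removed on the EFO side.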

@zoependlington