althonos / pronto

A Python frontend to (Open Biomedical) Ontologies.
https://pronto.readthedocs.io
MIT License

.obo SyntaxErrors #131

Open joelduerksen opened 3 years ago

joelduerksen commented 3 years ago

What is the best way to force pronto to continue in the face of messy/real data?

>>> cl = Ontology("http://purl.obolibrary.org/obo/uberon.obo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/plastic/anaconda3/envs/uberon/lib/python3.8/site-packages/pronto/ontology.py", line 283, in __init__
    cls(self).parse_from(_handle)  # type: ignore
  File "/home/plastic/anaconda3/envs/uberon/lib/python3.8/site-packages/pronto/parsers/obo.py", line 45, in parse_from
    raise SyntaxError(s.args[0], location) from None
  File "http://purl.obolibrary.org/obo/uberon.obo", line 199316
    def: "A free, modified neural spine of a preural or a ural vertebra that is placed between the last developed neural spine of a preural centrum and the dorsal axis (= anterior margin of first uroneural) of the caudal skeleton . An epural commonly supports one or more dorsal procurrent rays. An epural is an unpaired median perichondrally ossified bone." [TAO:Arratia and Schultze_1992]␊
                                                                                                                                                                                                                                                                                                                                                                                    ^
SyntaxError: expected QuotedString

>>> cl = Ontology("http://purl.obolibrary.org/obo/cl.obo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/plastic/anaconda3/envs/uberon/lib/python3.8/site-packages/pronto/ontology.py", line 283, in __init__
    cls(self).parse_from(_handle)  # type: ignore
  File "/home/plastic/anaconda3/envs/uberon/lib/python3.8/site-packages/pronto/parsers/obo.py", line 45, in parse_from
    raise SyntaxError(s.args[0], location) from None
  File "http://purl.obolibrary.org/obo/cl.obo", line 163708
    creation_date: 202z-09-29T15:45:36Z␊
                   ^
SyntaxError: expected Iso8601Year

>>> cl = Ontology("http://purl.obolibrary.org/obo/basic.obo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/plastic/anaconda3/envs/uberon/lib/python3.8/site-packages/pronto/ontology.py", line 283, in __init__
    cls(self).parse_from(_handle)  # type: ignore
  File "/home/plastic/anaconda3/envs/uberon/lib/python3.8/site-packages/pronto/parsers/rdfxml.py", line 89, in parse_from
    raise ValueError("could not find `owl:Ontology` element")
ValueError: could not find `owl:Ontology` element

althonos commented 3 years ago

Hi @joelduerksen,

pronto is backed by fastobo, a library I developed during an MSc placement to assess the syntactic correctness of OBO ontologies. As a result, pronto will only load syntactically correct ontologies, in an effort to improve the landscape of OBO products.

In the case of Uberon and CL, I am in contact with the ontology developers, so I can likely get these issues patched. The third example is a different problem: you are requesting a file that does not exist in the OBO library :smile:

althonos commented 3 years ago

UBERON issues have been reported in uberon#1850. CL issues are actually coming from UBERON imports, so fixing UBERON will likely fix CL as well.
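
Until the upstream files are fixed, a batch job can at least skip the ontologies that fail instead of stopping at the first one. The sketch below is not part of pronto and does not make the parser tolerate malformed files. It only catches the two exception types shown in the tracebacks above (`SyntaxError` and `ValueError`) and records why each source failed. The helper name `load_all` and its `loader` parameter are hypothetical; the loader is passed in so that `pronto.Ontology` can be supplied by the caller. Network errors are deliberately not caught here.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def load_all(
    sources: Iterable[str],
    loader: Callable[[str], Any],
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Try to load every source, collecting failures instead of raising.

    `loader` is any callable taking a path or URL, for example
    `pronto.Ontology` (assumed to be imported by the caller).
    Returns two dicts keyed by source: loaded objects and error messages.
    """
    loaded: Dict[str, Any] = {}
    failed: Dict[str, str] = {}
    for source in sources:
        try:
            loaded[source] = loader(source)
        except SyntaxError as err:
            # A SyntaxError raised with a location tuple exposes the
            # offending line number as `lineno` and the message as `msg`.
            failed[source] = f"syntax error at line {err.lineno}: {err.msg}"
        except ValueError as err:
            # Raised above when the file is not a recognisable ontology.
            failed[source] = f"value error: {err}"
    return loaded, failed


# Hypothetical usage, assuming pronto is installed:
#   import pronto
#   loaded, failed = load_all(
#       ["http://purl.obolibrary.org/obo/uberon.obo",
#        "http://purl.obolibrary.org/obo/cl.obo"],
#       pronto.Ontology,
#   )
```

The `failed` mapping keeps the line number from each syntax error, which is the detail needed when reporting the problem to the ontology maintainers.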

jkanche commented 3 years ago

Running into the same issue as well.