fastobo / fastobo-py

Faultless AST for Open Biomedical Ontologies in Python.
http://fastobo.readthedocs.io
MIT License

Exception "expected QuotedString/XrefList" when loading .obo files #202

Closed mikkonie closed 7 months ago

mikkonie commented 3 years ago

Hello,

First of all, thanks for this very useful library. I've been running into a problem parsing some publicly available .obo files: parsing fails with the exception SyntaxError: expected QuotedString, or a similar variation.

An example is the most recent version of the Cell Ontology (CL). I try to load it as follows:

import fastobo
from urllib.request import urlopen

fastobo.load(urlopen('http://purl.obolibrary.org/obo/cl.obo'))

This results in:

  File "<stdin>", line 31344
    xref: xref (ILX:0770149)␊
               ^
SyntaxError: expected EOL or QuotedString

Similar errors occur when attempting to parse e.g. the following ontologies:

I'm not super familiar with the .obo format, so it may be that these files break the standard somehow. Still, it would be great to have, for example, an option to skip the unrecognized data without crashing. Also, if I'm doing something wrong myself, please let me know :)

I'm running the latest fastobo release (0.9.3) on Python 3.6 and Ubuntu Linux. If you need any further information, please let me know.

althonos commented 3 years ago

Hi @mikkonie ,

The issue is indeed that these files are not valid syntax. fastobo was made out of frustration that there was no validating parser for the OBO format, which is supposedly standardized in the OBO format version 1.4 syntax and semantics guide. I have spent a fair bit of my time trying to work that out.
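To make the traceback above a bit more concrete: as far as I read the format guide, an xref ID may be followed only by an optional quoted description, an optional {qualifier} block and an optional ! comment, which is why `(ILX:0770149)` is rejected. Below is a rough, standard-library-only sketch that flags xref lines not fitting that shape. The regular expression and the function name `suspicious_xrefs` are my own approximation for illustration. This is not fastobo's grammar and not a validator.

```python
import re

# Approximate shape of an xref clause (an assumption, not the official grammar):
#   xref: <id> ["quoted description"] [{qualifiers}] [! comment]
XREF_LINE = re.compile(
    r'^xref:\s*(?P<id>\S+)'        # tag and an ID without whitespace
    r'(?:\s+"(?:[^"\\]|\\.)*")?'   # optional quoted description
    r'(?:\s*\{.*\})?'              # optional trailing qualifiers
    r'\s*(?:!.*)?$'                # optional end-of-line comment
)


def suspicious_xrefs(lines):
    """Return (line_number, text) pairs for xref lines that do not fit the pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        if stripped.startswith("xref:") and not XREF_LINE.match(stripped):
            hits.append((number, stripped))
    return hits


sample = [
    'id: CL:0000000',
    'xref: xref (ILX:0770149)',   # the line from the traceback
    'xref: FMA:68646 "cell"',
    'xref: ZFA:0009000',
]
flagged = suspicious_xrefs(sample)
```

Running this over the lines of a file gives a quick list of candidates to include in a report. It will miss other kinds of syntax errors and may mis-flag unusual but valid IDs.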

As such, I think it is much better practice to report these syntax errors to the ontology editors. The more people raise them, the more we can make the OBO ecosystem a nicer place. Some of these issues have been reported already (obophenotype/uberon#1615, obophenotype/cell-ontology#931).
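When reporting, it helps to quote the exact location. Here is a minimal sketch, assuming the exception is the built-in SyntaxError with `filename`, `lineno`, `offset` and `text` filled in, as the traceback earlier in this thread suggests. The helper `describe_syntax_error` is a hypothetical name and not part of fastobo.

```python
def describe_syntax_error(exc):
    """Format a SyntaxError's location as a short, report-ready string."""
    source = exc.filename or "<unknown>"
    # Drop the trailing newline (or its visible marker) from the offending line.
    text = (exc.text or "").rstrip("\r\n␊")
    lines = [f"{source}, line {exc.lineno}, column {exc.offset}: {exc.msg}"]
    if text:
        lines.append(text)
        if exc.offset:
            # Place a caret under the reported column (1-based).
            lines.append(" " * (exc.offset - 1) + "^")
    return "\n".join(lines)


# Stand-in for the error from the traceback; the file name "cl.obo" is illustrative.
example = SyntaxError(
    "expected EOL or QuotedString",
    ("cl.obo", 31344, 12, "xref: xref (ILX:0770149)\n"),
)
report = describe_syntax_error(example)

# Intended use with fastobo (not executed here):
# try:
#     fastobo.load(handle)
# except SyntaxError as exc:
#     print(describe_syntax_error(exc))
```

The resulting text can be pasted directly into an issue on the ontology's tracker.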

mikkonie commented 3 years ago

Thank you for the comment @althonos. I suspected this might be the case, but didn't think of checking the repos of the ontologies themselves for related issues. Apologies for raising an issue which should be handled on their end.

althonos commented 3 years ago

@mikkonie : No problem, you can use this issue as a reference if you end up reporting to the ORDO or PRIDE people!

althonos commented 7 months ago

Issue has been fixed upstream.