drdozer / oboformat

Automatically exported from code.google.com/p/oboformat
0 stars 0 forks source link

OBO format parser has problems with a comma in xref #54

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Loading 
http://obo.cvs.sourceforge.net/viewvc/obo/obo/ontology/chemical/chebi.obo 
throws an exception for a missing newline.

The error seems to be an unexpected comma in an xref:
xref: Wikipedia:1,4-Butanediol "Wikipedia"

Is this a bug or a problem coming from missing escape characters?
As this is in chebi, the source will be rather hard to fix.

I have added an test case for this bug: ChebiXRefTest

Original issue reported on code.google.com by HDie...@lbl.gov on 26 Sep 2011 at 11:08

GoogleCodeExporter commented 9 years ago
Currently the spec does not prohibit this. However, allowing commas in xrefs is 
problematic as commas are used as separators in XrefLists.

The current OE behavior is to accept unescaped commas for the xref tag, and to 
escape these by default when outputting. However, when you don't escape commans 
in XrefLists you get nonsense. E.g.

def: "foo" [X:1,2,3, X:1,2,4]

is translated to

def: "foo" [:2, :3, :4, X:1]

which is not unreasonable (garbage in, garbage out).

The simplest thing that is compatible with existing behavior is to allow commas 
in solitary xrefs, but to disallow them in a XrefList.

I have updated the spec. The XrefLists section now distinguishes between xrefs 
occurring in solitary and list contexts. I also added 8.1.4 in the informative 
recommendations section, stating that commas should (not must) be escaped in 
the solitary context.

Next step: change the parser to allow commas in solitary context.

Original comment by cmung...@gmail.com on 27 Sep 2011 at 1:08

GoogleCodeExporter commented 9 years ago
Hmmm...  Seems hacky to me. Too much loosening the spec to accommodate users - 
but actually probably hurting them as for any single xref - there's a 
reasonable likelihood that someone will want to add more.   Wouldn't it be 
better to enforce use of escape character in these cases. That or specify that 
the separator is ', ' - which is what OE actually uses - so it the defacto 
standard.

Original comment by dosu...@gmail.com on 27 Sep 2011 at 8:33

GoogleCodeExporter commented 9 years ago
I have implemented the change as suggested by David (rev 301). 
Task: Change specs to reflect this behavior.

Original comment by HDie...@lbl.gov on 4 Oct 2011 at 7:16

GoogleCodeExporter commented 9 years ago
Which of the two suggestions were implemented?

(1) enforce use of escape character
(2) Make the separator ", "

I prefer (2). With (1), even after getting CHEBI to change all the archived 
versions of the ontology will be technically unparseable.

Original comment by cmung...@gmail.com on 5 Nov 2011 at 6:20

GoogleCodeExporter commented 9 years ago
The option (2) with ", " as separator is implemented.

Original comment by HDie...@lbl.gov on 17 Nov 2011 at 9:57

GoogleCodeExporter commented 9 years ago

Original comment by HDie...@lbl.gov on 12 Jan 2012 at 11:20