geneontology / helpdesk

The Gene Ontology Helpdesk
http://help.geneontology.org

OBO 1.4 grammar, id and xref_list #82

Closed by gvinterhalter 6 years ago

gvinterhalter commented 6 years ago

I'm working on an LALR(1) parser for OBO files, and I noticed that the definition for GO:0150005 (and others) has this xref list: [GOC:bhm, IntAct: EBI-16417801, ... There should be no whitespace after "IntAct:", right? (Or at least it should be escaped?)

Also, could someone clarify the rules for xref_list and the qualifier list? The grammar never restricts characters such as ',' ']' '}' and '=' in IDs. Does that mean there should be no separate lexer? I currently have lexer rules that restrict these characters when tokenising the input, but it's a strange mix. For example, I don't restrict '=' in URLs and prefixed_ids, because that doesn't work.
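To make the tokenising question concrete, here is a minimal sketch of one possible convention for splitting an xref list. It is not taken from the OBO 1.4 grammar; it assumes that a backslash escapes the next character, that commas inside a double-quoted description do not separate items, and that an unescaped ']' inside the list is an error. The function name `split_xref_list` is hypothetical.

```python
def split_xref_list(text):
    """Split '[a, b "desc", c]' into raw xref strings.

    Assumed rules (not from the spec): backslash escapes the next
    character, and commas inside double quotes are not separators.
    Escape sequences are kept exactly as written.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("xref list must be wrapped in [ ]")
    body = text[1:-1]

    items, current = [], []
    in_quotes = False
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # Keep the escape pair together so '\,' never splits an item.
            current.append(body[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            # Unescaped, unquoted comma: end of the current xref.
            items.append("".join(current).strip())
            current = []
            i += 1
            continue
        elif ch == "]" and not in_quotes:
            raise ValueError("unescaped ']' inside xref list")
        current.append(ch)
        i += 1

    last = "".join(current).strip()
    if last or items:
        items.append(last)
    return items


print(split_xref_list('[GOC:bhm, IntAct:EBI-16417801]'))
```

With this approach the special characters are only restricted at the list level, so an ID token itself can still contain '=' or an escaped ','.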

kltm commented 6 years ago

@cmungall This might be best answered from your side?

cmungall commented 6 years ago

> I'm working on a LALR1 parser for obo

Do you absolutely need to support obo? https://douroucouli.wordpress.com/2016/10/04/a-developer-friendly-json-exchange-format-for-ontologies/

> There should be no white space after "intAct:" right? (

Correct.

This is a common class of error. Some parsers may auto-repair it.
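As a rough illustration of what such a repair could look like (an assumption for this thread, not a description of what any existing OBO parser actually does), the sketch below removes whitespace that directly follows the prefix colon of a single xref ID. The function name `repair_xref_id` and the prefix pattern are hypothetical.

```python
import re

# Assumed prefix shape: a letter or underscore, then word characters,
# dots or hyphens, then ':' followed by stray spaces or tabs.
_PREFIX_GAP = re.compile(r"^([A-Za-z_][\w.-]*):[ \t]+(?=\S)")


def repair_xref_id(raw):
    """Return (repaired_id, was_changed) for one xref ID string."""
    stripped = raw.strip()
    fixed = _PREFIX_GAP.sub(r"\1:", stripped)
    return fixed, fixed != stripped


print(repair_xref_id("IntAct: EBI-16417801"))
print(repair_xref_id("GOC:bhm"))
```

Returning the `was_changed` flag lets a caller log a warning instead of silently rewriting the source file.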

> The grammar never tries to limit characters like: ',' ']' '}' and '=' for IDs.

The best place for this is the tracker here: https://github.com/owlcollab/oboformat

It will likely not get much action, I'm afraid, but I will accept PRs for the spec if they conform to OBO files in the wild and to what the owlapi obo parser does.

That said, I would direct your energy away from OBO.