Parsing RNA Ontology

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?

1. Attempting to convert the RNA Ontology (RNAO) downloaded from 
http://www.obofoundry.org/ to OWL using oboformat v1.4
2. Using the latest release from SVN as of today
3. OBOFormatParser returns a null OBODoc from the parse() method

What is the expected output? What do you see instead?

Starts to parse RNAO with the following warning, then gives up and returns a 
null OBODoc.

2011-07-07 12:43:22,608 WARN  org.obolibrary.oboformat.parser.OBOFormatParser.parseTermFrameClauseEOL(OBOFormatParser.java:478)  - problem parsing tag:def: "A family_1_base_pair is an RNAO_base_pair in which the two RNA nucleotides interact by edge-to-edge hydrogen-bonding between the Watson-Crick edges of each base and in which the glycosidic bonds of the two nucleotides are oriented "cis" relative to each other." [RNAO:ROC]//238 LINE:377
2011-07-07 12:43:32,311 WARN  org.obolibrary.oboformat.parser.OBOFormatParser.parseOBODoc(OBOFormatParser.java:270)  - UNPARSED:def: "A family_1_base_pair is an RNAO_base_pair in which the two RNA nucleotides interact by edge-to-edge hydrogen-bonding between the Watson-Crick edges of each base and in which the glycosidic bonds of the two nucleotides are oriented "cis" relative to each other." [RNAO:ROC]//238 LINE:377

I would expect an Exception to be thrown if it fails to parse an OBO document.

What version of the product are you using? On what operating system?

I'm using the latest 1.4, checked out via SVN into Eclipse, on Windows XP with 
JVM 1.6.

Please provide any additional information below.
I did test the native OWLAPI OBO parser on this file, and it seemed to handle 
it OK.

Original issue reported on code.google.com by cszamu...@gmail.com on 7 Jul 2011 at 8:01

GoogleCodeExporter commented 9 years ago

This is a problem in the source ontology; in the following line:

def: "A family_1_base_pair is an RNAO_base_pair in which the two RNA 
nucleotides interact by edge-to-edge hydrogen-bonding between the Watson-Crick 
edges of each base and in which the glycosidic bonds of the two nucleotides are 
oriented "cis" relative to each other." [RNAO:ROC]

the quoted "cis" should be escaped. E.g. \"cis\"
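As a rough illustration of that escaping, here is a minimal Java sketch that rewrites a single def: line so that bare double quotes inside the definition text become \". The class and method names are hypothetical and not part of oboformat. It assumes the definition sits on one line and is followed by a [ ... ] xref list, and it treats any quote preceded by a backslash as already escaped, so it will misjudge unusual input such as an escaped backslash directly before a quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the oboformat library.
final class OboDefQuoteFixer {
    // group 1: tag plus opening quote; group 2: definition text (greedy);
    // group 3: closing quote followed by the xref list and the rest of the line.
    private static final Pattern DEF_LINE =
            Pattern.compile("^(def:\\s*\")(.*)(\"\\s*\\[.*)$");

    // A double quote that is not already preceded by a backslash.
    private static final Pattern BARE_QUOTE = Pattern.compile("(?<!\\\\)\"");

    static String escapeInnerQuotes(String line) {
        Matcher m = DEF_LINE.matcher(line);
        if (!m.matches()) {
            return line; // not a def: line with an xref list, leave untouched
        }
        // Replace each bare quote in the definition text with backslash + quote.
        String body = BARE_QUOTE.matcher(m.group(2))
                .replaceAll(Matcher.quoteReplacement("\\\""));
        return m.group(1) + body + m.group(3);
    }
}
```

Because the middle group is greedy, the closing quote is taken to be the last quote that is directly followed by the xref list, so everything before it is treated as definition text. Lines that are already escaped pass through unchanged.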

I will contact Colin to update the XSLT that generates this (or to switch 
straight to using this code).

We should also make the errors more informative. The parser shouldn't produce a 
null OBODoc, and it shouldn't try to recover from a parse exception.
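Until the parser itself fails loudly, a caller can get the same effect with a small guard. This is only a sketch: StrictParse, requireParsed and OboParseException are hypothetical names, not oboformat API, and the guard is generic so it makes no assumption about how the document is obtained.

```java
// Hypothetical guard for illustration only; not part of the oboformat library.
final class StrictParse {
    /** Unchecked exception used by this sketch to signal a failed parse. */
    static final class OboParseException extends RuntimeException {
        OboParseException(String message) {
            super(message);
        }
    }

    /** Returns doc unchanged, or throws if the parser handed back null. */
    static <T> T requireParsed(T doc, String source) {
        if (doc == null) {
            throw new OboParseException(
                    "Parser returned no document for " + source
                    + "; check the log for UNPARSED warnings");
        }
        return doc;
    }
}
```

Wrapping the result of parse() in requireParsed(..., fileName) turns the silent null into an exception that names the offending file.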

Original comment by cmung...@gmail.com on 7 Jul 2011 at 11:32

GoogleCodeExporter commented 9 years ago

Thanks for your quick response. I can perform some formatting cleanup before 
parsing in the meantime. I'm wondering if perhaps I am misinterpreting the use 
of this library. I am using oboformat via the Java API to parse the obo 
ontologies at the OBO foundry (I had been using the OWLAPI parser but your new 
parser seems to capture additional semantics in the OBO format).  A significant 
number of these are failing for one reason or another. Is the library intended 
to focus on the newer 1.4 OBO format rather than 1.2, which is the format most 
of the OBO ontologies in the Foundry use? Thanks for any 
feedback/suggestions.

Original comment by cszamu...@gmail.com on 10 Jul 2011 at 6:27

GoogleCodeExporter commented 9 years ago

Every obof1.2 file is a valid obof1.4 file, so you should be able to use this 
library for any obo file you can download from obofoundry.org.

Having said that, I would not use OBODoc objects directly. The main purpose of 
this library is to translate into OWL (or back to OBO). It isn't really 
optimized for the kinds of operations you'd want to do in an application. The 
OBODoc model rigidly mirrors obo-format, including all the legacy flaws of this 
format.

You should be able to use the OWLAPI over the converted ontology. I can see the 
temptation to use the OBODoc objects - it is much simpler to do "obo-type" 
things such as finding all the synonyms together with their scope. This is 
possible with the OWLAPI but is a lot more verbose due to the generic nature of 
OWL.

If this is the case then I would recommend

   http://code.google.com/p/owltools

This provides a convenience layer on top of the OWL API for "obo-type" 
operations. It also bundles the org.oboformat parser.

This may seem somewhat circuitous but it is the most robust strategy in the 
long term

Original comment by cmung...@gmail.com on 10 Jul 2011 at 8:02

GoogleCodeExporter commented 9 years ago

Hi again,
I'm actually only interested in converting OBO to OWL.  Once I have the OWL 
file, then I can do other stuff with it (like generate a SKOS vocabulary).  So, 
from your message it would seem that I am back to the question of handling 
failed OBO files.  Would you expect that OBOParser() would successfully parse 
all of the OBO ontologies available at the foundry? As I mentioned above, in my 
hands many fail the parsing step.

Thanks.

Original comment by cszamu...@gmail.com on 13 Jul 2011 at 7:54

GoogleCodeExporter commented 9 years ago

> Once I have the OWL file, then I can do other stuff with it (like generate a 
SKOS vocabulary).

A shame to throw away all the semantics to use SKOS

> So, from your message it would seem that I am back to the question of 
handling failed OBO files. Would you expect that OBOParser() would successfully 
parse all of the OBO ontologies available at the foundry? As I mentioned above, 
in my hands many fail the parsing step.

There's lots of active development going on right now to fix remaining 
conversion issues (mainly around ID conversion). Conversion should work for all 
Foundry ontologies soon, hopefully within the next two weeks.

Original comment by dosu...@gmail.com on 13 Jul 2011 at 8:11

GoogleCodeExporter commented 9 years ago

There are still a few ontologies that do not parse. This is due to actual 
syntax errors in the files. The maintainers have been notified.

Original comment by cmung...@gmail.com on 13 Jul 2011 at 8:17

Changed state: Done

drdozer / oboformat

Parsing RNA Ontology #43