biopragmatics / obo-db-ingest

🗄️ Conversion of biomedical nomenclatures like HGNC to OBO
https://biopragmatics.github.io/obo-db-ingest/

drugbank obo file fails to parse using owlapi #1

Closed cmungall closed 2 years ago

cmungall commented 3 years ago

The OWLAPI parser is the canonical reference parser; if there are ambiguities in the spec, it's considered the decider.

drugbank.obo fails:

Parser: org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser@3e5499cc
    Stack trace:
LINENO: 111 - Could not find tag separator ':' in line.
LINE: Leuprolide was first approved in 1985 as a daily subcutaneous injection under the tradename Lupron™ by Abbvie Endocrine Inc.[L13850] Since this initial approval, various long-acting intramuscular and subcutaneous products have been developed such that patients can be dosed once every six months.[L13781, L13790] Leuprolide remains frontline therapy in all conditions for which it is indicated for use." []
    org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser.parse(OBOFormatOWLAPIParser.java:60)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:220)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1254)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1208)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1108)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1064)
        owltools.io.ParserWrapper.parseOWL(ParserWrapper.java:163)   
        owltools.io.ParserWrapper.parseOWL(ParserWrapper.java:150)   
        owltools.io.ParserWrapper.parseOBO(ParserWrapper.java:136)   
        owltools.cli.CommandRunner.runSingleIteration(CommandRunner.java:4801)
LINENO: 111 - Could not find tag separator ':' in line.
LINE: Leuprolide was first approved in 1985 as a daily subcutaneous injection under the tradename Lupron™ by Abbvie Endocrine Inc.[L13850] Since this initial approval, various long-acting intramuscular and subcutaneous products have been developed such that patients can be dosed once every six months.[L13781, L13790] Leuprolide remains frontline therapy in all conditions for which it is indicated for use." []
    org.obolibrary.oboformat.parser.OBOFormatParser.error(OBOFormatParser.java:1465)
        org.obolibrary.oboformat.parser.OBOFormatParser.getParseTag(OBOFormatParser.java:861)
        org.obolibrary.oboformat.parser.OBOFormatParser.parseTermFrameClause(OBOFormatParser.java:610)
        org.obolibrary.oboformat.parser.OBOFormatParser.parseTermFrameClauseEOL(OBOFormatParser.java:598)
        org.obolibrary.oboformat.parser.OBOFormatParser.parseTermFrame(OBOFormatParser.java:572)
        org.obolibrary.oboformat.parser.OBOFormatParser.parseEntityFrame(OBOFormatParser.java:539)
        org.obolibrary.oboformat.parser.OBOFormatParser.parseOBODoc(OBOFormatParser.java:349)
        org.obolibrary.oboformat.parser.OBOFormatParser.parse(OBOFormatParser.java:307)
        org.obolibrary.oboformat.parser.OBOFormatParser.parse(OBOFormatParser.java:259)
        org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser.parse(OBOFormatOWLAPIParser.java:76)

I think the newlines need to be escaped
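For illustration, a minimal sketch of what escaping could look like on the writer side. It assumes that replacing a literal newline with the two-character sequence `\n` (and escaping backslashes and double quotes) is enough for a quoted `def:` value. The helper names `escape_obo_text` and `format_def_clause` are hypothetical and not taken from this repository, and whether other characters also need escaping is left open here.

```python
def escape_obo_text(text: str) -> str:
    """Escape free text so it stays on one line inside a quoted OBO value.

    Hypothetical helper: handles only backslashes, newlines, and double quotes.
    """
    return (
        text.replace("\\", "\\\\")  # escape backslashes first so later escapes survive
        .replace("\r\n", "\\n")     # Windows line endings
        .replace("\n", "\\n")       # Unix line endings
        .replace("\r", "\\n")       # stray carriage returns
        .replace('"', '\\"')        # quotes would otherwise end the value early
    )


def format_def_clause(text: str) -> str:
    """Build a single-line `def:` clause with an empty cross-reference list."""
    return f'def: "{escape_obo_text(text)}" []'


if __name__ == "__main__":
    raw = "First paragraph.[L13850]\nSecond paragraph."
    print(format_def_clause(raw))
```

With this, a multi-paragraph description ends up as one physical line, so a line-oriented parser never sees a continuation line without a tag.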

cthoyt commented 3 years ago

Still need to look into this, including making OWL exports as well

cthoyt commented 2 years ago

Interesting you mention that OWLAPI is the reference parser... the rules for when characters need to be escaped, and in which fields, are pretty ill-defined. I had to try several different permutations, but I think I got it.
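When trying out escaping variants, a quick pre-check before running the real parser can help. The sketch below only approximates the condition behind the "Could not find tag separator ':' in line" error by flagging non-blank, non-comment, non-stanza-header lines that contain no colon. It is not OWLAPI's actual logic, and the function name `find_lines_without_tag` and the sample stanza (including its identifier) are made up for illustration.

```python
from __future__ import annotations


def find_lines_without_tag(obo_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that lack a ':' tag separator.

    Rough approximation only: skips blank lines, '!' comment lines,
    and stanza headers such as [Term].
    """
    problems = []
    for lineno, line in enumerate(obo_text.split("\n"), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("!"):
            continue  # blank line or comment
        if stripped.startswith("[") and stripped.endswith("]"):
            continue  # stanza header
        if ":" not in stripped:
            problems.append((lineno, line))
    return problems


if __name__ == "__main__":
    # Made-up stanza with an unescaped newline inside the definition
    sample = (
        "[Term]\n"
        "id: EX:0000001\n"
        "name: example\n"
        'def: "First sentence.\n'
        'Second sentence without a tag." []\n'
    )
    for lineno, line in find_lines_without_tag(sample):
        print(f"LINENO: {lineno} - {line}")
```

A clean result here does not guarantee that OWLAPI accepts the file; it only catches the specific symptom reported in this issue.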