acoli-repo / conll-rdf

Advanced graph rewriting and LLOD publication for CoNLL and other TSV formats

Specify intended behavior on encountering Jena Riot Errors in CoNLLStreamExtractor #59

Open leogott opened 3 years ago

leogott commented 3 years ago

As is, when a RiotException occurs (e.g. Unrecognized keyword), the exception is logged at Error level with a full traceback (which is definitely too much in every case I can think of), then the input being processed at the time is logged at Info level, and the StreamExtractor resumes processing.

Expected Behavior:

chiarcos commented 3 years ago

Sample log:

10:14:36 ERROR riot                 :: [line: 14, col: 1 ] Out of place: [UNDERSCORE]
org.apache.jena.riot.RiotException: [line: 14, col: 1 ] Out of place: [UNDERSCORE]
        at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:147)
        at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
        at org.apache.jena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:143)
        at org.apache.jena.riot.lang.LangEngine.exception(LangEngine.java:137)
        at org.apache.jena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:239)
        at org.apache.jena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
        at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:91)
        at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:41)
        at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:206)
        at org.apache.jena.riot.RDFParser.read(RDFParser.java:338)
        at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:324)
        at org.apache.jena.riot.RDFParser.parse(RDFParser.java:273)
        at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:498)
        at org.apache.jena.riot.RDFDataMgr.parseFromReader(RDFDataMgr.java:880)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:298)
        at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:283)
        at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:62)
        at org.apache.jena.rdf.model.impl.ModelCom.read(ModelCom.java:298)
        at org.acoli.conll.rdf.Format2RDF.conll2model(Format2RDF.java:235)
        at org.acoli.conll.rdf.CoNLL2RDF.conll2model(CoNLL2RDF.java:39)
        at org.acoli.conll.rdf.CoNLLStreamExtractor.processSentenceStream(CoNLLStreamExtractor.java:107)
        at org.acoli.conll.rdf.CoNLLStreamExtractor.main(CoNLLStreamExtractor.java:341)
10:14:36 INFO  Format2RDF           :: while processing the following input:
PREFIX nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> ...

Suggestion: Introduce a -debug flag that enables a (shortened) traceback, but skip the traceback by default and keep the error message. BTW, the error is only partially recoverable: the data is lost.
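
A rough sketch of how that could look, in plain Java without Jena on the classpath. Everything here is an assumption for illustration: the class `ErrorReportSketch`, the method `describe`, and the `maxFrames` parameter do not exist in conll-rdf, and a `RuntimeException` stands in for the `RiotException` from the log above. By default only the one-line summary is returned; with debug enabled, a capped number of stack frames is appended.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of conll-rdf: formats a parse error for logging. */
final class ErrorReportSketch {

    /**
     * Returns only the exception summary by default. With debug enabled,
     * appends at most maxFrames stack frames and a count of omitted frames.
     */
    static String describe(Throwable error, boolean debug, int maxFrames) {
        String summary = error.getClass().getName() + ": " + error.getMessage();
        if (!debug) {
            return summary;
        }
        int limit = Math.max(0, maxFrames);
        StackTraceElement[] frames = error.getStackTrace();
        String shortened = Arrays.stream(frames)
                .limit(limit)
                .map(frame -> "        at " + frame)
                .collect(Collectors.joining(System.lineSeparator()));
        int omitted = Math.max(0, frames.length - limit);

        StringBuilder out = new StringBuilder(summary);
        if (!shortened.isEmpty()) {
            out.append(System.lineSeparator()).append(shortened);
        }
        if (omitted > 0) {
            out.append(System.lineSeparator())
               .append("        ... ").append(omitted).append(" more");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the RiotException in the sample log (assumption: any Throwable works).
        RuntimeException error =
                new RuntimeException("[line: 14, col: 1 ] Out of place: [UNDERSCORE]");
        // Assumed flag name, taken from the suggestion above.
        boolean debug = Arrays.asList(args).contains("-debug");
        System.err.println(describe(error, debug, 3));
    }
}
```

In the extractor, something like this would presumably be called from wherever the parse exception is caught, so that the message is always logged and the traceback only appears when the flag is set.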