Parsing double and decimal numbers in Turtle format can fail even when the numbers conform to the grammar. This happens when a number has a sign but no leading '0' before the dot, as in the following example:
<iri1> <iri2> -.2e3 .
I load it using RDFDataMgr.loadDataset(<path>). This leads to an exception.
I suspect the problem is in TokenizerText.java, line 450. To decide whether the following characters are part of the token, the code only checks whether a leading sign is followed by a digit, not whether it is followed by a dot. The comment at line 429 of the same file suggests the same misunderstanding.
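To illustrate the lookahead I would expect, here is a minimal sketch. It is not the actual TokenizerText code: the class `SignLookahead` and the method `signStartsNumber` are hypothetical names, and the sketch only assumes that the tokenizer can peek two characters past the sign.

```java
// Hypothetical helper for illustration only; not taken from TokenizerText.java.
final class SignLookahead {
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Returns true if the sign at position pos starts a numeric token. */
    static boolean signStartsNumber(CharSequence text, int pos) {
        if (pos >= text.length()) return false;
        char sign = text.charAt(pos);
        if (sign != '+' && sign != '-') return false;
        if (pos + 1 >= text.length()) return false;

        char next = text.charAt(pos + 1);
        // Case described in this report as the only one checked: sign then digit, e.g. "-2".
        if (isDigit(next)) return true;
        // Missing case: sign, then dot, then digit, e.g. "-.2e3".
        return next == '.'
                && pos + 2 < text.length()
                && isDigit(text.charAt(pos + 2));
    }

    public static void main(String[] args) {
        System.out.println(signStartsNumber("-.2e3 .", 0)); // true
        System.out.println(signStartsNumber("-2 .", 0));    // true
        System.out.println(signStartsNumber("- .", 0));     // false
    }
}
```

Requiring a digit after the dot keeps a lone sign followed by the statement-terminating '.' from being treated as the start of a number.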
Note: The Turtle grammar in turtle.jj seems to be correct with respect to the shape of doubles and decimals.
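For reference, the DECIMAL and DOUBLE productions of the W3C Turtle grammar can be written as regular expressions, which makes it easy to confirm that the example literal is legal. This is a sketch transcribed from the Turtle specification, not from turtle.jj; the class name `TurtleNumberShapes` and its methods are my own.

```java
import java.util.regex.Pattern;

// Illustrative transcription of the Turtle number productions; names are hypothetical.
final class TurtleNumberShapes {
    // EXPONENT ::= [eE] [+-]? [0-9]+
    private static final String EXPONENT = "[eE][+-]?[0-9]+";

    // DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
    private static final Pattern DECIMAL =
            Pattern.compile("[+-]?[0-9]*\\.[0-9]+");

    // DOUBLE ::= [+-]? ([0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT)
    private static final Pattern DOUBLE = Pattern.compile(
            "[+-]?(?:[0-9]+\\.[0-9]*" + EXPONENT
                    + "|\\.[0-9]+" + EXPONENT
                    + "|[0-9]+" + EXPONENT + ")");

    static boolean isDecimal(String lexical) {
        return DECIMAL.matcher(lexical).matches();
    }

    static boolean isDouble(String lexical) {
        return DOUBLE.matcher(lexical).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDouble("-.2e3"));  // true: sign, no leading digit, exponent
        System.out.println(isDecimal("-.2"));   // true: sign, no leading digit
        System.out.println(isDecimal("-."));    // false: a digit must follow the dot
    }
}
```

Both signed forms without a leading digit match, so the literal in the example should be accepted by a conforming Turtle parser.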
Relevant output and stacktrace
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1, col: 15] Unrecognized (expected an RDF Term): [MINUS]
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:155)
at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
at org.apache.jena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:143)
at org.apache.jena.riot.lang.LangEngine.exception(LangEngine.java:137)
at org.apache.jena.riot.lang.LangTurtleBase.triplesNodeCompound(LangTurtleBase.java:489)
at org.apache.jena.riot.lang.LangTurtleBase.triplesNode(LangTurtleBase.java:469)
at org.apache.jena.riot.lang.LangTurtleBase.objectList(LangTurtleBase.java:419)
at org.apache.jena.riot.lang.LangTurtleBase.predicateObjectItem(LangTurtleBase.java:352)
at org.apache.jena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:333)
at org.apache.jena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:314)
at org.apache.jena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:178)
at org.apache.jena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:79)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
at org.apache.jena.riot.lang.RiotParsers$AbstractReaderRIOTLang.read(RiotParsers.java:133)
at org.apache.jena.riot.lang.RiotParsers$AbstractReaderRIOTLang.read(RiotParsers.java:91)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:452)
at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:421)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:383)
at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:547)
at org.apache.jena.riot.RDFDataMgr.parseFromURI(RDFDataMgr.java:564)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:429)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:406)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:386)
at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:377)
at org.apache.jena.riot.RDFDataMgr.loadDataset(RDFDataMgr.java:337)
Version
5.1.0
Are you interested in making a pull request?
None