apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0

Parsing of double and decimal fails #2715

Closed tobiaswjohn closed 1 week ago

tobiaswjohn commented 2 weeks ago

Version

5.1.0

What happened?

Parsing double and decimal numbers in Turtle format can fail even when the numbers conform to the grammar. This happens when a number has a sign but no leading '0' before the dot, as in the following example:

<iri1> <iri2> -.2e3 .

I load the file using RDFDataMgr.loadDataset(<path>), which throws an exception.

I suspect the problem is in TokenizerText.java, line 450: to decide whether the following characters belong to the number token, it only checks whether a leading sign is followed by a digit, not whether it is followed by a dot. The comment on line 429 of the same file suggests the same misunderstanding.
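
To illustrate the suspected lookahead decision, here is a minimal, hypothetical sketch. It is not the code in TokenizerText.java; the class and method names (SignLookahead, signStartsNumberDigitOnly, signStartsNumber) are made up for this example. It contrasts a "sign followed by a digit" check with one that also accepts "sign, dot, digit":

```java
// Hypothetical sketch of the lookahead decision; NOT Jena's TokenizerText code.
class SignLookahead {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // The check as described in the report: a sign starts a number
    // only if the very next character is a digit.
    static boolean signStartsNumberDigitOnly(String in, int signPos) {
        int next = signPos + 1;
        return next < in.length() && isDigit(in.charAt(next));
    }

    // A check aligned with the Turtle number shapes: a sign starts a number
    // if it is followed by a digit, or by a dot that is itself followed by a digit.
    static boolean signStartsNumber(String in, int signPos) {
        int next = signPos + 1;
        if (next >= in.length()) {
            return false;
        }
        char c = in.charAt(next);
        if (isDigit(c)) {
            return true;
        }
        return c == '.' && next + 1 < in.length() && isDigit(in.charAt(next + 1));
    }

    public static void main(String[] args) {
        String input = "-.2e3";
        System.out.println("digit-only check:   " + signStartsNumberDigitOnly(input, 0));
        System.out.println("digit-or-dot check: " + signStartsNumber(input, 0));
    }
}
```

Requiring a digit after the dot keeps a bare sign followed by the statement-terminating '.' from being treated as the start of a number.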

Note: The turtle grammar in turtle.jj seems to be correct w.r.t. the shape of doubles and decimals.
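
For reference, the DECIMAL and DOUBLE productions of the W3C Turtle grammar can be written as regular expressions. The sketch below is only an illustration (the class name TurtleNumberShapes and its helper methods are assumptions, and it does not use Jena); it shows that -.2e3 has the shape of a DOUBLE and that -.5 has the shape of a DECIMAL:

```java
import java.util.regex.Pattern;

// Sketch: numeric literal shapes from the Turtle grammar as regular expressions.
class TurtleNumberShapes {
    // EXPONENT ::= [eE] [+-]? [0-9]+
    private static final String EXPONENT = "[eE][+-]?[0-9]+";

    // DECIMAL ::= [+-]? [0-9]* '.' [0-9]+
    static final Pattern DECIMAL = Pattern.compile("[+-]?[0-9]*\\.[0-9]+");

    // DOUBLE ::= [+-]? ([0-9]+ '.' [0-9]* EXPONENT | '.' [0-9]+ EXPONENT | [0-9]+ EXPONENT)
    static final Pattern DOUBLE = Pattern.compile(
        "[+-]?(?:[0-9]+\\.[0-9]*" + EXPONENT
            + "|\\.[0-9]+" + EXPONENT
            + "|[0-9]+" + EXPONENT + ")");

    static boolean isDecimal(String s) {
        return DECIMAL.matcher(s).matches();
    }

    static boolean isDouble(String s) {
        return DOUBLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Print how each candidate lexical form is classified.
        for (String s : new String[] {"-.2e3", "-0.2e3", "-.5", "+.5", "-."}) {
            System.out.println(s + " double=" + isDouble(s) + " decimal=" + isDecimal(s));
        }
    }
}
```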

Relevant output and stacktrace

Exception in thread "main" org.apache.jena.riot.RiotException: [line: 1, col: 15] Unrecognized (expected an RDF Term): [MINUS]
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:155)
    at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
    at org.apache.jena.riot.lang.LangEngine.exceptionDirect(LangEngine.java:143)
    at org.apache.jena.riot.lang.LangEngine.exception(LangEngine.java:137)
    at org.apache.jena.riot.lang.LangTurtleBase.triplesNodeCompound(LangTurtleBase.java:489)
    at org.apache.jena.riot.lang.LangTurtleBase.triplesNode(LangTurtleBase.java:469)
    at org.apache.jena.riot.lang.LangTurtleBase.objectList(LangTurtleBase.java:419)
    at org.apache.jena.riot.lang.LangTurtleBase.predicateObjectItem(LangTurtleBase.java:352)
    at org.apache.jena.riot.lang.LangTurtleBase.predicateObjectList(LangTurtleBase.java:333)
    at org.apache.jena.riot.lang.LangTurtleBase.triples(LangTurtleBase.java:314)
    at org.apache.jena.riot.lang.LangTurtleBase.triplesSameSubject(LangTurtleBase.java:178)
    at org.apache.jena.riot.lang.LangTurtle.oneTopLevelElement(LangTurtle.java:46)
    at org.apache.jena.riot.lang.LangTurtleBase.runParser(LangTurtleBase.java:79)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
    at org.apache.jena.riot.lang.RiotParsers$AbstractReaderRIOTLang.read(RiotParsers.java:133)
    at org.apache.jena.riot.lang.RiotParsers$AbstractReaderRIOTLang.read(RiotParsers.java:91)
    at org.apache.jena.riot.RDFParser.read(RDFParser.java:452)
    at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:421)
    at org.apache.jena.riot.RDFParser.parse(RDFParser.java:383)
    at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:547)
    at org.apache.jena.riot.RDFDataMgr.parseFromURI(RDFDataMgr.java:564)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:429)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:406)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:386)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:377)
    at org.apache.jena.riot.RDFDataMgr.loadDataset(RDFDataMgr.java:337)

Are you interested in making a pull request?

None

afs commented 2 weeks ago

@tobiaswjohn - thank you for the report and analysis. I have reproduced the problem.