apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0

SPARQL query round-trip serialization error #2585

Closed ajtucker closed 2 weeks ago

ajtucker commented 1 month ago

Version

5.0.0

What happened?

We build a SPARQL query programmatically and then serialize it to a string so that we can run it against a Fuseki SPARQL endpoint. In one case, a query containing a valid URI node is serialized with that URI written as an invalid prefixed name: the local part starts with a character that the PN_LOCAL production does not allow in first position, see https://www.w3.org/TR/sparql11-query/#rPN_LOCAL

This query exhibits the issue:

PREFIX eg: <http://example.com/with-trailing/>

ASK { <http://example.com/with-trailing/-with_leading-> a eg:BadPN_LOCAL . }

Parsing the above with qparse throws the QueryParseException shown below when qparse checks its own output by re-parsing it.

Relevant output and stacktrace

$ qparse 'PREFIX eg: <http://example.com/with-trailing/> ASK { <http://example.com/with-trailing/-with_leading-> a eg:BadPN_LOCAL . }'
PREFIX  eg:   <http://example.com/with-trailing/>

ASK
WHERE
  { eg:-with_leading-
              a  eg:BadPN_LOCAL
  }

**** Check failure: could not parse output query
org.apache.jena.query.QueryParseException: Encountered " "-" "- "" at line 5, column 8.
Was expecting one of:
    <IRIref> ...
    <PNAME_NS> ...
    <PNAME_LN> ...
    <VAR1> ...
    <VAR2> ...
    "a" ...
    "distinct" ...
    "multi" ...
    "shortest" ...
    "(" ...
    "!" ...
    "^" ...

        at org.apache.jena.sparql.lang.ParserARQ.perform(ParserARQ.java:99)
        at org.apache.jena.sparql.lang.ParserARQ.parse$(ParserARQ.java:52)
        at org.apache.jena.sparql.lang.SPARQLParser.parse(SPARQLParser.java:33)
        at org.apache.jena.query.QueryFactory.parse(QueryFactory.java:144)
        at org.apache.jena.query.QueryFactory.create(QueryFactory.java:83)
        at org.apache.jena.sparql.util.QueryUtils.checkParse(QueryUtils.java:96)
        at org.apache.jena.sparql.util.QueryUtils.checkQuery(QueryUtils.java:38)
        at arq.qparse.exec(qparse.java:158)
        at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:87)
        at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
        at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
        at arq.qparse.main(qparse.java:66)
PREFIX  eg:   <http://example.com/with-trailing/>

ASK
WHERE
  { eg:-with_leading-
              a  eg:BadPN_LOCAL
  }

Are you interested in making a pull request?

Yes

ajtucker commented 1 month ago

It looks as though checkValidLocalname() doesn't quite match the production for PN_LOCAL. In the grammar, the first character of the local name is restricted to:

PN_CHARS_U | ':' | [0-9] | PLX

while the rest are allowed to be:

(PN_CHARS | '.' | ':' | PLX)* (PN_CHARS | ':' | PLX)

Adding the following test to TestSerialization shows the problem; the new test fails:

diff --git a/jena-arq/src/test/java/org/apache/jena/sparql/syntax/TestSerialization.java b/jena-arq/src/test/java/org/apache/jena/sparql/syntax/TestSerialization.java
index 08c3fd708f..6efb1eaef4 100644
--- a/jena-arq/src/test/java/org/apache/jena/sparql/syntax/TestSerialization.java
+++ b/jena-arq/src/test/java/org/apache/jena/sparql/syntax/TestSerialization.java
@@ -118,6 +118,9 @@ public class TestSerialization
     @Test public void test_PName_Bad_7()
     { fmtURI_Prefix("http://example/x.", "<http://example/x.>", pmap1) ; }

+    @Test public void test_PName_Bad_8()
+    { fmtURI_Prefix("http://default/-x", "<http://default/-x>", pmap1) ; }
+
     // Dots
     @Test public void test_Dots_1() // Internal DOT
     { fmtURI_Prefix("http://example/x#a.b", "ex:a.b", pmap1) ; }
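
To illustrate the first-character rule described above, here is a minimal standalone sketch of a PN_LOCAL check. It is not Jena's checkValidLocalname(); the class name PnLocalSketch and the method isValidLocalName are hypothetical, and the sketch assumes ASCII-only input, so the non-ASCII ranges of PN_CHARS_BASE are deliberately left out.

```java
import java.util.regex.Pattern;

/**
 * Simplified, ASCII-only sketch of the SPARQL 1.1 PN_LOCAL production.
 * Hypothetical helper for illustration; not Jena's checkValidLocalname().
 */
final class PnLocalSketch {
    // PLX ::= PERCENT | PN_LOCAL_ESC
    private static final String PLX =
        "(?:%[0-9A-Fa-f]{2}|\\\\[_~.\\-!$&'()*+,;=/?#@%])";
    // ASCII subset of PN_CHARS_U (assumption: non-ASCII ranges omitted)
    private static final String PN_CHARS_U = "[A-Za-z_]";
    // ASCII subset of PN_CHARS: PN_CHARS_U plus '-' and digits
    private static final String PN_CHARS = "[A-Za-z_\\-0-9]";

    // First character: PN_CHARS_U | ':' | [0-9] | PLX  (note: no '-' and no '.')
    private static final String FIRST = "(?:" + PN_CHARS_U + "|[:0-9]|" + PLX + ")";
    // Middle characters: PN_CHARS | '.' | ':' | PLX
    private static final String MIDDLE = "(?:" + PN_CHARS + "|[.:]|" + PLX + ")";
    // Last character: PN_CHARS | ':' | PLX  (a trailing '.' is not allowed)
    private static final String LAST = "(?:" + PN_CHARS + "|:|" + PLX + ")";

    private static final Pattern PN_LOCAL =
        Pattern.compile(FIRST + "(?:" + MIDDLE + "*" + LAST + ")?");

    /** Returns true if the whole string matches the simplified PN_LOCAL pattern. */
    static boolean isValidLocalName(String local) {
        return PN_LOCAL.matcher(local).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"BadPN_LOCAL", "-with_leading-", "a.b", "x.", "with-inner"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidLocalName(s));
        }
    }
}
```

Under these assumptions, "-with_leading-" and "x." are rejected while "BadPN_LOCAL", "a.b" and "with-inner" are accepted, so a serializer using such a check would fall back to the full <...> form for the URI in the original query.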