apache / jena

Apache Jena
https://jena.apache.org/
Apache License 2.0
1.12k stars 654 forks source link

QueryExecBuilder.substitution() doesn't replace references in aggregation expressions #2650

Closed shawnsmith closed 3 months ago

shawnsmith commented 3 months ago

Version

5.1.0

What happened?

https://github.com/apache/jena/issues/2028 deprecated QueryExecutionDatasetBuilder.initialBinding() and https://github.com/apache/jena/issues/2435 deprecated it for removal. But the substitution() method isn't a perfect replacement for initialBinding(). I ran across an edge case where initialBinding() appears to work correctly but substitution() does not:

Substitute ?a <- 123 in this query:

PREFIX : <https://example.com/>
SELECT (SUM(?a + ?b) AS ?c)
WHERE {
  ?s :p ?a ;
     :q ?b .
}

and you get this, where ?a in the aggregation expression is not defined:

PREFIX  :     <https://example.com/>
SELECT  (SUM(( ?a + ?b )) AS ?c)
WHERE
  { ?s  :p  123 ;
        :q  ?b
  }

It looks like substitution() doesn't replace variable references in aggregation expressions. I don't know if that's a bug or intended. For now I'm sticking with initialBinding(), so I'm concerned that it's marked for removal.

Relevant output and stacktrace

package jena_test;

import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.sparql.core.Var;
import org.apache.jena.sparql.engine.binding.Binding;
import org.apache.jena.sparql.engine.binding.BindingBuilder;
import org.apache.jena.sparql.exec.QueryExec;
import org.apache.jena.sparql.graph.GraphFactory;
import org.apache.jena.sparql.resultset.ResultsWriter;
import org.apache.jena.sparql.syntax.syntaxtransform.QueryTransformOps;
import org.junit.jupiter.api.Test;

import java.io.ByteArrayOutputStream;
import java.util.Map;

import static java.nio.charset.StandardCharsets.UTF_8;
import static org.apache.jena.graph.NodeFactory.createLiteralByValue;
import static org.apache.jena.graph.NodeFactory.createURI;
import static org.junit.jupiter.api.Assertions.assertEquals;

class AggregationSubstitutionTest {
    @Test
    void testSubstitution_Aggregation() {
        // Current versions of Jena strongly recommend using `QueryExecBuilder.substitution(Binding)` instead of
        // `QueryExecDatasetBuilder.initialBinding(Binding), even to the point of marking the `initialBinding()`
        // method deprecated. But the two methods aren't equivalent, as shown in the following query:

        String sparql = """
                PREFIX : <https://example.com/>
                SELECT (SUM(?a + ?b) AS ?c)
                WHERE {
                  ?s :p ?a ;
                     :q ?b .
                }
                """;
        Map<Var, Node> params = Map.of(
                Var.alloc("d"), createURI("https://example.com/Bar"),
                Var.alloc("a"), createLiteralByValue(123, XSDDatatype.XSDinteger));

        // Note: QueryExecDatasetBuilder.build() implements substitutionMap using QueryTransformOps.transform().
        Query transformed = QueryTransformOps.transform(QueryFactory.create(sparql), params);

        // Note: substitution results in ?a being undefined in the SUM expression!
        assertEquals("""
                PREFIX  :     <https://example.com/>
                SELECT  (SUM(( ?a + ?b )) AS ?c)
                WHERE
                  { ?s  :p  123 ;
                        :q  ?b
                  }
                """, transformed.toString());
        QueryFactory.create(transformed.toString());

        String ns = "https://example.com/";
        Graph graph = GraphFactory.createDefaultGraph();
        graph.add(createURI(ns + "Bar"), createURI(ns + "p"), createLiteralByValue(123, XSDDatatype.XSDinteger));
        graph.add(createURI(ns + "Bar"), createURI(ns + "q"), createLiteralByValue(456, XSDDatatype.XSDinteger));

        run(sparql, graph, params);
    }

    private static void run(String sparql, Graph graph, Map<Var, Node> params) {
        BindingBuilder builder = Binding.builder();
        params.forEach(builder::add);
        Binding binding = builder.build();

        // Use substitution()
        ByteArrayOutputStream buf1 = new ByteArrayOutputStream();
        try (QueryExec qe1 = QueryExec.graph(graph).query(sparql).substitution(binding).build()) {
            ResultsWriter.create().lang(Lang.CSV).write(buf1, qe1.select());
        }
        String csv1 = buf1.toString(UTF_8);
        System.out.println("substitution:\n" + csv1.trim());

        // Use initialBinding()
        ByteArrayOutputStream buf2 = new ByteArrayOutputStream();
        try (QueryExec qe2 = QueryExec.newBuilder().graph(graph).query(sparql).initialBinding(binding).build()) {
            ResultsWriter.create().lang(Lang.CSV).write(buf2, qe2.select());
        }
        String csv2 = buf2.toString(UTF_8);
        System.out.println("initialBinding:\n" + csv2.trim());

        assertEquals(csv1, csv2, "Query results should be the same");
    }
}

Are you interested in making a pull request?

None

rvesse commented 3 months ago

initialBinding() is deprecated because it has far more inconsistent behaviour. It relies upon internal details of Jena's query engine so behaviour has changed )often subtly) over time as internals have changed. Also injecting parameters in this way means they don't have any effect of optimisation so could get worse performance.

Another alternative to substitution is to use parameterised strings which relies on textual replacement so may be another option

Aklakan commented 3 months ago

Funny that just today I stumbled over the same issue: In ExprTransformNodeElement I had added the following as a fix in my own copy of the class which just today I wanted to remove from my code base :smile:

    @Override
    public Expr transform(ExprAggregator eAgg) {
        return eAgg.applyNodeTransform(nodeTransform);
    }

I don't know if that's a bug or intended.

Me neither, that's why I was living with my own copy so far.

afs commented 3 months ago

@shawnsmith - Thank you for the detailed report.

It's a bug.

@Aklakan's fix should do it - QueryTransformOps.transform does not take scoping into account.

shawnsmith commented 3 months ago

Thanks for the quick feedback!

Fwiw I tested this change today, and it's not complete:

@Override
public Expr transform(ExprAggregator eAgg) {
    return eAgg.applyNodeTransform(nodeTransform);
}

The aggregators in the Query.aggregators field also need to be transformed, but aren't. Something along these lines:

diff --git a/jena-arq/src/main/java/org/apache/jena/sparql/syntax/syntaxtransform/QueryTransformOps.java b/jena-arq/src/main/java/org/apache/jena/sparql/syntax/syntaxtransform/QueryTransformOps.java
index b461a0a7e0..8d54370290 100644
--- a/jena-arq/src/main/java/org/apache/jena/sparql/syntax/syntaxtransform/QueryTransformOps.java
+++ b/jena-arq/src/main/java/org/apache/jena/sparql/syntax/syntaxtransform/QueryTransformOps.java
@@ -37,6 +37,7 @@ import org.apache.jena.shared.impl.PrefixMappingImpl;
 import org.apache.jena.sparql.ARQException;
 import org.apache.jena.sparql.core.*;
 import org.apache.jena.sparql.expr.Expr;
+import org.apache.jena.sparql.expr.ExprAggregator;
 import org.apache.jena.sparql.expr.ExprTransform;
 import org.apache.jena.sparql.expr.ExprTransformer;
 import org.apache.jena.sparql.expr.ExprVar;
@@ -78,6 +79,7 @@ public class QueryTransformOps {
         if (q2.getOrderBy() != null)
             mutateSortConditions(q2.getOrderBy(), exprTransform);
         mutateQueryPattern(q2, transform, exprTransform);
+        transformAggregators(query, q2, exprTransform);
         if ( query.isQueryResultStar() ) {
             // Reset internal to only what now can be seen.
             q2.resetResultVars();
@@ -111,6 +113,12 @@ public class QueryTransformOps {
         }
     }

+    private static void transformAggregators(Query query, Query newQuery, ExprTransform exprTransform) {
+        for (ExprAggregator aggregator : query.getAggregators()) {
+            newQuery.getAggregators().add((ExprAggregator) exprTransform.transform(aggregator));
+        }
+    }
+
     // Do the result form part of the cloned query.
     private static void mutateByQueryType(Query q2, ElementTransform transform, ExprTransform exprTransform) {
         switch(q2.queryType()) {
@@ -300,9 +308,6 @@ public class QueryTransformOps {
                 for (String x : desc.getNamedGraphURIs())
                     newQuery.addNamedGraphURI(x);
             }
-
-            // Aggregators.
-            newQuery.getAggregators().addAll(query.getAggregators());
         }

         @Override
afs commented 3 months ago

@shawnsmith - thanks - I've added that to the PR.