eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License

Filter pushdown #4390

Open JervenBolleman opened 1 year ago

JervenBolleman commented 1 year ago

Problem description

Filter pushdown (in the database world often called predicate pushdown, but that is confusing in RDF ;) ) is a way to move filter evaluation into the equivalent of the getStatements method. This can drastically reduce IO and memory allocation.

For example, assume the following query:

SELECT *
WHERE {
   [] ex:length ?l .
   FILTER(?l > 2000 && ?l < 3000)
} 

If getStatements can see the filter constraint, and if ex:length has a relevant index, we could avoid generating many literals and binding sets.
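To make the potential saving concrete, here is a minimal sketch in plain Java, not RDF4J code. It assumes a hypothetical per-predicate index for ex:length that is sorted by literal value; the class and method names are made up for illustration. Without pushdown every entry is visited and then filtered; with pushdown the range from the FILTER goes straight to the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a hypothetical value-sorted index for a single predicate (ex:length).
class RangePushdownSketch {
    // literal value -> subjects that have that ex:length
    private final NavigableMap<Integer, List<String>> lengthIndex = new TreeMap<>();

    void add(String subject, int length) {
        lengthIndex.computeIfAbsent(length, k -> new ArrayList<>()).add(subject);
    }

    // No pushdown: every index entry is visited, the filter runs afterwards.
    List<String> scanThenFilter(int minExclusive, int maxExclusive) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : lengthIndex.entrySet()) {
            int length = e.getKey();
            if (length > minExclusive && length < maxExclusive) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    // Pushdown: the range is handed to the index, entries outside it are never visited.
    List<String> scanWithPushdown(int minExclusive, int maxExclusive) {
        List<String> result = new ArrayList<>();
        for (List<String> subjects : lengthIndex.subMap(minExclusive, false, maxExclusive, false).values()) {
            result.addAll(subjects);
        }
        return result;
    }

    public static void main(String[] args) {
        RangePushdownSketch index = new RangePushdownSketch();
        index.add("ex:a", 1500);
        index.add("ex:b", 2500);
        index.add("ex:c", 2999);
        index.add("ex:d", 3000);
        // Mirrors FILTER(?l > 2000 && ?l < 3000)
        System.out.println(index.scanWithPushdown(2000, 3000)); // prints [ex:b, ex:c]
    }
}
```

Both methods return the same subjects; the difference is only in how many entries are visited, which is where the IO and allocation savings would come from in a real store.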

I believe a first step is to expand the API. We currently have getStatements(Resource s, IRI p, Value o, Resource... gs), and I think we can add getStatements(Var s, Var p, Var o, Var gs), with Var being something that can be introspected by the query execution layer.
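A rough sketch of what such an API could look like, again in plain Java with toy types. Everything here is an assumption for illustration: PatternVar is a hypothetical stand-in for the proposed Var (it is not org.eclipse.rdf4j.query.algebra.Var), terms are plain strings, the graph slot is left out, and the only constraint modelled is an integer range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the proposed Var: a name, an optional bound value,
// and an optional numeric range that the store can inspect.
final class PatternVar {
    final String name;          // null for a bound constant
    final String value;         // null when unbound
    final Integer minExclusive; // null when there is no lower bound
    final Integer maxExclusive; // null when there is no upper bound

    private PatternVar(String name, String value, Integer minExclusive, Integer maxExclusive) {
        this.name = name;
        this.value = value;
        this.minExclusive = minExclusive;
        this.maxExclusive = maxExclusive;
    }

    static PatternVar bound(String value) { return new PatternVar(null, value, null, null); }
    static PatternVar unbound(String name) { return new PatternVar(name, null, null, null); }
    static PatternVar inRange(String name, int minExclusive, int maxExclusive) {
        return new PatternVar(name, null, minExclusive, maxExclusive);
    }

    boolean hasValue() { return value != null; }
    boolean hasRange() { return minExclusive != null || maxExclusive != null; }
}

final class VarStoreSketch {
    // Statements as {subject, predicate, object}; terms are plain strings in this toy.
    private final List<String[]> statements = new ArrayList<>();

    void add(String s, String p, String o) { statements.add(new String[] { s, p, o }); }

    // Hypothetical Var-based variant of getStatements (graph slot omitted for brevity).
    List<String[]> getStatements(PatternVar s, PatternVar p, PatternVar o) {
        List<String[]> result = new ArrayList<>();
        for (String[] st : statements) {
            if (matches(s, st[0]) && matches(p, st[1]) && matches(o, st[2])) {
                result.add(st);
            }
        }
        return result;
    }

    // The store inspects the Var: bound value first, then any range constraint.
    private static boolean matches(PatternVar v, String term) {
        if (v.hasValue()) return v.value.equals(term);
        if (!v.hasRange()) return true;
        int n;
        try {
            n = Integer.parseInt(term);
        } catch (NumberFormatException e) {
            return false; // non-numeric terms cannot satisfy a numeric range
        }
        return (v.minExclusive == null || n > v.minExclusive)
                && (v.maxExclusive == null || n < v.maxExclusive);
    }

    public static void main(String[] args) {
        VarStoreSketch store = new VarStoreSketch();
        store.add("ex:a", "ex:length", "1500");
        store.add("ex:b", "ex:length", "2500");
        List<String[]> hits = store.getStatements(
                PatternVar.unbound("s"), PatternVar.bound("ex:length"), PatternVar.inRange("l", 2000, 3000));
        System.out.println(hits.size()); // prints 1
    }
}
```

This toy store still scans linearly; the point is only that the constraint arrives at the store as inspectable data rather than as an opaque filter applied later, so a store with a suitable index could choose a better access path.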

For example, given the following query

SELECT *
WHERE {
  ?isAnResourceBecauseSubject a [] .
  [] ex:whatever ?isAnResourceBecauseSubject .
}

and data in which the objects of a (rdf:type) may be either a literal or an IRI:

ex:1 a ex:Class .
ex:1 a "a literal"  .
ex:1 ex:whatever ex:2 .

Then the query execution may avoid materializing or even touching the triple ex:1 a "a literal" ., as we can mark up the Var ?isAnResourceBecauseSubject with an attribute/extension stating that it must be a Resource.
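A small sketch of how a store might use such a "must be a Resource" marker on an object-position variable. This is a toy model under stated assumptions, not how any existing RDF4J store is laid out: it assumes the store partitions statements by the kind of their object, and Kind, KindPushdownSketch and the touched counter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Toy model: statements are partitioned by the kind of their object, so a
// "must be a Resource" marker lets the store skip the literal partition entirely.
final class KindPushdownSketch {
    enum Kind { RESOURCE, LITERAL }

    static final class Statement {
        final String s, p, o;
        Statement(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    private final Map<Kind, List<Statement>> byObjectKind = new EnumMap<>(Kind.class);
    int touched = 0; // number of statements inspected so far, for illustration only

    void add(String s, String p, String o, Kind objectKind) {
        byObjectKind.computeIfAbsent(objectKind, k -> new ArrayList<>()).add(new Statement(s, p, o));
    }

    // requiredObjectKind == null means the object variable carries no kind constraint.
    List<Statement> getStatements(String predicate, Kind requiredObjectKind) {
        List<Statement> result = new ArrayList<>();
        for (Kind kind : Kind.values()) {
            if (requiredObjectKind != null && kind != requiredObjectKind) {
                continue; // the whole partition is skipped, nothing in it is touched
            }
            for (Statement st : byObjectKind.getOrDefault(kind, Collections.<Statement>emptyList())) {
                touched++;
                if (st.p.equals(predicate)) {
                    result.add(st);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        KindPushdownSketch store = new KindPushdownSketch();
        store.add("ex:1", "ex:whatever", "ex:2", Kind.RESOURCE);
        store.add("ex:1", "ex:whatever", "a literal", Kind.LITERAL);
        // The object variable is also used as a subject elsewhere, so it must be a Resource.
        System.out.println(store.getStatements("ex:whatever", Kind.RESOURCE).size()); // prints 1
        System.out.println(store.touched); // prints 1
    }
}
```

The query layer can derive the marker statically: any variable that also appears in subject position can never bind to a literal, so the constraint costs nothing to compute and only narrows what the store has to read.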

Preferred solution

No response

Are you interested in contributing a solution yourself?

Yes

Alternatives you've considered

Implement this separately for each store.

Anything else?

No response

JervenBolleman commented 1 year ago

An interesting presentation by Prof. Peter Boncz that touches on this and other things we should have a look at.

kenwenzel commented 1 year ago

A very interesting presentation. Thank you for mentioning this.