fluree / db

Fluree database library
https://fluree.github.io/db/

Generalize Filter Clauses #746

Closed by zonotope 3 months ago

zonotope commented 3 months ago

This patch generalizes filter clauses. It removes the constraint that a filter may reference only a single variable defined in the where clause, and it removes the optimization under which filters defined in a filter clause were applied only directly within index range scans. Removing that optimization lets us apply filters to variables that are not bound by range scans, such as those from a bind clause. It also lets us filter on the absence of a variable, such as one from an optional clause.

This patch also adds a new "variable map" syntax for defining single-variable filters to which the index range scan optimization will be applied. Users can still direct the query engine to apply that optimization where possible, while also being able to define more general filters that reference multiple variables or variables not bound in an index range scan.

I also added a feature that allows variables to be used as language tags in queries.

bplatz commented 3 months ago

@zonotope I'm still having issues making this work for my primary use case of this feature. Here is a reproducible case that should be able to be pasted directly into your REPL:

(let [conn   @(fluree/connect-memory nil)
      ledger @(fluree/create conn "test/db1")
      db     @(fluree/stage
                (fluree/db ledger)
                {"@context" {"ex" "http://example.org/"}
                 "insert"   {"@id"       "ex:bob"
                             "ex:father" [{"@id" "ex:alex-jr"}, {"@id" "ex:aj"}]}})]
  @(fluree/query db {:context {"ex" "http://example.org/"}
                     :select  '[?s ?f1 ?f2]
                     :where   '[{"@id"       ?s
                                 "ex:father" ?f1}
                                {"@id"       ?s
                                 "ex:father" ?f2}
                                ;; remove below filter to see all results (should be 4 total)
                                ["filter" "(not= ?f1 ?f2)"]]})
  ;; => []   ;; but should be: [["ex:bob" "ex:alex-jr" "ex:aj"], ["ex:bob" "ex:aj" "ex:alex-jr"]]
  )
zonotope commented 3 months ago

@bplatz I just pushed a fix for this, along with a test based on the code you provided. That code made tracking this down pretty easy, so thanks for that.

The problem came from the way we represent IRIs internally. That representation differs from the one for scalar values so that federated queries can work across different ledgers where the sids might not be the same.

The filtering code only handled scalar values and not IRI references, so it always got nil when trying to extract an IRI value, and not= therefore always returned false.
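
For anyone following along, here is a rough sketch of that failure mode in plain Clojure. The map shape and the :val / :iri keys are made up for illustration and are not the actual internal representation; the only point is that comparing two nils with not= yields false.

(def f1 {:iri "http://example.org/alex-jr"}) ; hypothetical IRI match, no :val slot
(def f2 {:iri "http://example.org/aj"})

(defn scalar-only-value
  "Mimics the buggy lookup: reads only the (hypothetical) scalar slot."
  [match]
  (:val match))

(defn match-value
  "Mimics the fixed lookup: falls back to the IRI when there is no scalar."
  [match]
  (if (contains? match :val)
    (:val match)
    (:iri match)))

;; Both lookups return nil, so the two distinct IRIs look equal.
(not= (scalar-only-value f1) (scalar-only-value f2)) ;; => false

;; With the IRI fallback, the two values compare as different.
(not= (match-value f1) (match-value f2)) ;; => true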