arrdem / shelving

A toolkit for building data stores.
Eclipse Public License 1.0
38 stars 2 forks source link

Implement queries #8

Closed arrdem closed 6 years ago

arrdem commented 6 years ago

A work in progress branch striving towards an implementation of queries over shelves.

Fixes #4

arrdem commented 6 years ago

Because the Grimoire v3 spec I'm deving against is heavily based on multispecs, this work really requires #9 to be merged first because without it there really isn't enough index information in my main case study to work with.

arrdem commented 6 years ago
shelving.query> (q *conn
                   {:select '{?qux ::qux}
                    :where  '[[?bar [::bar ::foo] "a"]
                              [?bar [::bar ::qux] ?qux]]})
[?qux ?bar]
[?bar [:shelving.query/bar :shelving.query/foo] a] nil -> ?q_23872
[?bar [:shelving.query/bar :shelving.query/qux] ?qux] ?q_23872 -> ?bar
([?qux     {:type :shelving.query/scan-spec,
            :spec :shelving.query/qux}]
 [?q_23872 {:type :shelving.query/scan-rel,
            :rel [:shelving.query/foo :shelving.query/bar],
            :id #uuid "0823365a-e751-57b2-abf0-967f8bdfece2"}]
 [?q_23874 {:type :shelving.query/project,
            :rel [:shelving.query/qux :shelving.query/bar],
            :left ?qux}]
 [?bar     {:type :shelving.query/intersect,
            :left ?q_23872,
            :right ?q_23874}])
shelving.query> 

IT'S ALIIIIVE

arrdem commented 6 years ago

Okay! After a full day of hacking queries are tested and work!

"quick" demo of the entire analyzer/compiler pipeline -

First lets build an example context,

query-demo> (do (s/def ::foo string?)
                (s/def ::qux pos-int?)
                (s/def :query.bar/type #{::bar})
                (s/def ::bar
                  (s/keys :req-un [::foo
                                   ::qux
                                   :query.bar/type]))

                (def schema
                  (-> sh/empty-schema
                      (sh/value-spec ::foo)
                      (sh/value-spec ::qux)
                      (sh/value-spec ::bar)
                      (sh/spec-rel [::bar ::foo] :foo)
                      (sh/spec-rel [::bar ::qux] :qux)))

                (def *conn
                  (-> (->TrivialEdnShelf schema "target/query.edn"
                                         :flush-after-write true
                                         :load true)
                      (sh/open)))

                (sh/put *conn ::bar {:type ::bar :foo "a" :qux 1})
                (sh/put *conn ::bar {:type ::bar :foo "a" :qux 2})
                (sh/put *conn ::bar {:type ::bar :foo "a" :qux 3})
                (sh/put *conn ::bar {:type ::bar :foo "b" :qux 1})
                (sh/put *conn ::bar {:type ::bar :foo "c" :qux 1})

                nil)
nil

Now lets lust lexically analyze a query

query-demo> (q****
             *conn
             {:select '{?bar ::bar}
              :where  '[[?bar [::bar ::foo] ?b]]
              :params '{?b "a"}})
{?bar {:clauses {[:query-demo/bar :query-demo/foo]
                 #{[?bar [:query-demo/bar :query-demo/foo] "a"]}},
       :spec :query-demo/bar,
       :selected? true}}

In order to figure out how to compile this query we need a topological order so that we can make sure all the query variables we need are bound when we need them.

query-demo> (q***
             *conn
             {:select '{?bar ::bar}
              :where  '[[?bar [::bar ::foo] ?b]]
              :params '{?b "a"}})
{:depmap {?bar {:clauses {[:query-demo/bar :query-demo/foo]
                          #{[?bar [:query-demo/bar :query-demo/foo] "a"]}},
                :spec :query-demo/bar,
                :selected? true}},
 :ordering [?bar]}

Cool! lets build the topological order to some primitive operations we know how to relate to the database implementations, eg index projections, scans and soforth.

query-demo> (q**
             *conn
             {:select '{?bar ::bar}
              :where  '[[?bar [::bar ::foo] ?b]]
              :params '{?b "a"}})
[[?bar
  ([?bar
    {:type :shelving.query.planner/scan-rel,
     :rel [:query-demo/foo :query-demo/bar],
     :id #uuid "0823365a-e751-57b2-abf0-967f8bdfece2"}])]]

It's a pretty simple query, all we do is use the reverse index for the content ID of the one constant we're querying. So lets almost completely compile it just so we can see what happens.

query-demo> (q*
             *conn
             {:select '{?bar ::bar}
              :where  '[[?bar [::bar ::foo] ?b]]
              :params '{?b "a"}})
;; Output has been formatted by hand
(fn [conn select-lvar-specs]
  (fn [states]
    (let [?bar (fn ?bar [states]
                 (->> states
                      (mapcat (fn [state]
                                (map (fn [e]
                                       (assoc state (quote ?bar) e))
                                     (shelving.core/relate-by-id
                                      conn [:query-demo/foo :query-demo/bar]
                                      #uuid "0823365a-e751-57b2-abf0-967f8bdfece2"))))))]
      (->> states ?bar
           ;; Select the requested lvars
           (map (fn [state__17698__auto__]
                  (select-keys state__17698__auto__ (keys select-lvar-specs))))
           ;; Look up selected IDs for their values
           (map (fn [state__17698__auto__]
                  (->> (for [[lvar__17699__auto__ spec__17700__auto__] select-lvar-specs]
                         [lvar__17699__auto__ (shelving.core/get conn spec__17700__auto__
                                                                 (get state__17698__auto__ lvar__17699__auto__))])
                       (clojure.core/into {}))))))))

The idea here is that our query implementation lives in a chain of functions from sequences of states to sequences of states, where a state is a particular binding of logic variables to shelf record IDs. We'll compile this function with eval, give it the logic variables we want to select and the connection it'll run against, then "pump" it with the initial state - [{}] the sequence of an empty state 😄

And finally, lets run it end to end!

query-demo> (q
             *conn
             {:select '{?bar ::bar}
              :where  '[[?bar [::bar ::foo] ?b]]
              :params '{?b "a"}})
({?bar {:type :query-demo/bar, :foo "a", :qux 3}}
 {?bar {:type :query-demo/bar, :foo "a", :qux 2}}
 {?bar {:type :query-demo/bar, :foo "a", :qux 1}})
query-demo> 

This query implementation also does some nice things such as supporting optimizing out un-selected, non-impacting clauses.

query-demo> (q
             *conn
             {:select '{?qux ::qux}
              :where  '[[?bar [::bar ::foo] "a"]
                        [?bar [::bar ::qux] ?qux]
                        [?not-selected [::bar ::foo] "c"]]})
WARN shelving.query.analyzer/dataflow-optimize] Optimizing out unused lvars #{?not-selected}
({?qux 1}
 {?qux 2}
 {?qux 3})
query-demo> 

Also included in this changeset is a has? predicate for the core API, and a correctness fix to the trivial EDN back-end which didn't do index invalidation on insert quite right.

I need to sit down and think hard about the way that I'm doing joins, but since this datalog subset supports only queries which constitute DAGs I think it's generally correct. Pretty stoked this works at all.

arrdem commented 6 years ago

Overhauled the query language to follow Datascript/Datomic's example. Now looks more recognizable. The query front-end now features a limited type inference engine, enabling relations to be specified only by the spec of the right-hand side as the right hand side can be inferred in some cases. Relations can still be specified fully by ID.

The aesthetic choice was made to diverge from datalog/datomic in that :find/:in/:where clauses are sequences of terms rather than bare terms in-line. This fits my taste but may be worth re-visiting. Writing a frontend shim between the two notations shouldn't be hard.

Queries which have :in parameters may now become "prepared", in that they're just waiting for n-many :in arguments. Queries with 0 :in arguments are implicitly applied so that (q conn query & args) sorta works as expected.

query-demo> (q *conn
               '[:find   [?qux]
                 :in    [?in]
                 :where [[?bar [::bar ::foo] ?in]
                         [?bar [::bar ::qux] ?qux]]])
#object[clojure.core$partial$fn__5561
        "0x536c4213"
        "clojure.core$partial$fn__5561@536c4213"]
query-demo> (def prepared-query *1)
#'query-demo/prepared-query
query-demo> (prepared-query "c")
({?qux 1})
query-demo> (prepared-query "b")
({?qux 1})
query-demo> (prepared-query "a")
({?qux 1} {?qux 2} {?qux 3})
query-demo> 
arrdem commented 6 years ago

Supporting Datascript/Datomic style inline queries turned out to be a snap with s/alt.

query-demo> (q *conn
               '[:find [[:from ::qux ?qux]]
                 :where [[?bar [::bar ::foo] "a"]
                         [?bar [::bar ::qux] ?qux]]])
({?qux 1} {?qux 2} {?qux 3})
query-demo> (q *conn
               '[:find [:from ::qux ?qux]
                 :where [?bar [::bar ::foo] "a"]
                        [?bar [::bar ::qux] ?qux]])
({?qux 1} {?qux 2} {?qux 3})
query-demo> 
arrdem commented 6 years ago

Questions for the morning:


Edit: answers from the afternoon,

Documentation improvements, including a tutorial for my flavor of "datalog" are definitely required for a release but I'm gonna punt on them for now. Want to get some of the spec round-tripping features done which will further enable this work.