arrdem / shelving

A toolkit for building data stores.
Eclipse Public License 1.0

Generalize recursive spec walking of rel structures #9

Closed: arrdem closed this 6 years ago

arrdem commented 6 years ago

At present, relations are fairly limited: they can't really indirect through multispecs, and for every [from to] spec pair users must statically provide a to-fn which maps objects conforming to the from spec to the (presumably sub-)objects conforming to the to spec that are to be related.

Spec already provides appropriate machinery for performing this walk. That Shelving currently requires users to manually re-specify it is a known limitation.

This PR adds facilities for interpreting a subset of clojure.spec(.alpha) in order to perform such tree walks, and thus to automatically "destructure" values into their components for recursive insertion.
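
To make the idea concrete, here is a minimal sketch of a spec-directed postwalk, assuming only `s/keys` specs (plain `:req` / `:req-un` keys) and leaf predicates need handling. It reads a spec's form with `s/form`, recurses into each key's registered spec, and only then applies `f` to the rebuilt parent. The `postwalk-with-spec` and `keys-children` here are hypothetical stand-ins written for illustration, not the shelving.walk implementation, and unlike the REPL trace in this thread they do not follow multi-specs.

```clojure
(ns example.walk
  (:require [clojure.spec.alpha :as s]))

;; Illustrative specs, loosely modeled on :org.maven/package from the trace.
(s/def :org.maven/group string?)
(s/def :org.maven/artifact string?)
(s/def :org.maven/version string?)
(s/def :org.maven/package
  (s/keys :req-un [:org.maven/group :org.maven/artifact :org.maven/version]))

(defn- keys-children
  "If `spec` is a registered (s/keys ...) spec, return [child-spec map-key]
  pairs for its plain :req and :req-un keys; otherwise nil."
  [spec]
  (when (s/get-spec spec)
    (let [form (s/form spec)]
      (when (and (seq? form) (= `s/keys (first form)))
        (let [{:keys [req req-un]} (apply hash-map (rest form))]
          (concat (for [k req :when (keyword? k)] [k k])
                  (for [k req-un :when (keyword? k)] [k (keyword (name k))])))))))

(defn postwalk-with-spec
  "Walk `o` bottom-up as directed by `spec`: every map value whose key is
  named by an s/keys spec is walked with that key's spec first, then
  (f spec o') is applied to the rebuilt parent."
  [f spec o]
  (let [o' (if (map? o)
             (reduce (fn [m [child-spec k]]
                       (if (contains? m k)
                         (assoc m k (postwalk-with-spec f child-spec (get m k)))
                         m))
                     o
                     (keys-children spec))
             o)]
    (f spec o')))
```

Called with a `println` callback on a package map, this visits the group, artifact and version leaves before the package itself, which is the same bottom-up order the trace shows.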

This enables much finer-grained queries for #8, since queries may now be specified against implicit substructures, although it does require relaxing many of the current validations on the consistency of schemas and specs.

arrdem commented 6 years ago

A quick demo:

[Screenshot of the REPL session, taken 2018-02-11]

Or, as bare text:

```
shelving.walk> (binding [*trace-walk* true]
                 (postwalk-with-spec (fn [spec o]
                                       (println spec o)
                                       o)
                                     :grimoire/package (grimoire/->mvn-pkg "org.clojure" "clojure" "1.6.0")))
shelving.walk/walk-with-spec* DEBUG ] :grimoire/package (clojure.spec.alpha/multi-spec grimoire/package->spec :package)
shelving.walk/walk-with-spec* DEBUG ] :org.maven/package (clojure.spec.alpha/keys :req-un [:org.maven/type :org.maven/group :org.maven/artifact :org.maven/version])
shelving.walk/walk-with-spec* DEBUG ] :org.maven/type #{:org.maven/package}
:org.maven/type :org.maven/package
shelving.walk/walk-with-spec* DEBUG ] :org.maven/group clojure.core/string?
:org.maven/group org.clojure
shelving.walk/walk-with-spec* DEBUG ] :org.maven/artifact clojure.core/string?
:org.maven/artifact clojure
shelving.walk/walk-with-spec* DEBUG ] :org.maven/version clojure.core/string?
:org.maven/version 1.6.0
:org.maven/package {:type :org.maven/package, :group org.clojure, :artifact clojure, :version 1.6.0}
{:type :org.maven/package,
 :group "org.clojure",
 :artifact "clojure",
 :version "1.6.0"}
shelving.walk>
```

The idea here is that walk-with-spec makes it super easy to write a data walk which computes the ID for the parent record according to its spec, recursively inserts any child structures recognized as having rels to the parent spec, and then inserts the parent. This would become required behavior for put implementations, and it would massively generalize the utility of indices.
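
The insert order described above (children first, each replaced by its ID, then the parent) might look roughly like this. Everything in the sketch is hypothetical: the in-memory `store` atom, the `rels` table mapping a parent spec to the keys that hold child records, and the use of content-addressed IDs via `java.util.UUID/nameUUIDFromBytes` are assumptions for illustration, not Shelving's put.

```clojure
(ns example.insert)

;; Hypothetical in-memory store: id -> {:spec spec, :value value}.
(def store (atom {}))

;; Hypothetical rel table: parent spec -> {map-key child-spec}.
(def rels {:example/release {:package :example/package}})

(defn record-id
  "Content-addressed ID for a (spec, value) pair; an assumption of this sketch."
  [spec v]
  (java.util.UUID/nameUUIDFromBytes (.getBytes (pr-str [spec v]) "UTF-8")))

(defn insert!
  "Insert child records first, replacing each with its ID, then insert the
  parent. Returns the parent's ID."
  [spec v]
  (let [v' (reduce-kv (fn [m k child-spec]
                        (if (contains? m k)
                          (assoc m k (insert! child-spec (get m k)))
                          m))
                      v
                      (get rels spec {}))
        id (record-id spec v')]
    (swap! store assoc id {:spec spec :value v'})
    id))
```

Because the parent's ID is computed only after its children have been replaced by their IDs, the stored parent is a flat record of pointers, which is the "new structure" a spec-preserving walk cannot produce.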

One question this kinda begs is how to connect object IDs to their schemas, say by using the top two or four bytes of every UUID to store a hash identifying the spec. Say a spec were wrapped up in four layers of aliases. It would make no sense to serialize that record under its terminal spec and then create four entries in other tables chaining through each other to it. It'd be far better to simply insert one record with one ID, and add non-nested "pointers" to it in the other specs. This would require adding some notion of a pointer, which presumably would only be supported on "value" specs.
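
One way to read the "top four bytes" idea is sketched below, assuming a random (version 4) UUID whose most significant 32 bits are overwritten with a spec tag. `spec-tag`, `tagged-uuid` and `uuid-spec-tag` are hypothetical names, and `clojure.core/hash` is only a placeholder: it is not guaranteed as a persisted format, so a real store would want a hash that is stable across processes and Clojure versions.

```clojure
(ns example.ids)

(defn spec-tag
  "32-bit tag for a spec, as a non-negative long. Using clojure.core/hash
  is a placeholder assumption."
  [spec]
  (bit-and (long (hash spec)) 0xFFFFFFFF))

(defn tagged-uuid
  "A random UUID whose top 32 bits are replaced by the spec tag."
  [spec]
  (let [r   (java.util.UUID/randomUUID)
        msb (bit-or (bit-shift-left (spec-tag spec) 32)
                    ;; keep the low 32 bits, which hold the version nibble
                    (bit-and (.getMostSignificantBits r) 0xFFFFFFFF))]
    (java.util.UUID. msb (.getLeastSignificantBits r))))

(defn uuid-spec-tag
  "Recover the spec tag from a UUID built by tagged-uuid."
  [^java.util.UUID id]
  (unsigned-bit-shift-right (.getMostSignificantBits id) 32))
```

Overwriting those bits leaves the UUID's version and variant fields intact but shrinks the random portion from 122 to 90 bits, which is the main cost of this scheme.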

Ponderables for the morning.

arrdem commented 6 years ago

Problems this is uncovering:

  1. The spec "schema" for the data storage layer is assumed to be static and fixed at connection creation time. This creates problems when recursing through multispecs, which is an essential feature.
  2. The current spec walk used for inserts is required to be structure (spec) preserving. This makes it inappropriate for generating what is fundamentally a new structure, one containing the IDs of recursively inserted substructures.

I don't love the idea of switching to fully automatic schemas: it cheapens the notion of the database schema, and makes the intentional structures for storage more implicit and less declarative. But the only way to make multispecs work reasonably is to lazily add specs to the schema, so... the heck with it?

While I want separate substructure / sub-value storage as a property of pretty much any reasonable storage layer, I'm not convinced that I actually want to present that behavior in the API. No matter what the storage representation, the only reasonable deserialization model is eager deserialization. Data in, data out, not some weird side-effecting ORM structure.
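
A minimal sketch of eager deserialization over stored references, assuming references are represented by a hypothetical `Ref` record holding an ID and that the store can be read as a plain map from ID to stored value. `deref-all` follows chains of references (such as the alias layers discussed earlier) and hands back plain data; it does no cycle detection, so a reference loop would not terminate.

```clojure
(ns example.refs
  (:require [clojure.walk :as walk]))

;; Hypothetical reference encoding: a pointer to another stored record.
(defrecord Ref [id])

(defn- resolve-ref
  "Follow a chain of Refs until reaching a non-Ref value (or nil)."
  [store x]
  (if (instance? Ref x)
    (recur store (get store (:id x)))
    x))

(defn deref-all
  "Eagerly replace every Ref in `v` with the value it points to in `store`.
  prewalk resolves each node first, then descends into the resolved value,
  so nested Refs are resolved too."
  [store v]
  (walk/prewalk (partial resolve-ref store) v))
```

Resolving everything up front keeps the API "data in, data out": callers never see a lazy or side-effecting proxy object.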


Going forwards:

  1. Come up with a design for storing substructures by explicit reference. This means coming up with a coding scheme for references to other records in the store, so that they can be recursively deserialized.
  2. Add an "alter schema" operation, higher order over the altering function, which must preserve the same compatibility invariants as value-spec or record-spec.
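
The "alter schema" operation could be sketched as follows, under several assumptions: the connection is an atom holding `{:schema ...}`, a schema is a map of `:record-specs` and `:value-specs` sets, and the compatibility invariant is simply that specs may be added but never dropped. The real invariants behind value-spec and record-spec are not reproduced here; `alter-schema!` and `compatible?` are hypothetical.

```clojure
(ns example.schema
  (:require [clojure.set :as set]))

(defn compatible?
  "Assumed invariant: an altered schema may add specs but never drop them."
  [old-schema new-schema]
  (and (set/subset? (:record-specs old-schema) (:record-specs new-schema))
       (set/subset? (:value-specs old-schema) (:value-specs new-schema))))

(defn alter-schema!
  "Apply (f schema & args) to the schema held in the `conn` atom. The check
  runs inside swap!, so an incompatible result throws and leaves the
  connection unchanged. Returns the new schema."
  [conn f & args]
  (:schema
   (swap! conn
          (fn [{:keys [schema] :as c}]
            (let [schema' (apply f schema args)]
              (if (compatible? schema schema')
                (assoc c :schema schema')
                (throw (ex-info "Incompatible schema change"
                                {:old schema :new schema'}))))))))
```

With this shape, lazily registering a newly encountered multispec branch is just `(alter-schema! conn update :record-specs conj spec)`.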

Looks like there are also some issues around relation building, given that those tests are now pretty consistently hosed.

arrdem commented 6 years ago

Notes for the morning:

  1. Still needs a transactions API. Probably a separate line of work.
  2. ~The "trivial" map storage layer has some significant issues around index invalidation now. May just be worth rolling back some of that code and trying again at this point, since I wrote the entire append only log implementation tonight as a replacement rather than debug it.~ These have been resolved and tested.
  3. ~The append only log doesn't seem to handle schema updates correctly.~ Resolved and tested.
  4. ~The common test suite definitely doesn't cover schema updates.~ Resolved and tested.
  5. ~A bunch of docs are now stale, and some symbols don't have any coverage at all.~ Symbol doc coverage is complete but the docs have not been reviewed for staleness. Could definitely write a new demo.
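
For readers unfamiliar with the approach, here is a toy append-only log store, assuming operations are plain data vectors and that current state (records, a per-spec index, and the schema) is rebuilt by replaying the log in order. The operation names `:add-spec` and `:put` are hypothetical, as is the rule that a put against a spec not yet in the schema at that point in the log is rejected. Because the index is derived from the log rather than patched in place, invalidating it amounts to replaying.

```clojure
(ns example.log)

(defn append
  "Return a new log with `op` added at the end; existing entries never change."
  [log op]
  (conj log op))

(defn- apply-op [state [op & args]]
  (case op
    :add-spec (update state :schema conj (first args))
    :put (let [[spec id v] args]
           (when-not (contains? (:schema state) spec)
             (throw (ex-info "Spec not in schema" {:spec spec :id id})))
           (-> state
               (assoc-in [:records id] {:spec spec :value v})
               (update-in [:by-spec spec] (fnil conj #{}) id)))))

(defn replay
  "Rebuild the current state by folding every log entry in order."
  [log]
  (reduce apply-op {:schema #{} :records {} :by-spec {}} log))
```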

arrdem commented 6 years ago

Demo here on Twitter.

arrdem commented 6 years ago

I've completely overhauled the docs, and my tools for helping me write them. At this point this PR has completely outlived its relevance except as a log of my work, so I'm gonna merge it and follow up later on finishing the work of inserting only denormalized values.