arrdem / shelving

A toolkit for building data stores.
Eclipse Public License 1.0
38 stars 2 forks

Implement relations #3

Closed: arrdem closed this 6 years ago

arrdem commented 6 years ago

Relations describe how shelves can be decomposed to substructures according to their specs, and how they should be indexed according to their substructures.
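A minimal sketch of that idea, written in Python rather than shelving's Clojure and not reflecting its actual API. A hypothetical `SCHEMA` says which fields of each spec hold substructures of another spec. The hypothetical `put` stores each substructure on its own shelf, replaces it with a reference, and records the parent/child ID pair in a rel index so lookups can run from a substructure back to the values that contain it. Content-hash IDs are assumed here for simplicity.

```python
import hashlib
import json
from collections import defaultdict


def content_id(value):
    """Assumed scheme: SHA-256 of a canonical JSON encoding of the value."""
    data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# Hypothetical schema: for each spec, which fields are substructures of which spec.
SCHEMA = {
    "person": {"address": "address"},
    "address": {},
}

shelves = defaultdict(dict)  # spec name -> {id: stored value}
rels = defaultdict(set)      # (parent spec, child spec) -> {(parent id, child id)}


def put(spec, value):
    """Decompose value per SCHEMA, store each piece on its own shelf, index the rels."""
    stored = dict(value)
    children = []
    for field, child_spec in SCHEMA[spec].items():
        child_id = put(child_spec, value[field])
        stored[field] = child_id  # replace the substructure with a reference
        children.append((child_spec, child_id))
    vid = content_id(stored)
    shelves[spec][vid] = stored
    for child_spec, child_id in children:
        rels[(spec, child_spec)].add((vid, child_id))
    return vid


def related(parent_spec, child_spec, child_id):
    """Reverse lookup through a rel: which parents contain this child?"""
    return {p for p, c in rels[(parent_spec, child_spec)] if c == child_id}
```

Storing two people who share an address leaves a single entry on the address shelf, and `related("person", "address", addr_id)` returns both person IDs.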

This requires reworking the original API, which treated the entire shelving unit as a multi-table k/v store, and recognizing that there are two kinds of access patterns we want to support: "records" and "values". Most of my design work is oriented around values in the traditional Clojure sense. Shelving is really designed to be a value store, not a record store in the sense of a traditional mutable database. Adding relations brings this tension to a head by forcing me to work out the mechanics of building and maintaining relations.

The core of the issue is that it's possible to write to a "value" shelf just like you'd write to any other shelf. Upserts are the particular problem: they invite violations of the "value is identity" property and can transitively invalidate other value shelves that refer to the upserted value.
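To make the failure concrete, here is a small Python sketch (hypothetical names and an assumed SHA-256-over-JSON hashing scheme, not shelving's code). A person value refers to an address value by content ID, and a record-style upsert then overwrites the address under its old key.

```python
import hashlib
import json


def content_id(value):
    """Assumed scheme: a value's ID is the SHA-256 of its canonical JSON encoding."""
    data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def upsert(shelf, key, new_value):
    """Hypothetical record-style upsert: overwrite whatever is stored under key."""
    shelf[key] = new_value


address_shelf = {}
person_shelf = {}

address = {"street": "1 Main St", "city": "Springfield"}
addr_id = content_id(address)
address_shelf[addr_id] = address

# The person value refers to its address by ID, so the person's own
# content ID is computed over that reference, not over the address itself.
person = {"name": "Ada", "address": addr_id}
person_id = content_id(person)
person_shelf[person_id] = person

# Upserting the address rewrites the entry but keeps the old key.
upsert(address_shelf, addr_id, dict(address, city="Shelbyville"))

# The key no longer matches the content it names: "value is identity" is broken.
identity_holds = content_id(address_shelf[addr_id]) == addr_id
```

`identity_holds` comes out false, and `person_id` still hashes to the same stored person while that person now resolves to a different city, which is the transitive invalidation of other value shelves.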

There are two possible solutions to this problem:

  1. To position shelving as a value store, not a record store. This means dropping upserts as a feature.
  2. To distinguish between "values" and "records". "Values" do not support upserts (or, in the future, deletion) and are identified by content hash. "Records" do not have repeatable IDs, can be upserted (and perhaps deleted), and cannot occur as targets of rels.

This changeset begins to implement the second approach, thus enabling relations on values and preserving the possibility of relations on records at the cost of some change to the basic APIs to differentiate between record shelves and value shelves explicitly.

Fixes #2 without loss of generality AFAIK

arrdem commented 6 years ago

So after some quality time thinking about this I came up with a strategy to make relations work in the context both of "values" and of "records".

I broke the notion of a schema apart into "value" specs and "record" specs. "value" specs always use content hashing to determine their ID, and are thus always deduplicated. "record" specs have place semantics, and only use random IDs. It's illegal to upsert a "value" spec, but it is legal to do so against a "record" spec.
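The two kinds of spec can be sketched as two shelf types. This is a Python illustration with hypothetical `ValueShelf` and `RecordShelf` classes and an assumed hashing scheme, not shelving's Clojure API. It shows content-hash IDs deduplicating values, random IDs keeping records distinct, and upserts rejected on values but allowed on records.

```python
import hashlib
import json
import uuid


def content_id(value):
    """Assumed scheme: SHA-256 of a canonical JSON encoding of the value."""
    data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class ValueShelf:
    """Hypothetical value shelf: ID = content hash, so identical writes deduplicate."""

    def __init__(self):
        self.entries = {}

    def put(self, value):
        vid = content_id(value)
        self.entries.setdefault(vid, value)  # same content -> same ID, stored once
        return vid

    def upsert(self, vid, value):
        # Overwriting would break "value is identity", so it is rejected outright.
        raise TypeError("upserts are illegal on value shelves")


class RecordShelf:
    """Hypothetical record shelf: random IDs name places whose contents may change."""

    def __init__(self):
        self.entries = {}

    def put(self, record):
        rid = str(uuid.uuid4())  # never derived from content, so never repeatable
        self.entries[rid] = record
        return rid

    def upsert(self, rid, record):
        self.entries[rid] = record  # insert or overwrite the place named by rid
```

Writing the same content twice to a `ValueShelf` yields one ID and one entry, while a `RecordShelf` hands out two distinct IDs for the same content.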

If you think about indices in the context of mutation, the traditional ORM model for nested mutation works only because ORMs maintain a notion of the place of an update, so that updates can be translated back to their database locations. There's really no good way to do that without introducing some massively stateful Associative wrapper type, which would be far from idiomatic. Consequently, the only reasonable thing to say about "record" types in the context of relations between specs is that there is no support for "on delete cascade", and upserts always cause non-value substructures to be re-inserted. It sucks, but I don't see an obvious, simple alternative. "Records" may relate to "values", for which there's no meaningful delete/retraction operation, and there's no real way to detect what changed (maybe clojure.data/diff?), so for simplicity's sake we just accept leaving unreachable old data around.
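A Python sketch of those upsert semantics, using hypothetical shelves rather than shelving's implementation: `people` is a record shelf, `accounts` holds a record (non-value) substructure, and `addresses` holds a value substructure. Upserting a person re-inserts the account under a fresh random ID even when it is unchanged, while the content-addressed address deduplicates. Nothing is deleted, so superseded entries simply become unreachable. No change detection (such as a diff) is attempted.

```python
import hashlib
import json
import uuid


def content_id(value):
    """Assumed scheme: SHA-256 of a canonical JSON encoding of the value."""
    data = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


accounts = {}   # hypothetical record shelf for a substructure: random ID -> account
addresses = {}  # hypothetical value shelf: content ID -> address
people = {}     # hypothetical record shelf: random ID -> person


def store_person(person, rid=None):
    """Insert (rid=None) or upsert a person, decomposing its substructures.

    Nothing is ever deleted: there is no "on delete cascade".
    """
    # Record substructure: always re-inserted under a fresh random ID, because
    # there is no tracked "place" through which to update it.
    acct_id = str(uuid.uuid4())
    accounts[acct_id] = person["account"]
    # Value substructure: content-addressed, so unchanged content dedupes.
    addr_id = content_id(person["address"])
    addresses.setdefault(addr_id, person["address"])
    if rid is None:
        rid = str(uuid.uuid4())
    people[rid] = dict(person, account=acct_id, address=addr_id)
    return rid


def unreachable(shelf, field):
    """IDs on shelf that no stored person refers to through field."""
    referenced = {p[field] for p in people.values()}
    return set(shelf) - referenced
```

Upserting a person with identical content leaves one orphaned account and no orphaned address. Changing the address as well orphans the old address value, which stays on its shelf.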

Rels are constrained to one of three shapes: record -> record, record -> value, or value -> value. That is, a "record" spec may have other records as substructures, but once you get into the category of "values", you're stuck there: "values" cannot relate to "records", which could be upserted at any time.
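The constraint reduces to a check over the kinds of the two specs. This Python sketch assumes a hypothetical `schema` mapping each spec name to a `Kind`. It is not shelving's validation code; value -> record is the only pair it rejects.

```python
from enum import Enum


class Kind(Enum):
    RECORD = "record"
    VALUE = "value"


# The three permitted rel shapes; value -> record is deliberately absent.
ALLOWED_RELS = {
    (Kind.RECORD, Kind.RECORD),
    (Kind.RECORD, Kind.VALUE),
    (Kind.VALUE, Kind.VALUE),
}


def check_rel(schema, from_spec, to_spec):
    """Raise ValueError if a rel from from_spec to to_spec is not a permitted shape."""
    pair = (schema[from_spec], schema[to_spec])
    if pair not in ALLOWED_RELS:
        raise ValueError(
            f"illegal rel {from_spec} -> {to_spec}: {pair[0].value} -> {pair[1].value}"
        )
    return pair
```

Rejecting value -> record follows from content hashing: a value's ID would cover a record ID whose contents can change under it, so the value's identity would no longer pin down what it means.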


This changeset includes a working implementation of relations in the trivial EDN storage layer, which I have some confidence in. I could use some feedback on the relation access API, and I'd like to put better testing of the relation system together. The current example test case isn't super great.

Also packaged in this patch is a nearly complete overhaul of my docstrings and the /docs/ tree, leveraging a quick little splatter program to render good-enough docs from manual ^:category annotations and the usual Clojure doc source information.

As-is, the multiple-backend capability should provide a sufficient layer of indirection for building incremental read/write functionality, so I'm inclined to merge this (or something not too far from it) and get the ball rolling on that, unless there are clear architectural flaws here.

You can peruse the entire changeset including updated docs here on its branch.

TODO

arrdem commented 6 years ago

After automating the generation of the /docs/ tree and workshopping the API names a bit, I think this is ready to go. Gonna stick with the multimethod API for now; the shared storage layer is a back-end-specific optimization I'll follow up on.

🚀 !