commontoolsinc / synopsys

datastore service for datums
2 stars 1 forks source link

Handling concurrent writes #38

Open Gozala opened 4 weeks ago

Gozala commented 4 weeks ago

Currently implementation includes value hashes in the key space so that:

  1. We could avoid conflicting updates.
  2. We could avoid having to read before we write.

Tradeoff is that a 32 byte overhead per record.

As I had being thinking more about G-Trees & Prolly Trees I have realized that sticking values hashes into keys is probably not a great solution and specifically I think we could do far better if we instead stored sorted sets of values instead because:

  1. It would save up space and be more similar to columnar encoding
  2. We end up with a set anyway so carrying overhead makes no big difference.
  3. During sync we can simple merge sets without much fuss.

Perhaps most interesting of all however is that it could offer a good way to capture causal order implicitly. Specifically when transacting fact like fact [entity, 'title', 'Buy Milk'] we could add cause reference to last fact that asserted value on entity with 'title' attribute. If there are two concurrent assertions we could establish causal order for all of them.

You can think of it as facts with shared entity, attribute-s forming a git like commit history. We could probably even take care of sorting sets in causal order on write so we don't have to deal with that at query time.

I think retractions could also be incorporated in a similar way. We could either prefix values in the set via byte to signal whether fact is asserted or retracted or alternatively we could add a byte into key space in which case we will end up with assertion and retraction sets making active facts concurrent.

It would be nice if we could prune retracted entries somehow but that is less obvious, perhaps we could incorporate a bloom filter somehow also which could describe values that have being retracted so when we merge sets values that are absent but match the filter could be considered pruned. (bloom filters won't be ideal due to false positive nature, but I'm sure there are things that would work here).

Gozala commented 4 weeks ago

I'm sure this will also be relevant here also https://www.youtube.com/watch?v=B943F4IpLWo