Closed stevehodgkiss closed 8 years ago
Looks good, but curious about why we need a separate table and a rather long SQL statement to insert an event. Could we not simply have a unique index across (aggregate_id, version)
, which would prevent duplicates? If an event reactor doesn't specify a version, you could select the current version before attempting to insert. Something like this (pseudo code, not tested):
query = ->(version) {
  events_table.
    returning(:id).
    insert(aggregate_id: event.aggregate_id,
           type: event.type.to_s,
           body: ::Sequel.pg_json(event.body),
           version: version)
}
if expected_version
  query.call(expected_version + 1)
else
  begin
    version = events_table.
      where(aggregate_id: event.aggregate_id).
      order(Sequel.desc(:id)).
      get(:version) # automatically adds `.limit(1)` and returns only the specified column
    query.call((version || 0) + 1) # no rows yet means this is the aggregate's first event
  rescue UniqueConstraintViolation
    retry
  end
end
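The insert-and-retry idea above can be sketched without a database. This is a minimal in-memory model (FakeEventsTable, append_event, and the Hash-backed "index" are illustrative assumptions, not code from this PR), where a Hash keyed by [aggregate_id, version] stands in for the unique index:

```ruby
# Illustrative sketch only: a Hash keyed by [aggregate_id, version]
# plays the role of the unique index on (aggregate_id, version).
UniqueConstraintViolation = Class.new(StandardError)

class FakeEventsTable
  def initialize
    @rows = {} # [aggregate_id, version] => event body
  end

  def insert(aggregate_id:, version:, body:)
    key = [aggregate_id, version]
    raise UniqueConstraintViolation if @rows.key?(key) # "unique index" violated
    @rows[key] = body
  end

  def current_version(aggregate_id)
    @rows.keys.select { |(id, _)| id == aggregate_id }.map(&:last).max || 0
  end
end

def append_event(table, aggregate_id, body, expected_version: nil)
  if expected_version
    table.insert(aggregate_id: aggregate_id, version: expected_version + 1, body: body)
  else
    begin
      # Re-read the current version on each attempt, then try the insert;
      # a concurrent writer winning the race triggers a retry.
      version = table.current_version(aggregate_id)
      table.insert(aggregate_id: aggregate_id, version: version + 1, body: body)
    rescue UniqueConstraintViolation
      retry
    end
  end
end
```

Note how this also illustrates the objection below: passing expected_version: 1 here fails only because version 2 already happens to exist, not because the caller's view of the stream was actually verified.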
Am I missing something obvious? :grin:
That approach doesn't actually ensure the aggregate is at the expected version; it only ensures that expected + 1 is unique. I prefer the solution in this PR because it guarantees that expected_version is what we expect before inserting (as the API implies).
While the Postgres function may be long, it's really just a few if branches and 2-3 SQL calls, all of which happen inside Postgres in a single call in order to minimise transaction time.
@stevehodgkiss I understand that reasoning, but I'm not sure I agree those semantics are worth the extra table. In particular, I'm concerned that moving from a single table as the source of truth for the system to two tables (one of which is mutable) may have unforeseen (though unintentional) side effects.
For instance, I can imagine that the exercise of synthesising a new event stream gets more complicated as not only do you have to copy events--you also have to manage the current version state for all aggregates. While this particular example is likely solvable, it makes me nervous that there could be other possible side effects that I can't think of.
If we can find a single-table solution, I think that would be much preferable, as it avoids issues to do with mutable state. I'm not at all fixed on my solution of a unique index--I'd be happy to see a similar Postgres function that used the single table. I think that should be possible? A multi-table solution without mutable state is probably okay too. It's the "mutable" part that scares me. And it scares me a lot.
The aggregates table is not a source of truth for the data it holds. It's used to protect against concurrent writes during a transaction (because postgres will lock rows that are written within a transaction until the transaction completes). It can be thrown away and re-built relatively easily. I can't think of a way to do that without having a mutable row that both concurrent connections must lock in order to write to the event stream...
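As a rough in-memory analogy for the locking behaviour described above (not the PR's implementation, and all names here are illustrative), a per-aggregate Mutex plays the role of the Postgres row lock on the aggregates table, serialising concurrent writers to the same event stream:

```ruby
# Illustrative analogy: each aggregate's row in the aggregates table acts
# like a lock that concurrent writers must acquire; a per-aggregate Mutex
# models that here.
class AggregateLocks
  def initialize
    @locks = {}
    @guard = Mutex.new # protects the hash of per-aggregate locks
  end

  def with_lock(aggregate_id)
    lock = @guard.synchronize { @locks[aggregate_id] ||= Mutex.new }
    lock.synchronize { yield } # only one writer per aggregate at a time
  end
end
```

In Postgres the same effect comes for free: an UPDATE to the aggregate's row inside a transaction blocks other transactions updating that row until commit.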
@taoza I've addressed your comments...
LGTM :+1: I'd be interested in running a performance test against this to see how many events per second we could throw at an aggregate.
I've been thinking a little more about this and about how we'll migrate Identity to it.
To ease the migration path into Identity I think it's probably worth keeping the old EventStore implementation around alongside this new one.
To migrate Identity we'll keep the old event store running and fire up an instance of the new versioning one. Once the new one catches up, we feature flip/downtime deploy our way across to using the new implementation solely.
We can remove the old deprecated implementation once Identity is migrated.
Thoughts?
Added a benchmark script. The new version with optimistic concurrency is actually faster than the current one without optimistic concurrency...
❯ ruby -Ilib script/bench_writing_events.rb
Warming up --------------------------------------
without_optimistic_concurrency
158.000 i/100ms
with_optimistic_concurrency
183.000 i/100ms
Calculating -------------------------------------
without_optimistic_concurrency
1.531k (± 9.9%) i/s - 7.584k in 5.002478s
with_optimistic_concurrency
1.959k (±10.6%) i/s - 9.699k in 5.005551s
This is ready for a final 👀 over before I merge. Things I'll work on in subsequent PRs:

Would we prefer renaming id to sequence_id and adding a uuid field, or just adding uuid and leaving id as the sequence?

cc @grassdog @vonconrad
My vote is for sequence for the sequence and id for the UUID.
Taking a look through the PR now.
I'm with @vonconrad (we need to get shirts with that made up). I prefer sequence for the sequence and id for the UUID.
sequence or sequence_id? It's still a unique identifier, I'm leaning more towards sequence_id...
sequence_id works.
Either's fine by me.
Seems reasonable! 👍
Implements optimistic concurrency by supporting an expected_version when saving events. If you don't care about versions it can be omitted (EventReactors etc). If you do care about concurrent changes to an aggregate, specify the version you were at when you emit the event and it will fail if there have been any more recent changes to that aggregate. Integrates with the AggregateRoot class to do this.

Previous PR: https://github.com/envato/event_sourcery/pull/10