Closed stevehodgkiss closed 8 years ago
Looks good, but curious about why we need a separate table and a rather long SQL statement to insert an event. Could we not simply have a unique index across (aggregate_id, version)
, which would prevent duplicates? If an event reactor doesn't specify a version, you could select the current version before attempting to insert. Something like this (pseudo code, not tested):
query = ->(version) {
  events_table.
    returning(:id).
    insert(aggregate_id: event.aggregate_id,
           type: event.type.to_s,
           body: ::Sequel.pg_json(event.body),
           version: version)
}
if expected_version
  query.call(expected_version + 1)
else
  begin
    version = events_table.
      where(aggregate_id: event.aggregate_id).
      order(Sequel.desc(:id)).
      get(:version) # automatically adds `.limit(1)` and returns only the specified column
    query.call((version || 0) + 1) # no rows yet means this is the aggregate's first event
  rescue UniqueConstraintViolation
    retry
  end
end
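The insert-and-retry idea above can be sketched without a database. This is a minimal in-memory model (FakeEventsTable, append_event, and the Hash-backed "index" are illustrative assumptions, not code from this PR), where a Hash keyed by [aggregate_id, version] stands in for the unique index:

```ruby
# Illustrative sketch only: a Hash keyed by [aggregate_id, version]
# plays the role of the unique index on (aggregate_id, version).
UniqueConstraintViolation = Class.new(StandardError)

class FakeEventsTable
  def initialize
    @rows = {} # [aggregate_id, version] => event body
  end

  def insert(aggregate_id:, version:, body:)
    key = [aggregate_id, version]
    raise UniqueConstraintViolation if @rows.key?(key) # "unique index" violated
    @rows[key] = body
  end

  def current_version(aggregate_id)
    @rows.keys.select { |(id, _)| id == aggregate_id }.map(&:last).max || 0
  end
end

def append_event(table, aggregate_id, body, expected_version: nil)
  if expected_version
    table.insert(aggregate_id: aggregate_id, version: expected_version + 1, body: body)
  else
    begin
      # Re-read the current version on each attempt, then try the insert;
      # a concurrent writer winning the race triggers a retry.
      version = table.current_version(aggregate_id)
      table.insert(aggregate_id: aggregate_id, version: version + 1, body: body)
    rescue UniqueConstraintViolation
      retry
    end
  end
end
```

Note how this also illustrates the objection below: passing expected_version: 1 here fails only because version 2 already happens to exist, not because the caller's view of the stream was actually verified.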
Am I missing something obvious? :grin:
That approach doesn't actually ensure the aggregate is at the expected version; it only ensures that expected + 1 is unique. I prefer the solution in this PR because it guarantees that expected_version is what we expect before inserting (as the API implies).
While the Postgres function may be long, it's really just a few if branches and 2-3 SQL calls, all of which happen inside Postgres in a single call in order to minimise transaction time.
@stevehodgkiss I understand that reasoning, but I'm not sure I agree those semantics are worth the extra table. In particular, I'm concerned that moving from a single table as the source of truth for the system to two tables (one of which is mutable) may have unforeseen (though unintentional) side effects.
For instance, I can imagine that the exercise of synthesising a new event stream gets more complicated as not only do you have to copy events--you also have to manage the current version state for all aggregates. While this particular example is likely solvable, it makes me nervous that there could be other possible side effects that I can't think of.
If we can find a single-table solution, I think that would be much preferable, as it avoids issues to do with mutable state. I'm not at all fixed on my solution of a unique index--I'd be happy to see a similar Postgres function that used the single table. I think that should be possible? A multi-table solution without mutable state is probably okay too. It's the "mutable" part that scares me. And it scares me a lot.
The aggregates table is not a source of truth for the data it holds. It's used to protect against concurrent writes during a transaction (because postgres will lock rows that are written within a transaction until the transaction completes). It can be thrown away and re-built relatively easily. I can't think of a way to do that without having a mutable row that both concurrent connections must lock in order to write to the event stream...
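As a rough in-memory analogy for the locking behaviour described above (not the PR's implementation, and all names here are illustrative), a per-aggregate Mutex plays the role of the Postgres row lock on the aggregates table, serialising concurrent writers to the same event stream:

```ruby
# Illustrative analogy: each aggregate's row in the aggregates table acts
# like a lock that concurrent writers must acquire; a per-aggregate Mutex
# models that here.
class AggregateLocks
  def initialize
    @locks = {}
    @guard = Mutex.new # protects the hash of per-aggregate locks
  end

  def with_lock(aggregate_id)
    lock = @guard.synchronize { @locks[aggregate_id] ||= Mutex.new }
    lock.synchronize { yield } # only one writer per aggregate at a time
  end
end
```

In Postgres the same effect comes for free: an UPDATE to the aggregate's row inside a transaction blocks other transactions updating that row until commit.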
@taoza I've addressed your comments...
LGTM :+1: I'd be interested in running a performance test against this to see how many events per second we could throw at an aggregate.
I've been thinking a little more about this and about how we'll migrate Identity to it.
To ease the migration path into Identity I think it's probably worth keeping the old EventStore implementation around alongside this new one.
To migrate Identity we'll keep the old event store running and fire up an instance of the new versioning one. Once the new one catches up, we feature flip/downtime deploy our way across to using the new implementation solely.
We can remove the old deprecated implementation once Identity is migrated.
Thoughts?
Added a benchmark script. The new version with optimistic concurrency is actually faster than the current one without optimistic concurrency...
❯ ruby -Ilib script/bench_writing_events.rb
Warming up --------------------------------------
without_optimistic_concurrency
158.000 i/100ms
with_optimistic_concurrency
183.000 i/100ms
Calculating -------------------------------------
without_optimistic_concurrency
1.531k (± 9.9%) i/s - 7.584k in 5.002478s
with_optimistic_concurrency
1.959k (±10.6%) i/s - 9.699k in 5.005551s
This is ready for a final 👀 over before I merge. Things I'll work on in subsequent PRs:

Would we prefer renaming id to sequence_id and adding a uuid field, or just adding uuid and leaving id as the sequence?

cc @grassdog @vonconrad
My vote is for sequence for the sequence and id for the UUID.
Taking a look through the PR now.
I'm with @vonconrad (we need to get shirts with that made up). I prefer sequence for the sequence and id for the UUID.
sequence or sequence_id? It's still a unique identifier, I'm leaning more towards sequence_id...
sequence_id works.
Either's fine by me.
Seems reasonable! 👍
Implements optimistic concurrency by supporting an expected_version when saving events. If you don't care about versions it can be omitted (EventReactors etc). If you do care about concurrent changes to an aggregate, specify the version you were at when you emit the event and it will fail if there have been any more recent changes to that aggregate. Integrates with the AggregateRoot class to do this.

Previous PR: https://github.com/envato/event_sourcery/pull/10