Allow to compare Event instances

andrewgr commented 7 years ago

This code is extracted from Hosted service repo. Originally I wrote it to be able to compare the emitted events with the expected ones in tests with the "standard" Rspec matchers:

it 'emits LicenseGranted' do
  expect(emitted_event).to eq(LicenseGranted.new(body: {
    'subscription_id' => 1,
    'theme_id'        => 2,
    'sso_id'          => 'ee287b23-2254-490a-a9f4-72fb35870ce1',
    'granted_at'      => '2016-12-12 12:12:12 UTC'
  }))
end

stevehodgkiss commented 7 years ago

Equality should include all the attributes (id, aggregate_id, created_at are missing). Virtus provides this method and compares all these keys anyways though.

vonconrad commented 7 years ago

I'm not sold on this being the right way to go about the described problem. If this is solely a testing concern, I would rather see it solved there. In identity, there's a custom RSpec matcher which I'd like to see make its way into EventSourcery (though modified to include aggregate_id). It solves the same underlying problem (testing emitted events), but does not modify the Event class at all.

andrewgr commented 7 years ago

Thanks for your comments.

_Re comparing events w/ or w/o aggregate_id, id, created_at:_ I view events as a piece of information about a fact that has happened in the physical wold. That is why the name/type of the event and its attributes/body are enough for me to uniquely identify an event. aggregate_id, id, created_atm etc. are implementation details, something that only the system cares about. The physical world outside the system doesn't know about those attributes, so I exclude them from comparison. To me two events like:

LicenseGranted.new(body: {
  'subscription_id' => 1,
  'theme_id'        => 2,
  'sso_id'          => 'ee287b23-2254-490a-a9f4-72fb35870ce1',
  'granted_at'      => '2016-12-12 12:12:12 UTC'
})

will be different only if their bodies are different (the type is defined by the class name). I'd view three instances of this event class with the same payload simply as duplicates. In other words, it doesn't matter what would be created_at or aggregate_id or id for any events of LicenseGranted class with that payload, because they all will tell me that "at 2016-12-12 12:12:12 UTC a license was granted to a subscription with id 1 for theme with id 2 for a user with SSO id ee287b23-2254-490a-a9f4-72fb35870ce1".

Re custom rspec matcher: Hosted Service used to have a copy of that matcher too. I wanted to use as much of standard rspec functionality as possible.

warrenseen commented 7 years ago

In other words, it doesn't matter what would be _createdat or _aggregateid or id for any events of LicenseGranted class with that payload

I think you're fundamentally misunderstanding and/or misusing aggregate_id if you think that's an "implementation detail."

Given how this would have impacts on a number of other systems, I don't think this change is appropriate to make within Event Sourcery. 👎

stevehodgkiss commented 7 years ago

I agree with @warrenseen and @vonconrad. 👎 to these changes in event sourcery.

andrewgr commented 7 years ago

@warrenseen aggregate_id in event_sourcery is a detail of our implementation. For example, the implementation that Simon Harris presented used correlation ids for events which were not equivalents of aggregate ids from event_sourcery. I asked him after his presentation wether they are equivalents or not (and I think his approach to correlating events is better than ours). If something is part of one implementation, but is not a part of another implementation then it is an implementation detail. In addition to this, the aggregate_ids in Hosted service in many cases are derived from the event payloads, that makes them redundant. There are use cases when it is convenient. So in theory, Hosted service may work without aggregate_id attribute. It would make interpreting events more flexible but would require more code to build aggregates, so it might not be very practical, at least not in event_sourcery-based app. I can't see what I fundamentally misunderstand exactly. Happy to discuss that further at the next brownbag session.

I'm happy to add aggregate_id to the comparison to ensure that the event is generated for the expected aggregate since aggregate_id is essential in our implementation. However, i still think that id and created_at should not be part of the comparison. What do you think?

If you don't think that should be a part of event_sourcery - I'm fine with closing this PR too.

vonconrad commented 7 years ago

@andrewgr I'm wondering whether we have different interpretations of just what an Aggregate and its aggregate_id actually is. It's a bit unfortunate that you missed the last brownbag, because we spent 30 mins of it talking about this exact topic. I have an action from that meeting to prepare more material about it, though, so there will be more opportunities to discuss.

It may be true that the Hosted service can uniquely identify how an event fits into the bigger picture without the aggregate_id, but that is far from the case for many (if not most) event sourced applications. It's not merely "convenient," but a vital component of the way that we do event sourcing. Consider the following (example) events:

# Order aggregate
OrderPlaced.new(body: {
  product_id: '4eec6a6e-2f58-49b4-b8a2-14febbd86c47',
  customer_id: '31e934ae-6818-428c-a34a-5a043896cce1',
  placed_at: '2017-01-30 02:54:41 UTC',
})

OrderInvoiceGenerated.new(body: {
  s3_url: '//s3.amazonaws.com/documents/390916a8-93da-4f33-9475-75522503860f.pdf',
  generated_at: '2017-01-30 03:00:15 UTC',
})

OrderShipped.new(body: {
  packed_by_worker_id: '592df301-1d0d-4080-869e-e3c85d179163',
  from_warehouse: 'Melbourne',
  shipped_at: '2017-01-30 03:00:56 UTC',
})

# Person/Customer aggregate
NameChange.new(body: {
  first_name: 'Fred',
  last_name: 'Basset',
  changed_at: '2017-01-30 02:55:44 UTC',
})

EmailChange.new(body: {
  email: 'joe@bloggs.com',
  changed_at: '2017-01-30 02:55:55 UTC',
})

# Payment aggregate
PaymentAttempted.new(body: {
  gateway: 'PayPal'
  amount: 100.0,
  attempted_at: '2017-01-30 03:02:18 UTC'
})

PaymentFailed.new(body: {
  gateway_message: 'Recipient does not exist',
  failed_at: '2017-01-30 03:03:00 UTC'
})

In any of these cases, the information in the body is not sufficient to understand the context of the change. We need to know which order was placed, shipped, and had its invoice generated. We need to know which person had their name or email changed. We need to know which payment was attempted and failed. This is what the aggregate_id does. The aggregate is the entity which the event happens to.

That said, I'm sure that there are many other ways to do event sourcing that doesn't rely on aggregates. But that's not how Event Sourcery works. I'm not against a equality check for events, but I do believe that aggregate_id is a non-negotiable component of any such check. I'm more flexible on id and created_at, but given that == in the Ruby world typically means object-to-object comparison, I think it's reasonable to include those attributes.

I don't want to repeat myself over and over, but I'll state again that I think if you want to compare the emitted event of a command or a reactor for testing purposes, an RSpec matcher is the way to go.

stevehodgkiss commented 7 years ago

Correlation ID's are a different thing to aggregate IDs. They're used to track the events that result from issuing a command or request and the flow on effect of event processing in reactors. It's something we're going to add to event sourcery actually because it would be quite useful to track down or debug issues. The current proposal is to add correlation_id and causation_id to events. For a given request, correlation ID would be the same on all the events emitted due to the command & all the events that result from reactors processing the events emitted by that command. Causation ID is the thing that caused that event to be emitted (which is what we currently store in the body as _driven_by_event_id for reactors). In Identity for example, given a correlation ID of someone submitting a tax form, you'd be able to query the events table and see that a reactor emitted a billing details overridden event.

An aggregate is a core concept in event sourcing. Users of the system being built hold references to aggregates by storing their ID. It's a consistency boundary (see http://geekswithblogs.net/Optikal/archive/2013/04/07/152643.aspx for more on that).

I'm pretty sure Simon's implementation has aggregate IDs (although it may have been called something different like event stream ID). @harukizaemon ?

I don't want to repeat myself over and over, but I'll state again that I think if you want to compare the emitted event of a command or a reactor for testing purposes, an RSpec matcher is the way to go.

+1

I'm not against a equality check for events, but I do believe that aggregate_id is a non-negotiable component of any such check. I'm more flexible on id and created_at, but given that == in the Ruby world typically means object-to-object comparison, I think it's reasonable to include those attributes.

Agreed, but I feel more strongly about ID & created_at being also included. You could have the same event happen again to a user and the ID/created at might be the only difference between those events. Also we already have equality methods provided by Virtus/Equalizer that compares all the attributes we've defined.

andrewgr commented 7 years ago

Thanks for the examples @vonconrad! I totally agree that aggregate_id is vital in the "event_sourcery way" and is not just a convenience. As for aggregate ids as "functions" of event body, sure, for an arbitrary event it would be impossible unless its body had been specially designed for that purpose. Fortunately, the data that Hosted Service works with allows that and it is very convenient.

@stevehodgkiss "Causation ID as a table column" vs. "_driven_by_event_id as a part of event body" is one of the things that I liked better in @harukizaemon 's implementation than in event_sourcery (if I remember correctly he had causation_id column). IMO a table column is a better choice for that. Also, generally speaking Correlation ID may be the same thing as aggregate id (like in event_sourcery) but that's not required and may depend on implementation. I'm happy to show you an example using Hosted Service. We can discuss that at the brown bag session, other developers may also find this useful. I'll be at the next session. Last Monday a new developer started in our team and we had a team lunch at 12 :)

As for this PR, I'll keep it open till tomorrow just in case someone else would like to comment. Since you feel strongly that id and created_at should be included into comparison and virtus does that already then this code is not needed.

stevehodgkiss commented 7 years ago

Fortunately, the data that Hosted Service works with allows that and it is very convenient.

Why duplicate the aggregate ID in the event body? Why is it convenient to look in the body for that vs just accessing the event.aggregate_id? The reason aggregate_id is it's own thing, in terms of event sourcing in general I think (regardless of implementation) is so that you can load an aggregate up into memory, replay events and build up "current state", then issue a command and reject/accept it (and emit an event if accepted). It's a consistency boundary in terms of validating business rules and transactionality. I don't understand why you'd want to (or need to) duplicate that data in the event body...

"Causation ID as a table column" vs. "_driven_by_event_id as a part of event body" is one of the things that I liked better in @harukizaemon 's implementation than in event_sourcery (if I remember correctly he had causation_id column). IMO a table column is a better choice for that.

I agree 100%, we have plans to move to an approach like that with EventSourcery (storing correlation_id and causation_id as columns on the events table). Causation ID is the thing that this event was in reaction to, and correlation ID is the overarching command/request that started this chain of events. They're both valuable for tracing things that happen in the system.

Also, generally speaking Correlation ID may be the same thing as aggregate id (like in event_sourcery) but that's not required and may depend on implementation.

Correlation ID is not an aggregate ID. Correlation ID represents an interaction with the system (an http request for example). An aggregate ID represents a thing within the system, for example a user/person or an item for sale etc. A correlation ID would be the unique ID assigned to the http request that submits that item for review (for example).

I'm happy to show you an example using Hosted Service

Sounds good! Interested to learn more about how the Hosted service works in respect to what we're discussing :)

envato / event_sourcery

Allow to compare Event instances #91