envato / event_sourcery

A library for building event sourced applications in Ruby
MIT License

Support: How to handle and avoid duplicates #218

Closed: berkes closed this issue 4 years ago

berkes commented 4 years ago

I hope it is OK to ask questions about usage and general patterns here.

I am processing GeoJSON places as events, and I need to avoid creating "duplicates" based on their attributes. The exact rules are domain-specific, so a simplified example is given below.

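```ruby
require 'event_sourcery'

PlaceAdded = Class.new(EventSourcery::Event)

class Place
  include EventSourcery::AggregateRoot

  apply(PlaceAdded) { |_event| } # nothing to track in this simplified example

  def add(payload)
    # place_id_builder is domain-specific and not shown here
    body = payload.merge(place_id: place_id_builder.id)
    apply_event(PlaceAdded, aggregate_id: id, body: body)
  end
end
```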

Here, place_id_builder.id generates a String (not a UUID!) for a place, based on heuristics such as its name, the kind of place, and its location (lat/lon). A comparable case would be a UserRegistered event that must check whether the email address is already in use.

I don't see any API to find all PlaceAdded events where body.place_id = place_id. I'm not certain that searching previous events for duplicate parameters is the right pattern at all.

Currently, I've implemented this in the projector, which simply skips inserting a record into the query database when it finds a duplicate.

The "places" projector now does:

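```ruby
# Inside a projector class that includes EventSourcery::Postgres::Projector
project PlaceAdded do |event|
  # Skip the insert when a row for this place_id already exists.
  if table.where(place_id: event.body['place_id']).empty?
    table.insert(
      id: event.aggregate_id,
      place_id: event.body['place_id'],
      # .. more attrs
    )
  end
end
```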

Note that this could probably be implemented more cleanly by rescuing Postgres unique-constraint violations.
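A minimal sketch of the "rescue the unique-constraint error" variant, written in plain Ruby so it runs without a database. PlacesTable and UniqueViolation are hypothetical stand-ins for a projection table with a unique index on place_id and the error that index raises; with Sequel (which event_sourcery-postgres uses) the real error class is Sequel::UniqueConstraintViolation.

```ruby
# Hypothetical stand-in for the error a unique index raises on a duplicate key
# (with Sequel on Postgres this would be Sequel::UniqueConstraintViolation).
class UniqueViolation < StandardError; end

# Hypothetical in-memory table with a unique index on :place_id.
class PlacesTable
  attr_reader :rows

  def initialize
    @rows = []
    @index = {}
  end

  def insert(row)
    key = row.fetch(:place_id)
    raise UniqueViolation, "duplicate place_id #{key}" if @index.key?(key)

    @index[key] = true
    @rows << row
  end
end

# Projection step: insert unconditionally and let the unique index reject
# duplicates, instead of running a separate "does it exist?" query first.
def project_place_added(table, aggregate_id:, place_id:)
  table.insert(id: aggregate_id, place_id: place_id)
  :inserted
rescue UniqueViolation
  :duplicate_ignored
end

table = PlacesTable.new
project_place_added(table, aggregate_id: 'a-1', place_id: 'MXQ4+M5;statueofliberty') # => :inserted
project_place_added(table, aggregate_id: 'a-2', place_id: 'MXQ4+M5;statueofliberty') # => :duplicate_ignored
table.rows.size # => 1
```

One caveat on Postgres: a failed statement aborts the surrounding transaction, so if the projector processes each event inside a transaction, the insert would need its own savepoint (Sequel's transaction(savepoint: true)). Alternatively, the database can skip duplicates itself with INSERT ... ON CONFLICT DO NOTHING, which Sequel exposes as Dataset#insert_conflict.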

Also, this seems clumsy if, in case of duplication, you want to emit another event, e.g. a DuplicatePlaceIgnored.

This works, but because projection is asynchronous, I cannot send the client an HTTP (or any other) error. In my specific case that's fine, but where an email address must be unique, one would probably want to give the user a proper error or validation message.

How is this typically achieved? And how can event_sourcery help here? Should I keep a separate projection where the Aggregate can check for duplicates? Should aggregates know about projections at all? Should I search through past events instead? And if so, how would I do that with event_sourcery, where event bodies are unindexed JSON blobs?


Stack Overflow also has an interesting answer on a similar case, in which the command is the one doing the checks:

For example, the command that creates the short URL will validate that the read store does not contain such a short URL already and we will only commit our event, if we can commit the changes to our read store first.

https://stackoverflow.com/a/43613564/73673
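The quoted approach can be sketched as a command handler that writes to the read store first and only commits the event when that write succeeds. Everything here is hypothetical plain Ruby (ShortUrlReservations, CommandResult, the ShortUrlCreated event, and the array used as an event store), not event_sourcery API; the mutex stands in for the atomicity a unique index would provide.

```ruby
require 'securerandom'

# Hypothetical read-store table with a uniqueness guarantee on the slug.
class ShortUrlReservations
  def initialize
    @taken = {}
    @lock = Mutex.new
  end

  # Atomically reserves the slug; returns true if it was free, false otherwise.
  def reserve(slug)
    @lock.synchronize do
      next false if @taken.key?(slug)

      @taken[slug] = true
    end
  end
end

CommandResult = Struct.new(:ok, :error, :event)

# Command handler: write to the read store first and only commit the event
# if that write succeeded, so a duplicate can be reported synchronously.
def create_short_url(reservations, event_store, slug:, target:)
  unless reservations.reserve(slug)
    return CommandResult.new(false, "short URL #{slug.inspect} is already taken", nil)
  end

  event = { type: 'ShortUrlCreated', aggregate_id: SecureRandom.uuid,
            body: { slug: slug, target: target } }
  event_store << event
  CommandResult.new(true, nil, event)
end

reservations = ShortUrlReservations.new
events = []
create_short_url(reservations, events, slug: 'abc', target: 'https://example.com').ok # => true
create_short_url(reservations, events, slug: 'abc', target: 'https://example.org').ok # => false
```

The duplicate is reported synchronously, which the asynchronous projector cannot do. The cost is that two stores are written without a shared transaction: if appending the event fails after the reservation succeeded, the reservation has to be released or cleaned up, otherwise the slug stays blocked.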

twe4ked commented 4 years ago

Should I keep a separate projection where the Aggregate can check for duplicates?

The way you currently have this modelled, this sounds like the way to go; it is often referred to as a "command side projection". You will, however, have a race condition: if the command side projector isn't fully up to date when a command comes in, you can get duplicates.
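To make the race concrete, here is a plain-Ruby sketch, assuming a hypothetical KnownPlaceIds command-side projection that is only updated when the projector processes events, and an array standing in for the event store:

```ruby
# Hypothetical command-side projection, updated only when the projector
# processes events (i.e. asynchronously, and possibly late).
class KnownPlaceIds
  def initialize
    @ids = {}
  end

  def include?(place_id)
    @ids.key?(place_id)
  end

  # Called by the projector for each PlaceAdded event.
  def project(event)
    @ids[event[:place_id]] = true
  end
end

# Command handler that checks the projection before accepting the command.
def add_place(known, event_store, place_id)
  return :rejected_duplicate if known.include?(place_id)

  event_store << { type: 'PlaceAdded', place_id: place_id }
  :accepted
end

known = KnownPlaceIds.new
store = []

# Two commands for the same place arrive before the projector has caught up...
first  = add_place(known, store, 'MXQ4+M5;statueofliberty')
second = add_place(known, store, 'MXQ4+M5;statueofliberty')

# ...so both pass the check; the projector only sees them afterwards.
store.each { |event| known.project(event) }

[first, second] # => [:accepted, :accepted]
store.size      # => 2, a duplicate slipped through
add_place(known, store, 'MXQ4+M5;statueofliberty') # => :rejected_duplicate
```

Both commands pass the check because the projection lags behind the event store; a later command is only rejected once the projector has caught up.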

This can be handled in a few ways. One is what you've described with your projector: the projector can handle removing duplicates. Another option is to have a reactor keep an eye out for duplicates and emit compensating events, PlaceDuplicateDetected for instance. Other projectors can then handle that event as they see fit, removing the duplicate place for instance.
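A minimal sketch of such a reactor in plain Ruby: DuplicatePlaceReactor and the emit callable are hypothetical stand-ins for an event_sourcery reactor and its event-emitting machinery, and events are plain hashes. It remembers the first aggregate_id seen for each place_id and emits a PlaceDuplicateDetected event pointing back to it for any later PlaceAdded with the same place_id.

```ruby
# Hypothetical reactor: consumes PlaceAdded events in order and emits a
# compensating PlaceDuplicateDetected event when a place_id repeats.
class DuplicatePlaceReactor
  def initialize(emit)
    @emit = emit      # callable that appends a new event to the store
    @first_seen = {}  # place_id => aggregate_id of the first PlaceAdded
  end

  def process(event)
    return unless event[:type] == 'PlaceAdded'

    place_id = event[:body][:place_id]
    original = @first_seen[place_id]
    if original
      duplicate = {
        type: 'PlaceDuplicateDetected',
        aggregate_id: event[:aggregate_id],
        body: { place_id: place_id, duplicate_of: original }
      }
      @emit.call(duplicate)
    else
      @first_seen[place_id] = event[:aggregate_id]
    end
  end
end

emitted = []
reactor = DuplicatePlaceReactor.new(->(event) { emitted << event })
[
  { type: 'PlaceAdded', aggregate_id: 'a-1', body: { place_id: 'p-1' } },
  { type: 'PlaceAdded', aggregate_id: 'a-2', body: { place_id: 'p-1' } },
  { type: 'PlaceAdded', aggregate_id: 'a-3', body: { place_id: 'p-2' } }
].each { |event| reactor.process(event) }
```

The in-memory @first_seen hash is only for illustration; a real reactor would need to persist that lookup (for example in its own table) so it survives restarts and replays. Because events are processed in order, the earliest PlaceAdded wins, and projectors can drop or merge the duplicate when they see PlaceDuplicateDetected.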

Taking a step back for a second, how does Place work? You haven't mentioned aggregate_id at all. What does one Place represent? Is a single Place meant to contain multiple PlaceAdded events? If a Place is only meant to have a single PlaceAdded event, then you might be able to use the aggregate_id to handle this.
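If one Place aggregate stands for exactly one real-world place, the aggregate itself can refuse a second PlaceAdded. The sketch below is framework-free, hypothetical Ruby; with EventSourcery::AggregateRoot the @added flag would instead be set in an apply PlaceAdded handler and checked in the command method.

```ruby
# Hypothetical, framework-free aggregate: state is rebuilt from its own
# events, and the command refuses a second PlaceAdded for the same id.
class Place
  class AlreadyAdded < StandardError; end

  attr_reader :id, :changes

  def initialize(id, history = [])
    @id = id
    @added = false
    @changes = []
    history.each { |event| apply(event) }
  end

  def add(payload)
    raise AlreadyAdded, "Place #{id} already exists" if @added

    event = { type: 'PlaceAdded', aggregate_id: id, body: payload }
    apply(event)
    @changes << event
  end

  private

  # Mutates state from an event, both for new and for replayed events.
  def apply(event)
    @added = true if event[:type] == 'PlaceAdded'
  end
end
```

The guard only helps if every command for the same place targets the same aggregate_id, and it assumes the event store rejects concurrent appends to one aggregate (optimistic concurrency on the aggregate version).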

If place_id_builder.id could be generated by the client before the command is sent, you could have the aggregate ensure that it only ever has a single PlaceAdded event. This removes the need for the command side projection entirely, and therefore removes the race condition. You could potentially use a UUID v5 to turn place_id_builder.id into a valid UUID.
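A UUIDv5 is a name-based UUID (RFC 4122): the SHA-1 hash of a namespace UUID plus a name, with the version and variant bits set, so the same name always yields the same UUID. The sketch below uses only Ruby's standard library; ActiveSupport's Digest::UUID.uuid_v5 serves the same purpose. PLACES_NAMESPACE is a hypothetical, fixed namespace chosen for this example.

```ruby
require 'digest'

# RFC 4122 name-based UUID, version 5 (SHA-1), standard library only.
# namespace: a UUID string; name: any string, e.g. a place key.
def uuid_v5(namespace, name)
  ns_bytes = [namespace.delete('-')].pack('H*')
  bytes = Digest::SHA1.digest(ns_bytes + name.encode('UTF-8').b).bytes.first(16)
  bytes[6] = (bytes[6] & 0x0F) | 0x50 # version 5
  bytes[8] = (bytes[8] & 0x3F) | 0x80 # RFC 4122 variant (10xx)
  hex = bytes.pack('C*').unpack1('H*')
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join('-')
end

# Hypothetical application namespace: generate once, then keep it fixed.
PLACES_NAMESPACE = '0c4e9a1e-5f2b-4c7d-9a3e-2b8f6d1c7a90'

aggregate_id = uuid_v5(PLACES_NAMESPACE, 'MXQ4+M5;historic:monument;statueofliberty')
```

Any client that builds the same place string gets the same aggregate_id, so two PlaceAdded commands for one place land on the same aggregate, which can then reject the second.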

berkes commented 4 years ago

Thanks for your reply!

I didn't know about UUIDv5, and it looks like the perfect solution here. I could not use a UUIDv4 (SecureRandom.uuid), since that is random and unrelated to the set of attributes that make a Place unique^1. But encoding the string into a UUIDv5 (Digest::UUID.uuid_v5) seems to work.

To summarize your feedback for future readers: "validate uniqueness" has several solutions, depending on your (domain) needs.

  1. Handle it in the command. The command looks in the projection to determine whether this is a "duplicate". This will occasionally let duplicates through, because the projection it reads is updated later (race condition).
  2. Handle it in the processor: the projector or reactor.
     2.1. A reactor determines whether it is a duplicate and then issues either a PlaceCreated or a PlaceDuplicateDetected event.
     2.2. The projector determines at projection time whether it is a duplicate and then either issues a PlaceDuplicateDetected event or proceeds to insert the data into a projection. (This is what I have ATM.)
  3. Leave it entirely to the client. The client determines the uniqueness and builds an aggregate_id from that data. This is the uuid_v5 method described above.

Each has its use cases, pros and cons, and which one fits best really depends on the case at hand. E.g. registering a new user, where the user must be told whether the email address they provided already exists, would probably be handled best with case 3. But a newsletter subscription, where a duplicate email address matters far less, would probably be best suited to case 2, in which the reactors or processors handle the duplicates asynchronously.

^1: A place's uniqueness is a combination of its normalised name, its area (lat/lon + radius) and its category. E.g. MXQ4+M5;historic:monument;statueofliberty != MXQ4+M5;shop:museum;statueofliberty: a complex "domain" problem, really. :)
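A hypothetical place_id_builder producing keys in the footnote's format might look like this. The normalisation rules (strip accents, lowercase, keep only ASCII letters and digits) are assumptions, and the area part (MXQ4+M5 in the example) is taken as an input rather than computed from lat/lon and radius.

```ruby
# Hypothetical normalisation: decompose accents (e.g. "é" -> "e" + combining
# mark), drop the marks, lowercase, then keep only ASCII letters and digits.
def normalise_name(name)
  decomposed = name.unicode_normalize(:nfkd).gsub(/\p{Mn}/, '')
  decomposed.downcase.gsub(/[^a-z0-9]/, '')
end

# Hypothetical key builder: area code, category and normalised name,
# joined with ';' as in the example keys above.
def place_key(area_code:, category:, name:)
  [area_code, category, normalise_name(name)].join(';')
end

place_key(area_code: 'MXQ4+M5', category: 'historic:monument', name: 'Statue of Liberty')
# => "MXQ4+M5;historic:monument;statueofliberty"
```

The resulting key can then be fed to a UUIDv5 function to obtain a deterministic aggregate_id.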