eventflow / EventFlow

Async/await first CQRS+ES and DDD framework for .NET
https://geteventflow.net

Read Models - enriching event? #554

Closed accoleon closed 6 years ago

accoleon commented 6 years ago

Scenario: We have a Candidate aggregate with the following properties: CandidateId (Identity), FirstName (string), LastName (string), SSN (string). A Candidate is set up in the system before an Interview, by a separate user. It emits a CandidateRegistered event that contains the above properties. There is also a CandidateReadModel that listens to that event, so I can get a list of all Candidates. The aggregate only ever emits one event; we never edit or remove Candidates.

We have another aggregate, Interview, that models the act of interviewing a Candidate. Properties: InterviewId (Identity), InterviewDate (DateTime), CandidateId (CandidateId), InterviewResult (string; assume Hire/No Hire). It emits InterviewScheduled and InterviewConcluded events.

I also have an InterviewReadModel containing a flattened set of properties: InterviewId, CandidateId, FirstName, LastName, SSN, InterviewDate, InterviewResult. It listens for the InterviewScheduled and InterviewConcluded events to fill out its properties, so that I can have a query that returns all Interviews.
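Concretely, the flattened shape described above might look like the following plain class (a sketch; property names are taken from the description, and any EventFlow-specific table/column attributes are omitted):

```csharp
using System;

// Flattened read model for listing interviews: one row per interview.
// FirstName/LastName/SSN are denormalized copies of the candidate's data.
public class InterviewReadModel
{
    public string InterviewId { get; set; }
    public string CandidateId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string SSN { get; set; }
    public DateTime InterviewDate { get; set; }
    public string InterviewResult { get; set; } // e.g. "Hire" / "NoHire"
}
```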

How would I go about getting the FirstName, LastName, SSN of the candidate for the properties in the InterviewReadModel?

I thought of the following approaches:

  1. Use the context in the Apply method of the InterviewReadModel to get a MsSqlReadModelStore, and retrieve the specific Candidate by the CandidateId gleaned from the InterviewScheduled event. The downside is that if I purge all read models to repopulate them, I have to repopulate the CandidateReadModels first and the InterviewReadModels second, otherwise SQL Server rejects the inserts (I used NOT NULL columns for FirstName and LastName, which fits the domain, but also means the CandidateReadModels must exist first).
  2. Modify InterviewScheduled event to carry more information about the Candidate. Leads to fat events with duplicated data.
  3. Re-think the domain model. Perhaps I need a SelectCandidate command on the InterviewAggregate that emits a CandidateSelectedForInterviewEvent carrying the extra information? Once again, duplicated data.
  4. Multiple queries. In order to display a list of all Interviews in the UI, first get all InterviewReadModels, then get each CandidateReadModel by Id.
  5. Single query, but do a JOIN between the CandidateReadModel and InterviewReadModel tables. Since I'm using the normal Identity as the identifier, join performance may become terrible later on. It also defeats the denormalized nature of the read model.
  6. Make InterviewReadModel listen for the CandidateRegistered event. Since one can schedule multiple Interviews for a Candidate, if the InterviewReadModel listens for CandidateRegistered I'll end up creating rows in the InterviewReadModel table that have only FirstName/LastName/SSN and no other columns, which means the insert fails.
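Option 2, for instance, would mean an event shaped roughly like this (a sketch; the field names are assumed from the scenario, and in EventFlow the class would additionally derive from `AggregateEvent<InterviewAggregate, InterviewId>`):

```csharp
using System;

// A "fat" InterviewScheduled event: carries denormalized candidate data so
// the InterviewReadModel can be filled from this single event, at the cost
// of duplicating FirstName/LastName/SSN into every scheduling event.
public class InterviewScheduled
{
    public string CandidateId { get; }
    public string FirstName { get; }
    public string LastName { get; }
    public string SSN { get; }
    public DateTime InterviewDate { get; }

    public InterviewScheduled(
        string candidateId, string firstName, string lastName,
        string ssn, DateTime interviewDate)
    {
        CandidateId = candidateId;
        FirstName = firstName;
        LastName = lastName;
        SSN = ssn;
        InterviewDate = interviewDate;
    }
}
```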

What would be the preferred method for solving this problem with EventFlow? I wonder if I'm missing a chunk of insight somewhere for this relatively common problem, but I haven't found any resource that actually provides a solution.

rasmus commented 6 years ago

Hi

There isn't really an "EventFlow way", as I try to give developers different possibilities for how they model their domain code. In my experience, there's a lot of "religion" regarding what to do and what not to do.

A different idea would be, instead of passing IDs via the InterviewScheduled event, to pass a Candidate entity that contains basic information like ID, first name and last name. It would mean duplicated data, but it would also make your events more self-contained. Equality between entities is defined as ID == ID, i.e., if their IDs match, they are the same entity. They might contain different data, but that would simply be because they are from different points in time. At our department we use entities quite a lot, as it makes decoupling easier. We add a version field (simply the originating aggregate version) on each to let us keep track of which is the newest and help us resolve conflicts.
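The entity described above could be sketched like this in plain C# (names are illustrative, not EventFlow types):

```csharp
using System;

// Candidate entity as suggested: equality is based purely on ID, and a
// Version field (the originating aggregate's version) lets consumers pick
// the newest copy when two copies of the "same" entity conflict.
public class CandidateEntity : IEquatable<CandidateEntity>
{
    public string Id { get; }
    public string FirstName { get; }
    public string LastName { get; }
    public int Version { get; }

    public CandidateEntity(string id, string firstName, string lastName, int version)
    {
        Id = id;
        FirstName = firstName;
        LastName = lastName;
        Version = version;
    }

    // Two entities are the same if their IDs match, even when the other data
    // differs; differing data just means different points in time.
    public bool Equals(CandidateEntity other) => other != null && Id == other.Id;
    public override bool Equals(object obj) => Equals(obj as CandidateEntity);
    public override int GetHashCode() => Id.GetHashCode();

    // Conflict resolution: the copy with the highest version wins.
    public static CandidateEntity Newest(CandidateEntity a, CandidateEntity b) =>
        a.Version >= b.Version ? a : b;
}
```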

Passing entities around does inflate the amount of data, but it also makes it easier to implement an eventual-consistency model, as you don't need to fetch data from multiple sources when updating state and read models; you simply pull the data you need from the events. However, if you do this, you need to accept that your read models and state will be out of sync while everything settles.

For me it helps to think about how I would solve it "the old-fashioned way", i.e., before computers. I would put as much information about each candidate on each piece of paper as possible, to reduce the need to go looking for the right piece of paper. The analogy doesn't always work, but it helps.

Your domain model should reflect your domain. I have seen some scenarios in which missing data meant that we had forgotten to model part of the domain.

As a completely different alternative, you could create a subscriber and do a custom read model update.
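Such a subscriber might look roughly like this, using EventFlow's `ISubscribeSynchronousTo<,,>` interface; the `ICandidateLookup` and `IInterviewRows` services are hypothetical stand-ins for however you would fetch candidate data and write rows:

```csharp
using System.Threading;
using System.Threading.Tasks;
using EventFlow.Aggregates;
using EventFlow.Subscribers;

// Sketch: react to InterviewScheduled outside the normal read model pipeline,
// look up the candidate, and write an enriched row yourself.
public class InterviewEnrichmentSubscriber
    : ISubscribeSynchronousTo<InterviewAggregate, InterviewId, InterviewScheduled>
{
    private readonly ICandidateLookup _candidates; // hypothetical
    private readonly IInterviewRows _rows;         // hypothetical

    public InterviewEnrichmentSubscriber(ICandidateLookup candidates, IInterviewRows rows)
    {
        _candidates = candidates;
        _rows = rows;
    }

    public async Task HandleAsync(
        IDomainEvent<InterviewAggregate, InterviewId, InterviewScheduled> domainEvent,
        CancellationToken cancellationToken)
    {
        var e = domainEvent.AggregateEvent;
        var candidate = await _candidates.GetAsync(e.CandidateId, cancellationToken);
        await _rows.UpsertAsync(
            domainEvent.AggregateIdentity.Value, candidate, e, cancellationToken);
    }
}
```

Because the subscriber runs outside the read model store, it sidesteps the repopulation-ordering problem of option 1, but you take on the responsibility of rebuilding those rows yourself when replaying.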

A lot of ramblings, I hope you can use some of it

accoleon commented 6 years ago

Yes, it does seem that passing entities around can solve the problem, but it also seems to violate DDD principles, since it crosses the boundary between the Candidate and Interview aggregates. Sure, we are not passing memory references to objects around, just properties, so it may not matter all that much, but it does mean the events get bigger.

A blog post I've found advocating not using entities in events: https://buildplease.com/pages/vos-in-events/

I agree that thinking "the old-fashioned way" helps in some matters, but it also caused the problems we were trying to solve: people would literally misspell Candidate names as they were passed along to scheduling.

This also ties into the UI and command layer. Assuming there is a ScheduleInterview command that originally takes a CandidateId and an InterviewDate (and calls the equivalent method on the InterviewAggregate), and that the command is published from an ASP.NET controller, I could actually query for the CandidateId to retrieve the Candidate details first, then build the command.
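That controller-side enrichment could be sketched as follows, using EventFlow's `IQueryProcessor`, `ICommandBus` and built-in `ReadModelByIdQuery<>`; the `ScheduleInterviewCommand` taking candidate details is hypothetical:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using EventFlow;
using EventFlow.Queries;
using Microsoft.AspNetCore.Mvc;

public class InterviewsController : Controller
{
    private readonly IQueryProcessor _queryProcessor;
    private readonly ICommandBus _commandBus;

    public InterviewsController(IQueryProcessor queryProcessor, ICommandBus commandBus)
    {
        _queryProcessor = queryProcessor;
        _commandBus = commandBus;
    }

    [HttpPost]
    public async Task<IActionResult> Schedule(
        string candidateId, DateTime interviewDate, CancellationToken cancellationToken)
    {
        // Fetch the candidate read model first...
        var candidate = await _queryProcessor.ProcessAsync(
            new ReadModelByIdQuery<CandidateReadModel>(candidateId), cancellationToken);
        if (candidate == null) return NotFound();

        // ...then build a command that carries the candidate's details, so the
        // aggregate can emit a self-contained event.
        var command = new ScheduleInterviewCommand(
            InterviewId.New, candidateId,
            candidate.FirstName, candidate.LastName, candidate.SSN, interviewDate);
        await _commandBus.PublishAsync(command, cancellationToken);
        return Ok();
    }
}
```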

rasmus commented 6 years ago

The article describes using value objects in events as a problem because they might change behaviour. In our department we have "solved" this by creating serialization/deserialization tests for our events, to make sure we discover any breaking changes in value objects.
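A minimal version of such a test pins the serialized form of an event and round-trips it; any change to a value object inside the event then fails the pinned snapshot. This sketch uses `System.Text.Json` and invented event types for brevity; the same idea applies to whatever serializer the event store actually uses:

```csharp
using System;
using System.Text.Json;

public record CandidateName(string First, string Last);
public record CandidateRegistered(string CandidateId, CandidateName Name);

public static class EventContractTests
{
    public static void RoundTripKeepsContract()
    {
        var original = new CandidateRegistered("c-1", new CandidateName("Jane", "Doe"));

        // Pinned snapshot: if someone renames or removes a property on
        // CandidateName, this comparison fails and the break is discovered
        // before old stored events become unreadable.
        var json = JsonSerializer.Serialize(original);
        var expected = "{\"CandidateId\":\"c-1\",\"Name\":{\"First\":\"Jane\",\"Last\":\"Doe\"}}";
        if (json != expected) throw new Exception("event contract changed: " + json);

        // And the deserialized event must carry the same data back.
        var roundTripped = JsonSerializer.Deserialize<CandidateRegistered>(json);
        if (!original.Equals(roundTripped)) throw new Exception("round trip lost data");
    }
}
```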

But as some of our value objects and entities do cross boundaries and microservices, we treat them like any other microservice contract: change by addition, or create a new contract.

For us, the reasoning is that we have some entities that are part of the common language we use in our department and they also make sense to use cross boundary. Examples are

In our department there are also two conflicting convictions: should entity/value object contracts be shared via a NuGet package, or should each service implement its own? Remember, the coupling is still there even if there's no shared C# class; it's only easier to spot when the C# is there. (I prefer sharing contract NuGet packages, as I'm lazy and we have a lot of very, very useful utility functions implemented in them.)

As I said, I try to make sure EventFlow enables developers to implement their own CQRS+ES+DDD flavor, but I personally think it makes sense to share entities as long as there's only one source of truth, i.e., one system that creates each entity. If you share IDs between systems, any system consuming those IDs must still have an understanding of what they are and how they make sense in its own context.

On larger services we have an anti-corruption layer that basically maps any external entities to representations that make sense within each context. An "order line" looks very different in a billing system than in a system that sends receipts.
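That mapping can be as small as a single translation function per external type (a sketch with invented names; each consuming context would keep only the fields it cares about):

```csharp
// External contract as published by the ordering service (illustrative).
public record ExternalOrderLine(
    string Sku, string Description, decimal UnitPrice, int Quantity, string WarehouseBin);

// Billing's own representation: just enough to invoice.
public record BillingLine(string Sku, decimal Amount);

public static class BillingAntiCorruption
{
    // Billing drops Description and WarehouseBin entirely and collapses
    // price * quantity into a single amount. A receipts system would map the
    // same external type differently, keeping Description and dropping price
    // details it doesn't need.
    public static BillingLine ToBillingLine(ExternalOrderLine external) =>
        new BillingLine(external.Sku, external.UnitPrice * external.Quantity);
}
```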

accoleon commented 6 years ago

I agree with you. My use case is small enough that both aggregates appear to belong in the same bounded context (in fact, I had originally designed them as one aggregate), and passing more information through events seems the most stable way to go, compared to the order-sensitive approach I currently have (using a ReadModelStore resolved from the context in the ReadModel's Apply method).

My entities are small enough that I can get away with passing raw properties directly, so I might try that instead.

Thanks for your input!