TomasEkeli / ToDolittle

Sample todo app using the Dolittle framework
MIT License

Sample: potential extensions #3

Open paulvanbladel opened 5 years ago

paulvanbladel commented 5 years ago

Tomas, I guess that currently this sample is the 'main' sample for Dolittle (at least, the most up to date). I'm walking through the documentation (I'm in parallel learning DDD, so don't expect any sophistication).

Is it correct that the current todo sample only deals with stateless aggregate roots? I guess it is. So, allow me to formulate a meaningful use case that could cover stateful aggregate roots, one where rehydration comes into the picture.

Let's say the business rule is: a todo which is currently in state done can only be reopened if the elapsed time between now and the moment the todo was marked done is not more than 24 hours. With my current level of insight into Dolittle, I would say that this business rule cannot be managed by the Dolittle rule engine, because the rule engine uses the read model and the read model doesn't have the required information. So, what would be your suggested ways of solving this? My intuition would be to go for a stateful aggregate root, so that we have the information available in the domain. Or would you extend the read model with the necessary information so that the rule engine can handle this specific case?

TomasEkeli commented 5 years ago

Again, @paulvanbladel, thanks for your great questions! First off - yes this is currently the most updated and complete sample of Dolittle, but it does not use all the capabilities.

For this case I would tend towards extending the read-model, or even creating a separate read-model for this. Aggregate Roots can be stateful, as long as that state is not externally available, i.e. the internal entities cannot be shared outside the Aggregate Root. I prefer to capture rules that can lead to a command rejection in the business validators for the command. There is a case for doing this kind of validation directly in the Aggregate Root, as there is a small window of time between the business rules being validated and the Aggregate Root receiving the command. Another case for in-Aggregate-Root validation is when the Aggregate Root does several things against several systems (i.e. lookups and checks). Always remember: the Aggregate Root is the last thing that can reject a command, but you should try to reject a command as early as possible.

For this particular case we could extend the TodoTask with a nullable DateTime property CompletedAt and make sure it is set in the ItemDone and ItemNotDone processors. Then we could add a rule that checks whether the item can be marked as "not done" based on this 24-hour restriction.
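A minimal sketch of that rule, in illustrative Python rather than the sample's actual C#/Dolittle code — TodoTask, CompletedAt, ItemDone, ItemNotDone and the 24-hour window come from this thread; the function name and everything else here is hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REOPEN_WINDOW = timedelta(hours=24)

@dataclass
class TodoTask:
    # Read-model extended with a nullable completion timestamp,
    # set by the ItemDone processor and cleared by ItemNotDone.
    task_id: str
    done: bool = False
    completed_at: Optional[datetime] = None

def can_reopen(task: TodoTask, now: datetime) -> bool:
    """Business rule: a done task may only be reopened within 24 hours."""
    if not task.done or task.completed_at is None:
        return True  # not done, or never completed: nothing blocks the command
    return now - task.completed_at <= REOPEN_WINDOW
```

The rule engine would evaluate something like `can_reopen` against the read-model before the command ever reaches the aggregate.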

Perhaps better would be to make a separate read-model dealing with just this rule, one that only stores when tasks were marked as done. This read-model would exist just for the validation. In general I don't fear introducing read-models for specific use-cases, as it keeps the original read-model independent of this new need. I would also make new processors for the ItemDone and ItemNotDone events that update this read-model, and a rule that considers it. This way new aspects can be added without changing existing read-models.
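Sketched in illustrative Python (the event names are from the thread; the class and method names, and the processor wiring, which is framework-specific and omitted here, are hypothetical):

```python
from datetime import datetime
from typing import Dict, Optional

class TaskCompletionTimes:
    """A purpose-built read-model that exists only for this validation:
    it records when each task was last marked done."""

    def __init__(self) -> None:
        self._completed_at: Dict[str, datetime] = {}

    # Processor for the ItemDone event: remember when the task was completed.
    def on_item_done(self, task_id: str, occurred: datetime) -> None:
        self._completed_at[task_id] = occurred

    # Processor for the ItemNotDone event: the task is open again.
    def on_item_not_done(self, task_id: str) -> None:
        self._completed_at.pop(task_id, None)

    def completed_at(self, task_id: str) -> Optional[datetime]:
        return self._completed_at.get(task_id)
```

The original TodoTask read-model stays untouched; the 24-hour rule queries this structure instead.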

In the end it is a question of style and personal preference. Simple things like this example I usually weave into existing read-models and validation, more complex or complicated ones I try to separate into their own thing.

paulvanbladel commented 5 years ago

So, in Dolittle the read-model goes broader than simply serving the read pipeline: it's also used in the validation pipeline on the command side. Anyhow, completely legitimate. I think you have a valid point.

> There is a case for doing this kind of validation directly in the Aggregate Root as there is a small window of time between the business rules being validated and the Aggregate Root receiving the command.

Allow me to rephrase, just to check that I understood. In systems under very high pressure, where a lot of users work on a small set of data, it can make sense to do critical business validation directly on the aggregate root (rather than input validation via the read model), because this is the only way to make sure you have the latest data directly from the single source of truth (i.e. the event store), while the read model is 'only' eventually consistent.

BTW, how is the synchronization between read models and the command execution pipeline handled? When a command returns to the caller, of course no real data is returned, only a simple status. At that point in time, have the read-model updates already been digested, or is there genuine eventual consistency? Is there support for what Jeremy Miller would call inline projections (http://jasperfx.github.io/marten/documentation/events/projections/)?

TomasEkeli commented 5 years ago

It all boils down to the events; they are what happens in the system. As there is some time between when an event is committed and when the read-models are updated, there is a window for validation to go through when it should not. Moving the validation into the aggregate root makes that window smaller, but it does not take it away entirely. The aggregate root could be calling other services and checking internal state and still commit to doing things that "should" have failed. This is the nature of any distributed system.

The way to consolidate this is to regard the events, and only the events, as the truth of the system. When (not if, when) events are committed that really should not have been, because of concurrency, that has to be fixed somehow. This should be done with a compensating event that cleans up the system, i.e. by making the fix explicit.
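The compensating-event idea can be sketched like this (illustrative Python; `ItemDoneReverted` is a hypothetical compensating event name, not part of the sample):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Event:
    kind: str
    task_id: str

@dataclass
class EventStore:
    """Events are append-only: a wrongly committed event is never deleted
    or edited, it is corrected by appending a compensating event."""
    events: List[Event] = field(default_factory=list)

    def commit(self, event: Event) -> None:
        self.events.append(event)

def current_done_state(store: EventStore, task_id: str) -> bool:
    """Fold over the full history; the compensating event explicitly
    undoes the effect of the event that slipped past validation."""
    done = False
    for e in store.events:
        if e.task_id != task_id:
            continue
        if e.kind == "ItemDone":
            done = True
        elif e.kind in ("ItemNotDone", "ItemDoneReverted"):
            done = False
    return done
```

The history still shows that the bad event happened; the fix is visible and explicit rather than a silent mutation of the log.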

I've added an example to this repo where I implement the rule you suggested with a new read-model custom to this validation, as it is a very valid consideration. In this small system you won't get any concurrency issues, but if there were other systems involved we could see cases of the validation accepting a command that it should not. It is, however, exceedingly rare.

paulvanbladel commented 5 years ago

@TomasEkeli Thanks. So, I'll keep as a rule of thumb: avoid putting state on aggregate roots and use as much as possible the provided business input validation mechanisms, which might need additional read models.

I see one potential issue here. Sometimes business rules are meant to be server-only because they contain confidential data and expose confidential results (e.g. credit limit checks, etc.). As far as I can see, read models are exposed to the client automatically, which would mean that these confidential read-models are automatically exposed?

TomasEkeli commented 5 years ago

There's no rule against putting state on aggregate roots, that's totally fine. The thing to avoid is to expose this state outside of the aggregate root :) So, for internal things the aggregate root needs - state is fine. The preferred way of doing this is through applying the events and having methods in the aggregate root that set state based on the events. This is because the aggregate root is not serialised directly, rather it is created anew and all the events it has applied are re-run against it.
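That pattern — private state, set only by event handlers, rebuilt by re-running the committed events — can be sketched in illustrative Python (the event names are from the sample; the class layout, method names, and the manual dispatch in `rehydrate` are hypothetical, standing in for the framework's On-method convention):

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ItemDone:
    task_id: str

@dataclass(frozen=True)
class ItemNotDone:
    task_id: str

class TodoAggregate:
    """State is private and only ever set by On-style event handlers;
    the aggregate is never serialised, it is rebuilt from its events."""

    def __init__(self) -> None:
        self._done = set()  # internal state, never exposed outside the aggregate

    def on_item_done(self, e: ItemDone) -> None:
        self._done.add(e.task_id)

    def on_item_not_done(self, e: ItemNotDone) -> None:
        self._done.discard(e.task_id)

    @classmethod
    def rehydrate(cls, events: List) -> "TodoAggregate":
        """Create a fresh instance and re-run every committed event."""
        agg = cls()
        for e in events:
            if isinstance(e, ItemDone):
                agg.on_item_done(e)
            elif isinstance(e, ItemNotDone):
                agg.on_item_not_done(e)
        return agg

    def is_done(self, task_id: str) -> bool:
        # Exposed here only so the sketch is checkable; a real aggregate
        # would use this state internally to accept or reject commands.
        return task_id in self._done
```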

Concerning business-rules: these are only supposed to run on the "server", not on the "client". Any input validation can be run on the client, but has to be re-run on the server (never trust the client). We are looking into having a convention where you could write a client-version of your business-validation and we would discover this, but it is not done yet.

smithmx commented 5 years ago

Interesting discussion guys.

The aggregate root is the absolute arbiter of whether something is allowed in terms of the local invariants. This is the definition of an aggregate - it maintains whatever invariants need to be maintained in the context of a transaction. In this scenario, you would reapply the event that transitioned the state of the ToDo item and record the time in a local field. When attempting to transition the state, you would check the time rule against that local state and either allow the transition or throw an exception. This is a great example of an invariant that is managed by the aggregate root. In this sense, the "business validator" is just a helper, trying to catch the problem before hitting the aggregate. As an aside, the reason we throw an exception from inside the aggregate if it doesn't work is that we have already performed all the validation, security and business rules before we hit the handler and expected it to work. The reason it wouldn't work is concurrency or eventual consistency. That's considered an exceptional case.

Where the rule is set-based, it cannot be managed by a single aggregate. Think of a simple rule where you need a unique username for signup. The signup aggregate wouldn't know whether the username is unique, as it would only know about its own. Therefore you'd have to rely on the read side. At this point in time, there is a single read side for both display and for domain support. I think we definitely need to support the inline projection thing and have a much clearer separation of domain support and read side. There is an interesting discussion to be had about the best way to support this kind of set-based validation. It basically comes down to how to handle a failed scenario (i.e. trying to set up a second account with the same username) and the significance of this. That is, it is effectively a domain problem, not a technical one.
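The unique-username example can be sketched as a read-side structure in illustrative Python (all names here are hypothetical, not from the sample):

```python
class UsernameIndex:
    """Read-side structure used to pre-check a set-based rule
    (unique usernames) that no single aggregate can enforce alone."""

    def __init__(self) -> None:
        self._taken = set()

    # Processor for a hypothetical UserSignedUp event.
    def on_user_signed_up(self, username: str) -> None:
        self._taken.add(username.lower())

    def is_available(self, username: str) -> bool:
        return username.lower() not in self._taken
```

Because this index is only eventually consistent, a duplicate signup can still slip through the window; as discussed earlier in the thread, such a slip is then repaired with an explicit compensating event.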

In regards to automatically exposing info that is only for supporting business rules: you just wouldn't implement the rule as a query for a read model. If your underlying data store is MongoDB, for example, you could be storing your confidential stuff in a custom collection and accessing it using the capabilities of Mongo. Having said that, there are certainly plans to be much more prescriptive about what is effectively your contract with the outside world (don't make all events subscribable, don't expose all commands or queries via the API).

paulvanbladel commented 5 years ago

@smithmx thanks a lot. Very interesting! @TomasEkeli Maybe there's some confusion: I'm totally aware that nothing "runs" on the client. My concern is only that read models which exist for the sake of server-side business logic validation (so, for the write pipeline) are potentially also exposed to the client. Maybe I'm wrong in assuming that ALL read models are by default exposed to the client. With exposed I mean that when you know the endpoint address (e.g. via the proxy generation as in the Aurelia client), you can run the query. Indeed, I tried to experiment with putting some state on an aggregate root via the On method, and indeed, when retrieving the aggregate (inside a command), the state is accessible when the events are replayed. So, I guess you have some magic that detects if there is state (by means of relevant On methods on the aggregate) and only does an actual replay of committed events when necessary. Very clever indeed :)

Nonetheless (though off-topic in this thread), what you mention about sharing logic between client and server can become interesting with Blazor, where you can really share a common assembly between client and server, as in the good old Silverlight days.

smithmx commented 5 years ago

@paulvanbladel That is what we do. As a performance improvement, if we detect that the aggregate is stateless (no On methods) then we just fetch the version and set that. No need to replay events that don't set anything. Obviously, we'll also be implementing snapshots / mementos for stateful aggregates.
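The detection described above could be sketched like this (illustrative Python, not Dolittle's actual implementation; the `on_*` naming convention and both classes are hypothetical stand-ins for the framework's reflection over On methods):

```python
def is_stateless(aggregate_cls: type) -> bool:
    """If the aggregate declares no event handlers, it builds no state,
    so replaying its events can be skipped and only the version fetched."""
    return not any(name.startswith("on_") for name in vars(aggregate_cls))

class StatelessTodo:
    pass  # no handlers: nothing to rebuild

class StatefulTodo:
    def on_item_done(self, event) -> None:
        self._done = True  # has a handler: must replay events on rehydration
```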