Eventuous / eventuous

Event Sourcing library for .NET
https://eventuous.dev
Apache License 2.0
442 stars 70 forks source link

With the SQL Server event store, event handlers are sometimes called twice for each event #305

Closed Steve-OH closed 5 months ago

Steve-OH commented 8 months ago

Describe the bug This is a very wonky thing: Using the SQL Server event store, as updated in the latest 0.15.0-beta.8.2 release, if an event handler executes a SQL query as part of its processing of an event, it will be re-invoked once more with exactly the same event. If the event handler does not execute a query, then this doesn't happen.

To Reproduce See the repository at https://github.com/Steve-OH/eventuous-test-issue-305. Follow the instructions in the README therein.

Expected behavior An event handler in a subscription should execute exactly once for each event, barring any catastrophic issues (power failure, etc.). Granted, event handlers should be idempotent as far as possible, but even then an event handler running twice for every event can have a significant performance impact.

Screenshots There are some in the test repository's README.

Desktop (please complete the following information):

Additional context The only thing I can think of is that something is going wrong with checkpoint tracking.

alexeyzimarev commented 8 months ago

I didn't look deep in the issue, but what you describe in "expected behaviour" is not what is actually expected.

An event handler in a subscription should execute exactly once for each event

Eventuous never aimed to provide any guarantee for "exactly once" event processing. Doing so would require that the checkpoint is stored in the same database where the subscription writes to (if it does) and it must happen in the same transaction. Checkpointing and event processing aren't coupled in Eventuous. so it's physically impossible.

This is from the docs:

When using Eventuous, it's important to remember that it does not enforce exactly one event processing rule (although it can), as it would have a negative impact on the subscription's performance. Therefore, when the handler has processed an event, it might eventually need to process it again when the application restarts after a crash. It might sound a bit scary, but in reality, those events will be delivered again in the same order, and it's easy to mitigate the issue by ensuring that the projection handler is idempotent. For our example, we could do that by using updateOne operation with the option isUpsert set to true instead of using the insertOne operation. Any update operation is by definition idempotent as long as it doesn't use operations on the existing state like inc or dec. That's why it's essential to only use event properties in updates, so the event needs to contain enough information for the projecting handler to execute the update without using the current projected document state.

I agree, though, that event handler being called twice for each event is weird. But, the tests I have don't exhibit such behaviour. I need to look closer at your repro to understand what is going on.

Steve-OH commented 7 months ago

I understand about CAP and all of that, but that shouldn't really be applicable in this simple example. This is a case of just one subscription and a single local SQL Server instance for both messages and checkpoints. There isn't any contention of the sort that would be expected to lead to message delivery being consistently retried.

Since I posted the original code, I've discovered that the second invocation of the event handler is occurring while the first invocation is still in progress, so it's some kind of weird race. That also explains why some of my first attempts at resolving the problem were not successful.

On a hunch, I tried rewriting the event handler as a sync rather than async function, and that fixes the problem, which further confirms that there is some kind of async-related race. I've updated the code to simplify some things (no more DI required), and there are two branches, sync-read and async-read, that show the correct and incorrect behavior, respectively.

alexeyzimarev commented 7 months ago

Yeah, I don't disagree, don't get me wrong. I was thinking about it last night. Maybe it's related to the polling query back pressure that I added to 0.15. I got to think that the current test suite runs like "produce 100 events, then consume 100 events", where if you produce one and consume one, then do it again, there might be something there (checkpoint basically).

I will add a test like that to see if it can be reproduced internally. Thanks for reporting :)

Steve-OH commented 5 months ago

Should this be closed? 4a65311 seems to have fixed the problem.