oising opened 2 weeks ago · Open · Triaged
Imo this is a P0 as it fundamentally breaks the underlying FIFO ordering that one would expect from EventHubs when processing each message individually via Dapr PubSub
Agree
And also the EH binding! They share the same code AFAICT.
Hey @olitomlinson @yaron2 - After a crash course in golang, I don't think the issue is at https://github.com/dapr/components-contrib/blob/main/common/component/azure/eventhubs/eventhubs.go#L293, as that starts a goroutine per partition, which seems perfectly fine -- partitions should be handled in parallel. However, looking at https://github.com/dapr/components-contrib/blob/4ca04dbb61c553047727ec709f37a2b4b9832159/common/component/azure/eventhubs/eventhubs.go#L398 I can see that messages within a partition are also dispatched as goroutines, where I would expect them to be blocking calls to ensure correct dispatch ordering, no?
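To make that concrete, here's a tiny self-contained Go toy (not the components-contrib code; names and timings are made up) showing why per-message goroutines interleave deliveries while blocking calls preserve order:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	messages := []string{"m1", "m2", "m3"}

	handle := func(msg string) {
		fmt.Println("start", msg)
		time.Sleep(10 * time.Millisecond) // simulate handler work
		fmt.Println("stop ", msg)
	}

	// Buggy shape (analogous to `go handleAsync(...)`): handlers run
	// concurrently, so start/stop lines interleave and order is lost.
	var wg sync.WaitGroup
	for _, m := range messages {
		wg.Add(1)
		go func(m string) {
			defer wg.Done()
			handle(m)
		}(m)
	}
	wg.Wait()

	// Fixed shape (blocking call): strict start/stop/start/stop, in order.
	for _, m := range messages {
		handle(m)
	}
}
```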
That seems correct, yes
So removing the `go` prefix should be enough? I should probably rename `handleAsync` to something like `handleEvents` -- it's interesting to me that the asynchrony comes entirely from the call site, not from the method body, despite the name. Quite simple!
If this really is a one-line fix, would you expect unit tests? They would be entirely beyond me at this point in my golang career :D
Also, as a P0 bug - would this warrant making it into 1.14.5?
This is exactly what I said in Discord :)
I obviously misread or missed that -- but it's good that we agree! :) I will submit the two-line PR as draft and link it, and we can go from there. Given this is a blocker for our solution, I would really like to see this make a point release and not wait for 1.15...
It could be as simple as a one-liner, but it needs thorough testing to make sure that checkpointing is done correctly after each message completes.
My one reservation about fixing this quickly is that there may be users out in the wild with high-throughput use-cases that depend on the throughput currently afforded by this incorrect implementation. Until a fix is in place, it's hard to quantify how large the performance degradation from checkpointing on each message may be.
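For reference, a minimal sketch of strict per-message checkpointing with the `azeventhubs` processor client -- illustrative only, not the Dapr component's actual code, and assuming current `azeventhubs` v1 signatures:

```go
package sketch

import (
	"context"

	"github.com/Azure/azure-sdk-for-go/sdk/messaging/azeventhubs"
)

// processPartition handles and checkpoints strictly one event at a time.
// Ordering is preserved, but every message now pays a checkpoint write --
// the throughput cost being weighed above.
func processPartition(ctx context.Context, pc *azeventhubs.ProcessorPartitionClient,
	handle func(context.Context, *azeventhubs.ReceivedEventData) error) error {
	for {
		// Block until up to 10 events arrive (or ctx ends).
		events, err := pc.ReceiveEvents(ctx, 10, nil)
		if err != nil {
			return err
		}
		for _, e := range events {
			if err := handle(ctx, e); err != nil {
				return err // never checkpoint past a failed message
			}
			// Checkpoint after each successful handle: this is the
			// per-message cost discussed above.
			if err := pc.UpdateCheckpoint(ctx, e, nil); err != nil {
				return err
			}
		}
	}
}
```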
The real solution here is to use Bulk Subscriptions for high throughput use-cases, but this is not Stable yet.
Idea: this could be fixed, but with the fix put behind an opt-in feature flag in the component metadata so it doesn't impact people with existing expectations (from the incorrect implementation):
name: enableInOrderMessageDelivery
value: "false"
Then, when Bulk Subscriptions does graduate to Stable, this flag could be removed and replaced with an opt-in flag that reverts to the broken behavior, and users with high-throughput expectations are encouraged to migrate to Bulk Subscriptions (or opt back into the previous broken implementation, for a window of supported releases):
name: enableLegacyMessageDelivery
value: "true"
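A rough sketch of how such a flag might gate the dispatch path -- the flag name is just the proposal above, and `dispatch`, `Event`, and `handleEvent` are made-up stand-ins for the component's internals:

```go
package sketch

import (
	"context"
	"strconv"
)

// Event stands in for the component's message type.
type Event struct{ Body []byte }

// dispatch gates per-partition dispatch on the proposed (hypothetical)
// enableInOrderMessageDelivery metadata flag. An unset or invalid value
// parses as false, i.e. the legacy behavior stays the default.
func dispatch(ctx context.Context, properties map[string]string,
	events []Event, handleEvent func(context.Context, Event) error) {
	inOrder, _ := strconv.ParseBool(properties["enableInOrderMessageDelivery"])
	for _, e := range events {
		if inOrder {
			_ = handleEvent(ctx, e) // blocking: preserves partition order
		} else {
			go handleEvent(ctx, e) // legacy: higher throughput, broken ordering
		}
	}
}
```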
Hmm, I'm not going to be competent enough in the language to fix this in the window that my project requires. If you could collaborate with me, then I may learn enough to address my other feature requests for event hubs myself. How busy are you, lol
Expected Behavior
When using Event Hubs as a pubsub or binding, messages should be delivered in the order they were posted (assume PartitionKey is set when publishing/posting so that related messages land in the same partition, which is where Event Hubs guarantees ordering).
Actual Behavior
In the pubsub case, the sidecar delivers new events before the subscriber has completed handling the last one. This causes major problems when trying to ensure order-sensitive work is executed correctly (e.g. starting a workflow to process subsequent events).
Steps to Reproduce the Problem
We're publishing to our topic with the dotnet sdk, setting a PartitionKey, and receiving with a subscription handler that logs the start and end of each delivery.
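For illustration, here is a minimal Dapr Go SDK equivalent of what we're doing -- the component name `pubsub`, topic `orders`, partition key, and port are assumptions, and the original repro uses the .NET SDK:

```go
package main

import (
	"context"
	"log"

	dapr "github.com/dapr/go-sdk/client"
	"github.com/dapr/go-sdk/service/common"
	daprd "github.com/dapr/go-sdk/service/http"
)

// publish sends ordered events with a fixed partition key so they all land
// in the same Event Hubs partition.
func publish() {
	c, err := dapr.NewClient()
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	for _, body := range []string{`{"seq":1}`, `{"seq":2}`, `{"seq":3}`} {
		err := c.PublishEvent(context.Background(), "pubsub", "orders", []byte(body),
			dapr.PublishEventWithMetadata(map[string]string{"partitionKey": "order-1"}))
		if err != nil {
			log.Fatal(err)
		}
	}
}

// subscribe logs the start and end of each delivery; interleaved start/stop
// lines are the symptom described below.
func subscribe() {
	s := daprd.NewService(":6002")
	sub := &common.Subscription{PubsubName: "pubsub", Topic: "orders", Route: "/orders"}
	err := s.AddTopicEventHandler(sub, func(ctx context.Context, e *common.TopicEvent) (bool, error) {
		log.Printf("start %s", e.ID)
		// ... order-sensitive work here ...
		log.Printf("stop %s", e.ID)
		return false, nil
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(s.Start())
}

func main() {
	// In practice the publisher and subscriber run as separate Dapr apps;
	// pick one side here.
	go publish()
	subscribe()
}
```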
The problem is clear when watching the logs: instead of a constant start/stop/start/stop alternating sequence of log events, we're seeing start/stop/start/start/stop/stop interleaving. The sidecar should not send another event until the current one has completed processing, i.e. until it receives an HTTP 200 (in this case).
The same issue likely occurs for the binding, since the shared common code is the problem (according to @yaron2).
Release Note
PubSub and Binding components using ordered delivery (with a partition key) would interleave event deliveries to a subscriber. Now the sidecar waits until the handler returns before sending the next event.