Azure / azure-signalr

Azure SignalR Service SDK for .NET
https://aka.ms/signalr-service
MIT License

Azure SignalR doesn't process messages in order under high load #1655

Closed bacobart closed 1 month ago

bacobart commented 2 years ago

Describe the bug

SignalR messages sent to a client are processed in order when they are sent from the same source to the same clients. This behaviour is observed when hosting SignalR yourself. However, when Azure SignalR is used, some messages are received out of order when there is a high volume of messages. I couldn't find anything about this in the docs, but according to @davidfowl the order should be preserved: https://twitter.com/bacobart/status/1557296764100743168.

Expected Behavior

After adding Azure SignalR, messages are still received on the client in order.

Steps To Reproduce

Reproduction at https://github.com/bacobart/azure-signalr-out-of-order-repro

Without .AddAzureSignalR() on the server, messages are processed in order as expected. When .AddAzureSignalR() is added, some messages are processed out of order.

Exceptions (if any)

none

Further technical details

.NET 6.0.400; Microsoft.AspNetCore.SignalR.* 6.0.8; Microsoft.Azure.SignalR 1.18.1

davidfowl commented 2 years ago

@vicancy Do we send messages from the same source over different connections? See https://github.com/bacobart/azure-signalr-out-of-order-repro/blob/8d2ba0012f9815c6ff3f32e92f1a320dab04adaa/SignalrSpamServer/SpamService.cs#L42. This code shouldn't result in out-of-order messages.

KKhurin commented 2 years ago

@bacobart, your observations are correct. Currently, message order is only preserved when messages are sent from within a hub call (or its execution context) that originated from either a client message or a client-connected event. When the hub context is obtained via DI, the Azure SignalR SDK does not currently have code to ensure that the order is always correct. We are aware of this problem and plan to fix it in a future SDK release. As a possible temporary workaround, you can move the Task.Factory.StartNew(() => SpamTask(cancellationToken), TaskCreationOptions.LongRunning); call inside the OnConnectedAsync() method.
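A minimal sketch of what that workaround might look like. The hub name SpamHub, the client method name "ReceiveMessage", and the message payload are assumptions made for illustration and are not taken from the repro repository. Whether ordering is actually preserved depends on the SDK behaviour described in this comment.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// Hypothetical hub; all names here are illustrative.
public class SpamHub : Hub
{
    private readonly IHubContext<SpamHub> _hubContext;

    public SpamHub(IHubContext<SpamHub> hubContext) => _hubContext = hubContext;

    public override async Task OnConnectedAsync()
    {
        await base.OnConnectedAsync();

        // Copy what the background task needs; the hub instance itself
        // should not be used after this method returns.
        var connectionId = Context.ConnectionId;
        var aborted = Context.ConnectionAborted;

        // Started from inside the hub call, so the new task captures
        // the execution context of this call.
        _ = Task.Factory.StartNew(
            () => SpamTask(connectionId, aborted),
            aborted,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default).Unwrap();
    }

    private async Task SpamTask(string connectionId, CancellationToken cancellationToken)
    {
        try
        {
            for (var i = 0; !cancellationToken.IsCancellationRequested; i++)
            {
                // Send through IHubContext rather than Hub.Clients,
                // because the hub is disposed once the call completes.
                await _hubContext.Clients.Client(connectionId)
                    .SendAsync("ReceiveMessage", i, cancellationToken);
            }
        }
        catch (OperationCanceledException)
        {
            // The client disconnected; stop sending.
        }
    }
}
```

Note that this starts one sending loop per connected client and ends it when that client disconnects.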

The internals of this are rather complex, but in short, the problem stems from the fact that the SDK maintains multiple server connections to the SignalR service. Messages sent to a client over different server connections will not necessarily arrive at the client in the original order. To arrive in the correct order, these messages need to be sent over the same server connection. When messages are sent from within a hub call (or from OnConnectedAsync), the SDK has a chance to set an async local variable before the hub code is executed. This AsyncLocal keeps the server connection selection the same for each SendAsync within the hub call context (i.e. any of its execution context derivatives, such as Thread.Start, ThreadPool.QueueUserWorkItem, Task.Factory.StartNew, etc.). When messages are sent outside of this async local context (e.g. using a hubContext obtained via DI), the SDK is not able to pick the same server connection each time SendAsync is called.
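The execution-context flow described above can be illustrated with a small standalone console program. This is only a sketch of how AsyncLocal behaves in .NET, not the SDK's actual code: SelectedConnection, SimulatedHubCall, and the "server-connection-1" value are hypothetical stand-ins.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLocalFlowDemo
{
    // Hypothetical stand-in for the SDK's internal per-call state.
    private static readonly AsyncLocal<string> SelectedConnection = new AsyncLocal<string>();

    private static string Current => SelectedConnection.Value ?? "(none)";

    // Plays the role of a hub call: the value is set before user code runs.
    private static async Task SimulatedHubCall()
    {
        SelectedConnection.Value = "server-connection-1";

        // Work started from here captures the current execution context,
        // so it observes the same value even on another thread.
        await Task.Factory.StartNew(
            () => Console.WriteLine("inside hub call context: " + Current),
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);
    }

    public static async Task Main()
    {
        await SimulatedHubCall();

        // Changes made inside an async method do not flow back to its caller,
        // so code outside the call context sees no selection.
        Console.WriteLine("outside hub call context: " + Current);
    }
}
```

The first line printed shows the value set inside the simulated call, while the second shows no value. That mirrors why a send made outside a hub call has no remembered server connection to reuse.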

Again, we have a plan to fix this in the future. Please let us know if the proposed workaround is not suitable for you and we'll look for more options before the final solution is available.

westdavidr commented 7 months ago

@KKhurin Is there any update on this?

I am attempting to use Azure SignalR with Azure OpenAI streaming completions, but the message chunks are being received all out of order on the client side. Is there a workaround for this?

owen-barbour commented 3 months ago

Is there a timeline on this being fixed? This is rendering Azure SignalR unusable for our application.

jlewicki-nevo commented 3 months ago

A timeline for a fix would be appreciated!

MarcoSchuetz commented 2 months ago

@KKhurin Any update to this issue? It still seems to be causing problems for several people. @vicancy do you have a tip or solution to this?