dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.4k stars 10k forks source link

Signalr Client can't communicate with server after reconnect #51290

Closed dannymelsen closed 11 months ago

dannymelsen commented 1 year ago

Is there an existing issue for this?

Describe the bug

We have a issue where in rare cases after a client has lost connection to the server and reconnects it suddenly can't acknowledges messages from server and can't send messages to the server. Even if you restart the Client it still can't send or acknowledge messages. But if you restart the server all works correctly again.

Expected Behavior

After it reconnects it should be possible to communicate between client & server.

Steps To Reproduce

It hasn't been possible to reproduce this issue yet. It only happens in rare cases.

Exceptions (if any)

2023-10-07T09:53:23.3200136Z [Warn]: Connection started reconnecting due to an error System.Net.WebSockets.WebSocketException (0x80004005): An internal WebSocket error occurred. Please see the innerException, if present, for more details. ---> System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult) --- End of inner exception stack trace --- at System.Net.Security._SslStream.EndRead(IAsyncResult asyncResult) at System.Net.TlsStream.EndRead(IAsyncResult asyncResult) at System.Net.PooledStream.EndRead(IAsyncResult asyncResult) at System.IO.Stream.<>c.b43_1(Stream stream, IAsyncResult asyncResult) at System.Threading.Tasks.TaskFactory1.FromAsyncTrimPromise1.Complete(TInstance thisRef, Func3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketConnectionStream.<ReadAsync>d__21.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task) at System.Net.WebSockets.WebSocketBase.WebSocketOperation.<Process>d__19.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketBase.<ReceiveAsyncCore>d__45.MoveNext() at System.Net.WebSockets.WebSocketBase.ThrowIfConvertibleException(String methodName, Exception exception, CancellationToken cancellationToken, Boolean aborted) at System.Net.WebSockets.WebSocketBase.<ReceiveAsyncCore>d__45.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport.<StartReceiving>d__22.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result) at System.IO.Pipelines.Pipe.GetReadAsyncResult() at System.IO.Pipelines.Pipe.DefaultPipeReader.GetResult(Int16 token) at System.Threading.Tasks.ValueTask1.get_Result() at Microsoft.AspNetCore.SignalR.Client.HubConnection.d81.MoveNext() System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult) --- End of inner exception stack trace --- at System.Net.Security._SslStream.EndRead(IAsyncResult asyncResult) at System.Net.TlsStream.EndRead(IAsyncResult asyncResult) at System.Net.PooledStream.EndRead(IAsyncResult asyncResult) at System.IO.Stream.<>c.b__43_1(Stream stream, IAsyncResult asyncResult) at System.Threading.Tasks.TaskFactory1.FromAsyncTrimPromise1.Complete(TInstance thisRef, Func`3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketConnectionStream.d21.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task) at System.Net.WebSockets.WebSocketBase.WebSocketOperation.d19.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketBase.d__45.MoveNext() System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)

2023-10-09T08:16:20.8985093Z [Warn]: Connection started reconnecting due to an error System.Net.WebSockets.WebSocketException (0x80004005): An internal WebSocket error occurred. Please see the innerException, if present, for more details. ---> System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult) --- End of inner exception stack trace --- at System.Net.Security._SslStream.EndRead(IAsyncResult asyncResult) at System.Net.TlsStream.EndRead(IAsyncResult asyncResult) at System.Net.PooledStream.EndRead(IAsyncResult asyncResult) at System.IO.Stream.<>c.b43_1(Stream stream, IAsyncResult asyncResult) at System.Threading.Tasks.TaskFactory1.FromAsyncTrimPromise1.Complete(TInstance thisRef, Func3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketConnectionStream.<ReadAsync>d__21.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task) at System.Net.WebSockets.WebSocketBase.WebSocketOperation.<Process>d__19.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketBase.<ReceiveAsyncCore>d__45.MoveNext() at System.Net.WebSockets.WebSocketBase.ThrowIfConvertibleException(String methodName, Exception exception, CancellationToken cancellationToken, Boolean aborted) at System.Net.WebSockets.WebSocketBase.<ReceiveAsyncCore>d__45.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport.<StartReceiving>d__22.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result) at System.IO.Pipelines.Pipe.GetReadAsyncResult() at System.IO.Pipelines.Pipe.DefaultPipeReader.GetResult(Int16 token) at System.Threading.Tasks.ValueTask1.get_Result() at Microsoft.AspNetCore.SignalR.Client.HubConnection.d81.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNetCore.SignalR.Client.HubConnection.d69.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNetCore.SignalR.Client.HubConnection.d56.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNetCore.SignalR.Client.HubConnectionExtensions.d43`1.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at AmphiSys.Signalr.Client.ClientController.d22.MoveNext() System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult) --- End of inner exception stack trace --- at System.Net.Security._SslStream.EndRead(IAsyncResult asyncResult) at System.Net.TlsStream.EndRead(IAsyncResult asyncResult) at System.Net.PooledStream.EndRead(IAsyncResult asyncResult) at System.IO.Stream.<>c.b43_1(Stream stream, IAsyncResult asyncResult) at System.Threading.Tasks.TaskFactory1.FromAsyncTrimPromise1.Complete(TInstance thisRef, Func`3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketConnectionStream.d21.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task) at System.Net.WebSockets.WebSocketBase.WebSocketOperation.d19.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketBase.d__45.MoveNext() System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult) 2023-10-09T08:16:20.8985093Z [Error]: SendMessageToServerAsync exception System.Net.WebSockets.WebSocketException (0x80004005): An internal WebSocket error occurred. Please see the innerException, if present, for more details. ---> System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult) --- End of inner exception stack trace --- at System.Net.Security._SslStream.EndRead(IAsyncResult asyncResult) at System.Net.TlsStream.EndRead(IAsyncResult asyncResult) at System.Net.PooledStream.EndRead(IAsyncResult asyncResult) at System.IO.Stream.<>c.b43_1(Stream stream, IAsyncResult asyncResult) at System.Threading.Tasks.TaskFactory1.FromAsyncTrimPromise1.Complete(TInstance thisRef, Func3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketConnectionStream.<ReadAsync>d__21.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task) at System.Net.WebSockets.WebSocketBase.WebSocketOperation.<Process>d__19.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketBase.<ReceiveAsyncCore>d__45.MoveNext() at System.Net.WebSockets.WebSocketBase.ThrowIfConvertibleException(String methodName, Exception exception, CancellationToken cancellationToken, Boolean aborted) at System.Net.WebSockets.WebSocketBase.<ReceiveAsyncCore>d__45.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport.<StartReceiving>d__22.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result) at System.IO.Pipelines.Pipe.GetReadAsyncResult() at System.IO.Pipelines.Pipe.DefaultPipeReader.GetResult(Int16 token) at System.Threading.Tasks.ValueTask1.get_Result() at Microsoft.AspNetCore.SignalR.Client.HubConnection.d81.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNetCore.SignalR.Client.HubConnection.d69.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNetCore.SignalR.Client.HubConnection.d56.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.AspNetCore.SignalR.Client.HubConnectionExtensions.d431.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at AmphiSys.Signalr.Client.ClientController.<InternalSendMessageToServerAsync>d__22.MoveNext() System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult) --- End of inner exception stack trace --- at System.Net.Security._SslStream.EndRead(IAsyncResult asyncResult) at System.Net.TlsStream.EndRead(IAsyncResult asyncResult) at System.Net.PooledStream.EndRead(IAsyncResult asyncResult) at System.IO.Stream.<>c.<BeginEndReadAsync>b__43_1(Stream stream, IAsyncResult asyncResult) at System.Threading.Tasks.TaskFactory1.FromAsyncTrimPromise1.Complete(TInstance thisRef, Func3 endMethod, IAsyncResult asyncResult, Boolean requiresSynchronization) --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketConnectionStream.d21.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task) at System.Net.WebSockets.WebSocketBase.WebSocketOperation.d19.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.WebSockets.WebSocketBase.d__45.MoveNext() System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult) at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)

.NET Version

No response

Anything else?

We are using .net 7 (7.0.10) on the server. The clients are running WPF framework 4.7.2 using Microsoft.AspNetCore.SignalR.Client 7.0.2

BrennanConroy commented 1 year ago

after a client has lost connection to the server and reconnects it suddenly can't acknowledges messages from server and can't send messages to the server.

The logs you posted don't show the client as being reconnected. It looks like it's unable to connect which implies a networking issue (like not being connected to the internet, either server or client).

Even if you restart the Client it still can't send or acknowledge messages.

Can you show logs that this is happening? I would be extremely surprised if the client could connect but not send or receive messages.

What do the server logs show when the client is in this state?

dannymelsen commented 1 year ago

So it's been possible to reproduce the issue now. I am running the client in a virtual box windows 10 the application is wpf .net framework 4.7.2 We are using websocket as transportType the server is running as windows service on a windows 11. using net 7 (7.0.10)

Repro steps: Disable Ethernet adapter Send a Msg while "reconnecting" (Our client will first send msg when connectionState = Connected). Enable Ethernet adapter Send a Msg when it's connected agian. Disable Ethernet adapter as soon as the client has reconnected Enable Ethernet adapter Now the client unable to sent message to server and it can't acknowledge messages from server. Sometimes you have repeat theese steps. to reproduce this issue.

log files: client.log server.log

Implementations of client, server, serverhub and connectionMapping

ServerHub.txt ClientController.txt ConnectionMapping.txt ServerController.txt

we might think its this line of code in the ServerController in the method RemoveDisconnectedUserWithConnectionIdAsync, but cant figure out why. if (!_connections.Remove(userId, connectionId)) { _logger.LogInformation("Client ID {ClientId} already has a new connection ID. Disconnection not performed. Client tried to disconnect with old connection ID {ConnectionId}", userId, connectionId); return; }

dannymelsen commented 1 year ago

I have some addiontial logs where i have added log for RaiseTransportMessageReceivedAndSendResponseAsync on server and ClientController_TransportMessageReceivedAsync on client logs: ServerLog2.log ClientLog2.log

BrennanConroy commented 1 year ago

Have you changed the server side ClientTimeoutInterval? It sort of looks like your connection is half open, the client looks like it's receiving messages but unable to send anything, and since you're using ClientResults (server expects a response to the sends) then due to the client not being able to send the server keeps timing out and retrying to send the message.

What should happen is the server won't receive any pings from the client and close the connection (after 30 seconds by default).

Enabling and sharing trace logs in SignalR would be ideal to show the client trying to send pings and the server never receiving them https://learn.microsoft.com/aspnet/core/signalr/diagnostics?view=aspnetcore-7.0

Also, sharing a github repo with the code would be a lot more useful than separate .txt files.

dannymelsen commented 1 year ago

The only thing we have changed on Signalr is: MaximumParallelInvocationsPerClient, MaximumReceiveMessageSize and EnableDetailedErrors all other propeties are default settings.

But what it seems like is it's a timing issue when sending a message right after connected and then disconnecting the client as soon as it sends the message. Even if i restart the client (this creates a new hubConnection) it can't send message

i have enabled trace logs and reproduced the issue.

logs: client2.log server2.log

Sharing the repo is a bit of a problem because it is integrated into our software. If the logs aint enough i can try to create a small isolated solution that can reproduce the issue i can share with you.

BrennanConroy commented 1 year ago

Looks like you enabled debug logs, could you get trace logs? And on the client too? Also, when is the ethernet being reconnected/disconnected in relation to the logs would help me understand the flow.

dannymelsen commented 1 year ago

here is the new logs with telemetry from when the adapter is enabled/disabled:

client3.log server3.log Telemetry.txt

dannymelsen commented 1 year ago

I can see the client wasn't set to trace i have new logs where client is on trace aswell. It seems like the issue is somekind of timing when recieving / sending message from client when it's reconnecting. so if you see the last reconnect after that it can't acknowlegde message from server and it can't send messages to the server.

if i restart the the server everythings works fine agian. But restarting the client doesn't work.

Logs Updated with new logs

client5.log server5.log Telemetry5.txt

dannymelsen commented 1 year ago

We have found a consistent way to reproduce the error. It only occurs when a client successfully sends a message to the server, and then the server unsuccessfully sends a message to the client. Here are the reproduction steps:

  1. Ensure the client is connected to the server.
  2. Send a message from the client to the server and ensure it has been received successfully.
  3. Send a message from the server to the client, but before sending the actual message, disconnect the client from the server (e.g., by disabling the network adapter). Here is the line of code we use to send the message from the server to the client: await _hubContext.Clients.Client(connectionIdByUserId).InvokeAsync<bool>("TransportMessageReceived", message, cts.Token).ConfigureAwait(continueOnCapturedContext: false); (Note: our server implementation repeatedly retries to send the message to the client, as long as SignalR informs us that a client is connected.)
  4. Reconnect the client to the server (e.g., by enabling the network adapter) and observe that:
    • Messages from the client are no longer received by the server.
    • Messages from the server are no longer acknowledged by the client.

We can fairly easily reproduce the issue through Visual Studio by:

  1. Placing a breakpoint right before the message is sent from the server to the client: await _hubContext.Clients.Client(connectionIdByUserId).InvokeAsync<bool>("TransportMessageReceived", message, cts.Token).ConfigureAwait(continueOnCapturedContext: false);
  2. Then disconnect the client.
  3. Resume from the breakpoint.
  4. Reconnect the client and observe the aforementioned issue.

It seems that the server somehow becomes stuck because we are calling InvokeAsync() right after the client has disconnected.

Bulgrak commented 1 year ago

@BrennanConroy Hello, I am a colleague of @dannymelsen. I've set up a repository that illustrates the problem we're facing: https://github.com/Bulgrak/SignalRNoReconnectDemo. Please feel free to reach out if you have any questions or require further clarification about the repository.

BrennanConroy commented 1 year ago

Your code in ConnectionMapping looks buggy, try fixing that and see if you're still having issues.

public void Add(string key, string newConnectionId)
{
    _connectionIdByUserId[key] = newConnectionId;
}

public bool Remove(string key, string connectionIdToRemove)
{
    if (!_connectionIdByUserId.TryGetValue(key, out string connectionId) || connectionIdToRemove != connectionId)
    {
        return false;
    }

    // this is a race condition, you could remove a connectionId right after Add(...) has been called.
    return _connectionIdByUserId.TryRemove(key, out _);
}
Bulgrak commented 1 year ago

Thanks for your advice, we'll try to look more into the implementation of ConnectionMapping. However, in the minimal repo, ConnectionMapping.Remove(...) is never called when running in debug mode (correct me if I'm wrong, but when a debugger is attached, OnDisconnectedAsync is never called, which is as designed to make debugging easier) With that in mind, what could then be the cause of this behaviour we are seeing in the minimal repo?

BrennanConroy commented 1 year ago

I just noticed this code and comment: https://github.com/Bulgrak/SignalRNoReconnectDemo/blob/98e3720391ff07655eae1449654e7966fb2d2f64/src/SignalRNoReconnectDemoServer/Worker.cs#L52C1-L53

Pretty sure your issue is because you're blocking the Hub.OnConnectedAsync method indefinitely if there is an active send to a client, because you hold onto the semaphore forever while trying to send the message. When you block Hub.OnConnectedAsync there are no messages being read and processed from the connection, so while the logs show the WebSocket getting bytes, SignalR isn't processing the data because you've blocked the connection method.

While this investigation points to a bug in your code, it does make me want to add some additional logging in SignalR.

ghost commented 1 year ago

Hi @dannymelsen. We have added the "Needs: Author Feedback" label to this issue, which indicates that we have an open question for you before we can take further action. This issue will be closed automatically in 7 days if we do not hear back from you by then - please feel free to re-open it if you come back to this issue after that time.

Bulgrak commented 12 months ago

You're absolutely right! After reading your explanation I went through the code, and I can confirm that the semaphore is the cause of our problem. We somehow missed that if there is an active send to a client, the 'Hub.OnConnectedAsync' method is blocked indefinitely. Thanks a lot for your quick help and insight.

ghost commented 12 months ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.

See our Issue Management Policies for more information.