aspnet / SignalR

[Archived] Incredibly simple real-time web for ASP.NET Core. Project moved to https://github.com/aspnet/AspNetCore
Apache License 2.0

InvalidOperationException on SendAsync after Closed with WebSocketException #3298

Closed gavinjensen closed 5 years ago

gavinjensen commented 6 years ago

The SignalR client HubConnection is getting stuck in a disconnected state after being closed and restarted, and is unable to send any more messages. This occurs after a long connection time of 4-8 hours.

Scenario: After running for an extended period of time, successfully sending messages, the client gets a Closed callback with the following WebSocket exception.

WebSocket Exception on Closed:

Connection Closed with Exception :System.Net.WebSockets.WebSocketException (0x80004005): The remote party closed the WebSocket connection without completing the close handshake.
   at System.Net.WebSockets.WebSocketBase.WebSocketOperation.<Process>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Net.WebSockets.WebSocketBase.<ReceiveAsyncCore>d__45.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.AspNetCore.Http.Connections.Client.Internal.WebSocketsTransport.<StartReceiving>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
   at Microsoft.AspNetCore.SignalR.Client.HubConnection.<ReceiveLoop>d__46.MoveNext().
Attempting to open a new connection for SendURL: myservice.service.signalr.net:5001/client/?hub=myhub

Then, after StartAsync is called successfully on the same HubConnection to reactivate the connection, some new messages arrive, and when SendAsync is called to send them it throws the following exception.

System.AggregateException: One or more errors occurred. ---> System.InvalidOperationException: The 'SendCoreAsync' method cannot be called if the connection is not active
   at Microsoft.AspNetCore.SignalR.Client.HubConnection.CheckConnectionActive(String methodName)
   at Microsoft.AspNetCore.SignalR.Client.HubConnection.<SendCoreAsyncCore>d__39.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.AspNetCore.SignalR.Client.HubConnection.<SendCoreAsync>d__30.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

We have observed other Closed callbacks that successfully reconnect and allow messages to be sent again, but after each Closed with this WebSocketException the HubConnection is in a bad state and unable to send. The solution has been to recreate the HubConnection on each Closed which should not be required.

Client Code: Here is the pseudo client code that attempts to reconnect. It follows the same pattern as the sample, where it reconnects in the Closed handler and reuses the same HubConnection instance.

`
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR.Client;

public class MessageSender
{
    // Pseudo code: SendUrl, _random, _minBackOff, _maxBackOff, SignalRTokenCache,
    // AzureSignalRTokenUtil and PayloadMessage are not shown here.
    HubConnection HubConnection { get; set; }

    public MessageSender()
    {
        // Block until the initial connection has been built and started.
        HubConnection = CreateHubConnection().GetAwaiter().GetResult();
    }

    private async Task<HubConnection> CreateHubConnection()
    {
        HubConnection hubConnection = new HubConnectionBuilder()
            .WithUrl(this.SendUrl, option =>
            {
                // Supply a cached access token for the service endpoint.
                option.AccessTokenProvider = () =>
                {
                    return Task.FromResult(SignalRTokenCache.GetOrAdd(
                        this.SendUrl, (name, value) => AzureSignalRTokenUtil.GenerateAccessToken(this.SendUrl))
                    );
                };
            }).Build();
        hubConnection.Closed += HubConnectionOnClosed;

        await hubConnection.StartAsync();
        return hubConnection;
    }

    private async Task HubConnectionOnClosed(Exception exception)
    {
        // Wait a random back-off interval before reconnecting.
        await Task.Delay(_random.Next((int)_minBackOff.TotalMilliseconds, (int)_maxBackOff.TotalMilliseconds));
        // ** This Fixes the issue but should not be required
        // HubConnection = await CreateHubConnection();
        await HubConnection.StartAsync();
    }

    public async Task SendMessage(PayloadMessage payloadMessage)
    {
        await HubConnection.SendAsync(payloadMessage.Target, payloadMessage.SendMessagesRequest);
    }
}
`

Environment: Running the server in Azure Service Fabric using NuGet package Microsoft.Azure.SignalR 1.0.0, which uses Microsoft.AspNetCore.SignalR 1.0.4. Client is C# .NET using Microsoft.AspNetCore.SignalR 1.0.4.

analogrelay commented 6 years ago

> The solution has been to recreate the HubConnection on each Closed which should not be required.

This is absolutely correct. It sounds like some client state may be getting corrupted. If recreating the HubConnection completely fixes the issue, I'd definitely recommend doing that as a workaround for now; the "cost" of doing that (in terms of memory/performance/etc.) is no different. You are correct that it shouldn't be necessary though, so we should try to get to the bottom of that.
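For reference, the recreate-on-Closed workaround could look roughly like the following. This is only a minimal sketch, not the reporter's actual code: the `RecreatingSender` class, the `hubUrl` parameter, and the fixed five-second delay are hypothetical, and the access token setup from the original snippet is left out.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR.Client;

// Hypothetical wrapper; names and the fixed delay are illustrative only.
public class RecreatingSender
{
    private readonly string _hubUrl;
    private HubConnection _connection;

    public RecreatingSender(string hubUrl) => _hubUrl = hubUrl;

    // Builds a brand-new HubConnection, hooks Closed, and starts it.
    public async Task ConnectAsync()
    {
        var connection = new HubConnectionBuilder()
            .WithUrl(_hubUrl)
            .Build();
        connection.Closed += OnClosedAsync;

        await connection.StartAsync();
        _connection = connection;
    }

    private async Task OnClosedAsync(Exception exception)
    {
        // Stop listening to the closed instance so it is never restarted.
        var old = _connection;
        if (old != null)
        {
            old.Closed -= OnClosedAsync;
        }

        await Task.Delay(TimeSpan.FromSeconds(5));

        // Replace the closed instance instead of calling StartAsync on it again.
        await ConnectAsync();
    }

    // Sends on whichever connection instance is current.
    public Task SendAsync(string target, object payload) =>
        _connection.SendAsync(target, payload);
}
```

`ConnectAsync` has to complete once before `SendAsync` is used. The sketch makes a single reconnect attempt and does not dispose the old instance; a real client would probably want to retry when `StartAsync` fails and decide how to clean up the discarded connection.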

Would you be able to capture some client-side logs for the client when it enters this state? Ideally we'd want logs spanning from when the Closed callback fires, all the way up to when the SendAsync fails after restarting. Similarly, a process memory dump after SendAsync has failed might allow us to see what state the connection is in.
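Client-side logging can be switched on when the connection is built. The following is a sketch that assumes the Microsoft.Extensions.Logging.Console package is referenced for `AddConsole`; the `LoggingConnectionFactory` class and the `hubUrl` parameter are hypothetical names used only for illustration.

```csharp
using Microsoft.AspNetCore.SignalR.Client;
using Microsoft.Extensions.Logging;

// Hypothetical helper; hubUrl stands in for the real hub endpoint.
public static class LoggingConnectionFactory
{
    public static HubConnection Create(string hubUrl)
    {
        return new HubConnectionBuilder()
            .WithUrl(hubUrl)
            .ConfigureLogging(logging =>
            {
                // AddConsole comes from Microsoft.Extensions.Logging.Console.
                logging.AddConsole();
                // Debug is fairly verbose; LogLevel.Trace is more verbose still.
                logging.SetMinimumLevel(LogLevel.Debug);
            })
            .Build();
    }
}
```

Any other `ILoggingBuilder` provider could be registered in place of the console one, for example to write the logs to a file over the several hours it takes for the problem to appear.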

gavinjensen commented 6 years ago

That was my concern as well. The odd thing is that we saw other Closed callbacks with different exceptions where the StartAsync worked and message sending resumed. We have a workaround in place now, but we could redeploy the incorrect code to a test environment and turn on logging to capture more of the state changes so you can debug the problem. I'll take a look and get back to you.

muratg commented 5 years ago

@gavinjensen closing this for now. If you're able to get more information, please let us know and we'll reactivate.