Azure / azure-signalr

Azure SignalR Service SDK for .NET
https://aka.ms/signalr-service
MIT License
427 stars 101 forks source link

Connection failed: HubConnection.connectionClosed(Error: Server returned an error on close: Connection closed with an error.) called while in state Disconnecting. #1905

Open agustinsilvano opened 9 months ago

agustinsilvano commented 9 months ago

Describe the bug

Introduction

I'm encountering an issue with my application that leverages SignalR for server-to-client communication using WebSockets. The client is a Next.js application (Azure AppService), and the backend is an ASP.NET Core API 7.0 (Azure AppService). While everything works flawlessly in my local environment with a local API (including local WebSocket usage), the problem arises when integrating the Azure SignalR Service. The problem is also present when using my local environment but using Azure SignalR Service as a service to perform the WS connection.

Issue Description

The issue manifests as approximately 30% of the connections failing during the initial connection with the server. The negotiation with my API is successful, returning the URL of the Azure SignalR Service. Subsequently, this negotiation also succeeds, and the handshake is completed. However, the problematic connections receive the message Close message received from the server., leading to a subsequent error when attempting stopConnection(undefined).

When this error is present during the connection, it prevents the establishment of a connection to the server. Consequently, notifications emitted through SignalR are not sent to the "failed" client. Additionally, the retry policy is ineffective, as the reconnection message is not present. Even by commenting out my code (that performs the disconnect operation) does not avoid this disconnection issue which indicates that the disconnection is handled internally by SignalR. Attempts to manage the _connectionState property of the HubConnection guided by the error, particularly when the status is Disconnecting, have not yielded any positive changes.

Client logs

Configured the LogLevel to Debug while establishing the connection to the Hub, and got: Logs on failed connection: SignalRConnectionIssue

Logs on successful connection: SignalRConnectionOk

Here is a log file that includes both cases. The messages MESSAGE: GeneratingRecommendation and MESSAGE: GeneratedRecommendation are the notifications received on the client side through the hub. The message ###Connection started: undefined is a console.log inside of the then() method of the promise returned by the connection.start() method of the @microsoft/signalr module.

API logs

We have configured logging with Serilog and the log.txt file doesn't show any clue of the issue. Logs on successful connection:

2024-01-30 20:37:19.073 +00:00 [INF] Executing endpoint '/signalr-hubs/recommendation/negotiate'
2024-01-30 20:37:19.073 +00:00 [INF] Executed endpoint '/signalr-hubs/recommendation/negotiate'
2024-01-30 20:37:19.074 +00:00 [INF] Request finished HTTP/1.1 POST https://API/signalr-hubs/Recommendation/negotiate?negotiateVersion=1 - 0 - 200 3797 application/json 7.4527ms
2024-01-30 20:37:31.637 +00:00 [INF] Connection L7OCxdbSAe7aEMMQ8RNbZwnyohHwK02 started.

Logs on failed connection:

2024-01-30 20:04:47.701 +00:00 [INF] Executing endpoint '/signalr-hubs/recommendation/negotiate'
2024-01-30 20:04:47.701 +00:00 [INF] Executed endpoint '/signalr-hubs/recommendation/negotiate'
2024-01-30 20:04:47.702 +00:00 [INF] Request finished HTTP/1.1 POST https://API/signalr-hubs/Recommendation/negotiate?negotiateVersion=1 - 0 - 200 2650 application/json 2.6919ms

The difference is that the Connection started is present in one case and not in the other.

Azure Analytic logs

The last thing that I did was to enable the Diagnostic Logs on the Azure SignalR Service to get more context about the error and only can see two errors on the failed connections.

Couldn't get any clue of it.

Metrics from Diagnose and solve problems tool show ClosedByAppServer as a prominent metric: image

To Reproduce

Client Side

SocketService.js

import { HubConnectionBuilder, LogLevel } from '@microsoft/signalr';

let connections = {};

function createConnection(messageType) {
    const connectionObj = connections[messageType];
    if (!connectionObj) {
        const connection = new HubConnectionBuilder()
            .withUrl(`${process.env.NEXT_PUBLIC_API_SIGNALR_URL}${messageType}`,
                {
                    withCredentials: true,
                    accessTokenFactory: () => "access_token",
                })
            .configureLogging(LogLevel.Debug)
            .withAutomaticReconnect()
            .build();

        connections[messageType] = {
            type: messageType,
            connection: connection,
            started: false,
        };
        return connection;
    } else {
        return connections[messageType].connection;
    }

}

function startConnection(messageType) {
    const connectionObj = connections[messageType];
    if (!connectionObj.started) {
        connectionObj.connection
            .start()
            .then(x => {
                console.log("###Connection started:", x)
            })
            .catch(err => console.error('SOCKET: ', err.toString()));
        connectionObj.started = true;
    }
}

function stopConnection(messageType) {
    const connectionObj = connections[messageType];
    if (connectionObj?.connection?._connectionState != 'Disconnected' &&
    connectionObj?.connection?._connectionState != 'Disconnecting'  ) {
        connectionObj.connection.stop();
        connectionObj.started = false;
    }
}

function registerOnEvent(
    messageType,
    eventType,
    callback,
) {
    try {
        const connection = createConnection(messageType);
        connection.on(eventType, (arg1, arg2, arg3) => {
            console.table("MESSAGE:",eventType);
            callback(arg1, arg2, arg3);
        });
        connection.onclose(() => stopConnection(messageType));
        startConnection(messageType);
    } catch (error) {
        console.error('SOCKET: ', error);
    }
}

export const socketService = {
    registerOnEvent,
    stopConnection,
};

Usage

const RecommendationHub = "Recommendation";
const GeneratingRecommendationAction = "GeneratingRecommendation";
 socketService.registerOnEvent(
      RecommendationHub,
      GeneratingRecommendationAction,
      async (officeId, evaluationDate) => {
        //Some async code
        console.log("Event tirggered");
      }
 );

API Side

The SignalR registration on the ConfigureServices looks like:

  var signalR = context.Services.AddSignalR();

  if (!hostingEnvironment.IsDevelopment() ||
      configuration.GetValue<string>("Azure:SignalR:ConnectionString").StartsWith("Endpoint"))
  {
      //Register SignalR using Azure Service.
      signalR.AddAzureSignalR();
  }

The Hub code can not be published(but this is working ok using my local resources), but we are overriding the OnConnectedAsync:

Also, we are overriding the OnDisconnectedAsync but we are just

Further technical details

GersonDias commented 2 months ago

did you found something about this? I'm having a similar issue, but my app is hosted using AKS and the networking is done by application gateway.

vicancy commented 2 months ago

Try enable EnableDetailedErrors in your app server so that your client side could see the detailed exception thrown for debugging purpose.

In production, you could enable app server side log to see the detailed error thrown.

remcoros commented 3 weeks ago

We (@GersonDias and me) finally figured out our issue. It was a misconfigured client that set the ClientKeepAliveInterval to 30 seconds (so it sends a ping every 30 seconds), while the server had all default settings (ClientTimeoutInterval == 30 seconds), which means the client didn't send the pings often enough and the server killed the connection.

Hope that might help someone in the future finding this issue.