dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.44k stars 10.02k forks source link

SignalR JavaScript client sometimes freeze at Connecting or Disconnecting state unexpectedly #58260

Open tomfong opened 4 weeks ago

tomfong commented 4 weeks ago

Is there an existing issue for this?

Describe the bug

When calling hubConnection.start(), sometimes the client freezes at Connecting state, and when calling hubConnection.stop(), sometimes the client freezes at Disconnecting state. Unless the HubConnection instance is disposed and rebuilt.

Expected Behavior

Calling hubConnection.start() should let the client become Connected state as soon as possible, while calling hubConnection.stop() should let the client become Disconnected state as soon as possible.

Steps To Reproduce

import { HubConnection, HubConnectionBuilder, HubConnectionState, LogLevel } from "@microsoft/signalr";

var hubConnection: HubConnection;

hubConnection = new HubConnectionBuilder()
                                        .withUrl("<server endpoint>")
                                        .configureLogging(LogLevel.Information)
                                        .withServerTimeout(30000)
                                        .withKeepAliveInterval(15000)
                                        .withStatefulReconnect()
                                        .build();

await hubConnection.start();

function checkIfHubConnected(): boolean {
    return hubConnection?.state == HubConnectionState.Connected ?? false;
}

function hubConnectionState() {
    return hubConnection?.state;
}

Exceptions (if any)

No response

.NET Version

8.0.7

Anything else?

package.json

{
   "dependencies": {
      "@microsoft/signalr": "^8.0.7"
   }
}
BrennanConroy commented 4 weeks ago

What do you mean it "freezes". If the client can't connect to the server it might take longer than usual for the HTTP request to fail.

Can you show client and server logs when it "freezes"?

tomfong commented 3 weeks ago

@BrennanConroy The hub connection sometimes disconnect due to network issue. When I try to manually reconnect the hub by the following code

async function manualReconnect() {
  console.log(`Calling manualReconnect()...`);

  if (hubConnection == null) {
    console.log(`connection state [null], skip starting reconnection`);
    return;
  }

  if (hubConnection.state == HubConnectionState.Connected) {
    console.log(`connection state [Connected], skip starting reconnection`);
    return;
  }

  if (hubConnection.state == HubConnectionState.Reconnecting) {
    console.log(`connection state [Reconnecting], skip starting reconnection`);
    return;
  }

  if (hubConnection.state == HubConnectionState.Disconnecting) {
    console.log(`connection state [Disconnecting], skip starting reconnection`);
    return;
  }

  if (hubConnection.state == HubConnectionState.Disconnected || hubConnection.state == HubConnectionState.Connecting) {
    if (hubConnection.state == HubConnectionState.Connecting) {
      console.log(`connection state [Connecting], stopping connection...`);
      try {
        await hubConnection.stop()
          .then(_ => {
            console.log(`Connection stopped`);
          })
          .catch(err => {
            console.log(`Failed to stop connection, err: ${JSON.stringify(err)}`);
          });
      } catch (e) {
        console.log(`Failed to stop connection, e: ${JSON.stringify(e)}`);
      }
    } else {
      console.log(`connection state [Disconnected]`);
    }

    console.log(`Starting reconnection...`);

    // Start reconnection
    try {
      await hubConnection.start()
        .then(_ => {
          console.log(`reconnected!`);
          return;
        })
        .catch((err) => {
          console.error(`starting connection occurred error, err: ${JSON.stringify(err)}`);
          return;
        });
    } catch (e) {
      console.error(`starting connection occurred error, e: ${JSON.stringify(e)}`);
      return;
    }
  }
}

The log of the first attempt:

Calling manualReconnect()...
connection state [Disconnected]
Starting reconnection...

and then nothing else.

After around 30 minutes, I call manualReconnect() again and the log shows:

Calling manualReconnect()...
connection state [Connecting], stopping connection...

and then nothing else.

After several minutes, I call manualReconnect() again and the log shows:

Calling manualReconnect()...
connection state [Disconnecting], skip starting reconnection

And after another several minutes, I call manualReconnect() again and the log shows:

Calling manualReconnect()...
connection state [Disconnecting], skip starting reconnection

Therefore, I determine the SignalR client "freezing" at Connecting and Disconnecting states due to unknown reason.

BrennanConroy commented 3 weeks ago

Can you repro this consistently?

Please turn on logging (trace is best), also what do the network traces look like?

tomfong commented 3 weeks ago

@BrennanConroy It is running on NodeJS app. How to access the log and trace?

BrennanConroy commented 2 weeks ago

https://learn.microsoft.com/aspnet/core/signalr/diagnostics?view=aspnetcore-8.0#javascript-client-logging

https://learn.microsoft.com/aspnet/core/signalr/diagnostics?view=aspnetcore-8.0#network-traces

tomfong commented 2 weeks ago

@BrennanConroy After following the guide, I've captured some logs that seem to be useful:

Here was the log when I noticed the hub suddenly disconnected

SignalR Logger: [Debug] HttpConnection.stopConnection(undefined) called while in state Disconnecting.
SignalR Logger: [Error] Connection disconnected with error 'Error: Server timeout elapsed without receiving a message from the server.'.
SignalR Logger: [Debug] HubConnection.connectionClosed(Error: Server timeout elapsed without receiving a message from the server.) called while in state Connected.

The log when I tried to manually reconnect the hub

Calling manualReconnect()...
connection state [Disconnected]
Starting reconnection...
SignalR Logger: [Debug] Starting HubConnection.
SignalR Logger: [Debug] Starting connection with transfer format 'Text'.
SignalR Logger: [Debug] Sending negotiation request: https://<Hub Endpoint, Hidden>/negotiate?negotiateVersion=1&useStatefulReconnect=true.
SignalR Logger: [Debug] Selecting transport 'WebSockets'.

and then the client looked like "freezed".

Therefore, after several minutes, I tried to manually reconnect the hub once more by calling manualReconnect() and then the log:

Calling manualReconnect()...
connection state [Connecting], stopping connection...
SignalR Logger: [Debug] Stopping HubConnection.

and no more logs else.

BrennanConroy commented 2 weeks ago

Could be https://github.com/websockets/ws/issues/1948. Since you're on node you're using the "ws" npm package and it doesn't have a default handshake timeout.

You can test this by injecting a custom websocket constructor:

connection = new HubConnectionBuilder().withUrl("url", {
    WebSocket = MyWS
}).build();

class MyWS extends WebSocket {
    constructor(url, protocols, options) {
        super(url, protocols, { ...options, handshakeTimeout : 10000 });
    }
}
tomfong commented 2 weeks ago

@BrennanConroy This error occurred: Object literal may only specify known properties, and 'WebSocket' does not exist in type 'IHttpConnectionOptions', with code

 this._hubConnection = new HubConnectionBuilder()
                                  .withUrl(process.env.SIGNALR_ENDPOINT, { WebSocket: CustomWebSocket })
                                  .configureLogging(new CustomSignalRLogger())
                                  .withServerTimeout(30000)
                                  .withKeepAliveInterval(15000)
                                  .withStatefulReconnect()
                                  .build();
BrennanConroy commented 2 weeks ago
IHttpConnectionOptions options;
(options as any).WebSocket = CustomWebSocket;
new HubConnectionBuilder()
   .withUrl(..., options)
   .build();
tomfong commented 1 week ago

@BrennanConroy I've followed your suggestion and implemented related code. This time the log is quite different:

SignalR Logger:  [Debug] Starting HubConnection.
SignalR Logger:  [Debug] Starting connection with transfer format 'Text'.
SignalR Logger:  [Debug] Sending negotiation request: https://<ENDPOINT>/hub/negotiate?negotiateVersion=1&useStatefulReconnect=true.
SignalR Logger:  [Debug] Selecting transport 'WebSockets'.
SignalR Logger:  [Information] (WebSockets transport) There was an error with the transport.
SignalR Logger:  [Error] Failed to start the transport 'WebSockets': Error: WebSocket failed to connect. The connection could not be found on the server, either the endpoint may not be a SignalR endpoint, the connection ID is not present on the server, or there is a proxy blocking WebSockets. If you have multiple servers check that sticky sessions are enabled.
SignalR Logger:  [Debug] Selecting transport 'ServerSentEvents'.
SignalR Logger:  [Debug] Sending negotiation request: https://<ENDPOINT>/hub/negotiate?negotiateVersion=1&useStatefulReconnect=true.

Then no more logs else.

BrennanConroy commented 1 week ago

Did you wait >100 seconds? We pass in a default timeout of 100 seconds to the negotiate call.

What version of node are you using?

tomfong commented 1 week ago

@BrennanConroy Yes, it always holds Connecting state more than 100 seconds.

The node version is 20.15.1

BrennanConroy commented 1 week ago

Try using node v21+. Version <20 uses the node-fetch npm library, whereas 21+ uses the native fetch from node. Would narrow down what might be wrong.

tomfong commented 1 week ago

@BrennanConroy But node-fetch is one of the dependencies of @microsoft/signalr 8.0.7

    "node_modules/@microsoft/signalr": {
      "version": "8.0.7",
      "resolved": "https://registry.npmjs.org/@microsoft/signalr/-/signalr-8.0.7.tgz",
      "integrity": "sha512-PHcdMv8v5hJlBkRHAuKG5trGViQEkPYee36LnJQx4xHOQ5LL4X0nEWIxOp5cCtZ7tu+30quz5V3k0b1YNuc6lw==",
      "dependencies": {
        "abort-controller": "^3.0.0",
        "eventsource": "^2.0.2",
        "fetch-cookie": "^2.0.3",
        "node-fetch": "^2.6.7",
        "ws": "^7.4.5"
      }
    }

How about I restrict the client only using WebSocket and disable SSE and Long Polling? Do you think it will help?

tomfong commented 4 days ago

Finally, my problem solved.

Here is my solution to ensure that starting connection or reconnection would not be stuck at Connecting state if any error occurred:

class CustomWebSocket extends WebSocket {
  constructor(
    address: string | URL,
    protocols?: string | string[],
    options?: WebSocket.ClientOptions | ClientRequestArgs,
  ) {
    super(address, protocols, { ...options, handshakeTimeout: 10000 });
  }
}

const options: IHttpConnectionOptions = {
  transport: HttpTransportType.WebSockets,
  skipNegotiation: true
};
(options as any).WebSocket = CustomWebSocket;

this._hubConnection = new HubConnectionBuilder()
        .withUrl(process.env.SIGNALR_ENDPOINT, options)
        .configureLogging(new CustomSignalRLogger())
        .withServerTimeout(30000)
        .withKeepAliveInterval(15000)
        .withStatefulReconnect()
        .build();

await this._hubConnection.start()

However, I do think the package should provide official way to set handshakeTimeout, as well as the timeout of Server-Sent Event.