ably / ably-js

JavaScript, Node, TypeScript, React, React Native client library SDK for Ably realtime messaging service
https://ably.com/download
Apache License 2.0

Connection recovery not working #1844

Closed michalzaq12 closed 5 days ago

michalzaq12 commented 1 month ago

Ably ver: 2.3.1

Description: The realtime connection (WebSocket, the only transport configured) is not restored after the network connection is lost. The SDK keeps trying to reconnect without success, even though the network is available again.

My observation

  1. When a network problem occurs, checkWsConnectivity() is called, which of course fails (that's fine).
  2. checkWsConnectivity() can be called only once (why?), so wsCheckResult is always false: https://github.com/ably/ably-js/blob/8ba8ec29f521b18ffd16bc4fcb33004c64a7d706/src/common/lib/transport/connectionmanager.ts#L1104
  3. tryTransportWithFallbacks() is called when trying to reconnect.
  4. shouldContinue() in the body of tryTransportWithFallbacks() always returns false, which results in a call to transport.dispose().
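
The four observations above can be modelled in a few lines. This is a minimal sketch, not the ably-js source: StickyCheckModel and checkWsConnectivityOnce are hypothetical names, and only wsCheckResult and shouldContinue come from the report. It assumes the check result is cached after the first call, as observation 2 describes.

```javascript
// Simplified model of the reported behaviour; NOT the ably-js implementation.
// Class and method names are hypothetical stand-ins.
class StickyCheckModel {
  constructor() {
    this.wsCheckResult = null; // null means "not checked yet" (an assumption)
  }

  // Runs the connectivity probe only the first time; later calls reuse the result.
  checkWsConnectivityOnce(probe) {
    if (this.wsCheckResult === null) {
      this.wsCheckResult = probe();
    }
    return this.wsCheckResult;
  }

  // Mirrors observation 4: a cached `false` rejects every reconnection attempt.
  shouldContinue() {
    return this.wsCheckResult !== false;
  }
}

const model = new StickyCheckModel();
model.checkWsConnectivityOnce(() => false); // network is down: the probe fails
model.checkWsConnectivityOnce(() => true);  // network is back, but the probe is not re-run
console.log(model.shouldContinue()); // false
```

In this model the second probe would have succeeded, but it never runs, so every later attempt is abandoned.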

sacOO7 commented 1 month ago

@michalzaq12 Reconnection works in the following manner:

  1. Once the client is disconnected, it checks whether fallbacks can be used. The internet check is part of this step.
  2. When the internet is available, all fallbacks (5 by default) are tried. If none of them succeeds, the connection goes into the disconnected state again.
  3. A new connection is attempted (with fallbacks retried) after disconnectedRetryTimeout, whose value is 15 seconds.

sacOO7 commented 1 month ago

@michalzaq12 Can you confirm whether connection recovery works after 15 seconds? Ideally, it should work as long as the connection is in either the disconnected or the suspended state. The connection is not retried only once it goes into the closed state.
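
One way to check this is to log every connection state change rather than polling the state. The sketch below is an illustration only: watchConnection is a hypothetical helper, and it assumes the object passed in exposes on(listener) and delivers state changes with previous and current fields, as an ably-js connection does. The "retry expected" label reflects only the states named above.

```javascript
// Hypothetical helper: logs each state change and marks the states in which,
// according to the comment above, the SDK is expected to keep retrying.
function watchConnection(connection, log = console.info) {
  connection.on((stateChange) => {
    const retried =
      stateChange.current === 'disconnected' || stateChange.current === 'suspended';
    log(`${stateChange.previous} -> ${stateChange.current}` + (retried ? ' (retry expected)' : ''));
  });
}

// Usage with a real client (requires a valid key and network access):
// watchConnection(client.connection);
```

If recovery works, a "disconnected" line should eventually be followed by "connecting" and then "connected".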

michalzaq12 commented 1 month ago

Once client is disconnected, client checks if fallbacks can be used. Internet check is part of the same. When internet is available, all fallbacks ( 5 default ) are tried. If none of them succeed, connection goes into disconnected state again New connection is retried again after disconnectedRetryTimeout, value is 15 seconds.

I know exactly how it works. It doesn't change the fact that there's some bug in the implementation (regression from v1).

Can you check confirm if connection recovery works after 15 seconds ?

After 15 seconds, the SDK tries to reconnect but fails every time.

michalzaq12 commented 1 month ago

Is there a template for a minimal reproducible repo?

sacOO7 commented 1 month ago

@michalzaq12 There's no template as such. It would be great if you could either post the code here or, better, create a separate repository with steps to reproduce the bug. That way, we can reproduce it in the given environment, e.g. Node.js, a browser, etc.

michalzaq12 commented 4 weeks ago

Environment: Windows 11, Chrome 127

Reproduction code (ably-js 2.3.1):

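// Restrict the client to the WebSocket transport only
const client = new Ably.Realtime({
  key: '<ABLY_KEY>',
  transports: ['web_socket']
});

// Log the connection state every 5 seconds
setInterval(() => {
  console.info('STATE: ' + client.connection.state);
}, 5000);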

Reproduction steps:

  1. Wait for connected state
  2. Disable WiFi (do not use 'offline' simulation from dev network tab)
  3. Wait for the log message: WebSocket connection to 'wss://ws-up.ably-realtime.com/' failed
  4. Enable WiFi
  5. SDK tries to reconnect but fails every time

VeskeR commented 3 weeks ago

Hi @michalzaq12 !

Thank you for spotting the issue and providing detailed explanations and steps to reproduce it. I wasn't able to reproduce the issue locally as, in my case, the internet-up check completes before the startWebSocketSlowTimer timer performs its logic with checkWsConnectivity and sets this.wsCheckResult = false. However, I can see the race condition problem in the code, and artificially slowing down the resolution of the internet-up check produces the situation you're describing: shouldContinue now always returns false, and the client never reconnects. This is indeed a bug, and we will fix it as soon as possible.
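
The ordering problem described above can be illustrated with a small synchronous model. It is a sketch under stated assumptions, not the ably-js implementation: simulateReconnect and both of its parameters are hypothetical, the durations are illustrative, and the network is assumed to be unusable for as long as the internet-up check is pending.

```javascript
// Simplified, synchronous model of the race; NOT the ably-js implementation.
// Both durations are hypothetical inputs in milliseconds.
function simulateReconnect({ internetCheckMs, slowTimerMs }) {
  // If the slow timer fires before the internet-up check resolves, the
  // WebSocket connectivity check runs too early and records a failure.
  const slowTimerFiredFirst = slowTimerMs < internetCheckMs;
  const wsCheckResult = slowTimerFiredFirst ? false : null;
  // A recorded `false` makes shouldContinue reject the reconnection attempt.
  const shouldContinue = wsCheckResult !== false;
  return { slowTimerFiredFirst, shouldContinue };
}

// Fast internet-up check: the attempt proceeds.
console.log(simulateReconnect({ internetCheckMs: 200, slowTimerMs: 4000 }).shouldContinue); // true
// Slow internet-up check: the timer wins and the attempt is abandoned.
console.log(simulateReconnect({ internetCheckMs: 6000, slowTimerMs: 4000 }).shouldContinue); // false
```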

As a side note, is there something unusual about your network setup? For example, do you have a VPN program installed that might be keeping the request to https://internet-up.ably-realtime.com open longer than usual without internet access, which then causes the race condition on your machine?

michalzaq12 commented 3 weeks ago

Default network settings. I don't use a VPN.

justerror commented 1 week ago

Some of our users have encountered this reconnection issue after waking their PC from sleep. The problem seems to be that the internet does not become available immediately after the PC connects to Wi-Fi or LAN.

I was able to reproduce this on macOS. To do so, I installed the Network Link Conditioner from Apple (https://developer.apple.com/download/more/?q=Additional%20Tools), which is part of the Additional Tools for Xcode package.

Steps:

  1. Disconnect from the network.
  2. Enable the Network Link Conditioner with a setting of 100% loss. (For monitoring, you can keep a terminal window open that pings some internet resource, e.g. ping 8.8.8.8.)
  3. Reconnect to the network.
  4. Wait for more than 4 seconds.
  5. Disable the Network Link Conditioner.
  6. The internet connection returns, but ably fails to reconnect, resulting in a loop of disconnecting -> connecting -> disconnecting -> …

As mentioned earlier, if the startWebSocketSlowTimer timeout (https://github.com/ably/ably-js/blob/bd6629336f97dff51b407e9ea342b7b390c625dc/src/common/lib/util/defaults.ts#L86) triggers, the connection will not be restored because of the race condition.

Workaround

var client = new Ably.Realtime({
 ...
});

client.options.timeouts.webSocketSlowTimeout = 0; // <--- workaround

michalzaq12 commented 1 week ago

@VeskeR When will the bug be fixed?

VeskeR commented 1 week ago

Hi @michalzaq12, fixing this will be our priority next week. I'm going to work on it on Monday and hope to release a fix early in the week.

Gid733 commented 6 days ago

Hello @VeskeR, we see the same issue on React Native, and I'm not sure this workaround would work here. The problem happens when the internet connection is poor (especially on bad public Wi-Fi), and the app goes into an infinite loop of Disconnected -> Reconnecting -> Disconnected.

VeskeR commented 5 days ago

I was able to fix the issue locally and am now adding some additional tests for the future. We aim to review the PR and release a patch version on npm today.

VeskeR commented 5 days ago

The WebSocket reconnection issue has been fixed in the ably-js 2.3.2 release.

@Gid733 The problem should be fixed on all platforms. Please update to ably-js 2.3.2 and see if that fixes it in your case. If you're still experiencing reconnection issues with the 2.3.2 release, please let us know.
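
To confirm that a project is on a release containing the fix, the installed version string can be compared against 2.3.2. The helper below is a minimal sketch: isAtLeast is a hypothetical name, it handles only plain x.y.z strings (no pre-release tags), and the version to check is assumed to come from your lockfile or package manager.

```javascript
// Hypothetical helper: true when `version` is at least `minimum`.
// Handles plain "x.y.z" strings only; missing parts are treated as 0.
function isAtLeast(version, minimum) {
  const a = version.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const left = a[i] || 0;
    const right = b[i] || 0;
    if (left !== right) return left > right;
  }
  return true; // all three parts are equal
}

console.log(isAtLeast('2.3.1', '2.3.2')); // false
console.log(isAtLeast('2.3.2', '2.3.2')); // true
```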