anycable / anycable-client

AnyCable / Action Cable JavaScript client for web, Node.js & React Native
MIT License
97 stars 15 forks source link

Reconnect state issues #20

Closed ProTip closed 2 years ago

ProTip commented 2 years ago

Hola,

Thanks for this awesome library! We are testing this out with a subset of our users(ourselves) as a replacement to the official Action Cable library. The customizable retry strategy is clutch for us to provide more seamless reconnects during transient network events(such as wifi network swtiches and reconnects, etc).

Unfortunately we have been running into scenarios where it appears the connection/subscription states are becoming inconsistent resulting in permanent loss of channel subscription.

This happens often during our Heroku deployments. We are setup in a non-ideal configuration with anycable and anycable-go collocated in the same dynos, however these issues have been observed outside deployments as well.

I'm having trouble reproducing locally or discovering the precise issue through code review.. This is what I have witnessed though during deployments without Heroku "preboot" enabled(meaning the anycable dynos are all stopped before the new ones are brought up):

The first reconnected WS connection lasts for a minute and only one sub ack came through: image

After that was disconnected another conn was made however no subscription attempts are made: image

In the console this is the first error to appear: image

After a bunch of these errors are logged: image

My theory is that some combination of channel/subscription pending and/or intent are not being properly reset on disconnect/reconnect depending on their states going into those events.

palkan commented 2 years ago

Thanks for the report!

Can you fetch the corresponding anycable-go logs (ideally, in debug mode)?

Also, enabling client-side debug logs would be really helpful (createCable({logLevel: 'debug'}).

Btw, which version of anycable-go and anycable-client do you use?

palkan commented 2 years ago

I hit a similar problem ("Already pending subscribed") in production today (though no luck in reproducing locally). Will take a closer look ASAP.

ProTip commented 2 years ago

@palkan I might be able to do you one better.

Recently hit this while switching back-and-forth between resources in the FE which causes unsubscribes and re-subscribes. Always I see in the logs no subscription ack received in 500ms followed by subscription not found. So I went looking and found this:

This time out is setup during subscribe and will check back in 5s(default) to see if the subscription is still in pending. If so it deletes it: https://github.com/anycable/anycable-client/blob/3e9af95731f5ebaf1c8544afff2803eba8c2ae2e/packages/core/action_cable/index.js#L63

Unfortunately during this interval we may have unsubscribed and then re-subscribed to the same channel. In this case when the time out callback is called, and if the new subscription is still pending, it will find and delete the new pending subscription entry.

At this point when the new subscription ack comes in it will not find the pending entry and will bale out: https://github.com/anycable/anycable-client/blob/3e9af95731f5ebaf1c8544afff2803eba8c2ae2e/packages/core/action_cable/index.js#L140

From this point on the subscription seems to be busted. I see it in the hub subscriptions list with state connecting, and the intent changes based on if subscribe or unsubscribe is called on it, but it stays in the hub even if channels is empty and no further subscription attempts seem to get made.

palkan commented 2 years ago

Great! Thanks! That's helpful.

I'm currently working on a fix; hope to finish soon.

palkan commented 2 years ago

@ProTip The fixed version of protocol is available in master: the above bug is fixed, added handling of confirmations received after timeout (we now send unsubscribe in this case, for consistency).

I'm working now on some pending / intent before pushing a new release.

palkan commented 2 years ago

Fixed and released in 0.5.5 (this release only contains the protocol fix) and 0.5.6 (this release adds additional race condition fixes). Two releases to make it easier to identify potential regressions.

palkan commented 2 years ago

@ProTip Please, let me know if that fixes the problems you encountered.