SocketCluster / socketcluster

Highly scalable realtime pub/sub and RPC framework
https://socketcluster.io
MIT License
6.15k stars 314 forks

Socket connection gets timed out unexpectedly #580

Open psk001 opened 1 year ago

psk001 commented 1 year ago

I am using socketcluster-server on the server side. On the client side I use a plain WebSocket to connect to the server. My connection parameters are:

    let agOptions = {
      handshakeTimeout: 60000,
      pingInterval: 50000,
      pingTimeout: 90000
    };

As seen in agOptions, pingTimeout is 90 seconds, which means that the connection should stay alive even if there is no communication between client and server for 90s. On the client side, I have a ping mechanism which sends a ping every 10 seconds in the format below:

    {
      "event": "#publish",
      "data": {
        "channel": "some-channel",
        "data": {
          "socketId": "",
          "event": "ping",
          "auth": ""
        }
      }
    }

The server responds to the ping by sending a pong in the format below:

    {
      "event": "#publish",
      "data": {
        "channel": "some-other-channel-subscribed-by-client",
        "data": {
          "event": "pong",
          "message": "#2"
        }
      }
    }

I expect this mechanism to keep the connection alive, as the client and server communicate continuously. This works very well on localhost. But when it is deployed on a Kubernetes cluster, the connection gets dropped after random intervals: sometimes after 40s, other times after 100s. How do I fix it?

jondubois commented 1 year ago

Check that your ingress load balancer supports WebSockets and doesn't time out long-lived connections. Some cloud providers time out connections after a few seconds by default. You may need to increase that timeout.
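One way to narrow down which side drops the connection is to log the WebSocket close code on the client. The following is a minimal sketch, not part of SocketCluster: `classifyClose` is a hypothetical helper, and the code ranges it uses come from RFC 6455 (1006 means no close frame was received, and 4000-4999 is reserved for application-defined codes).

```javascript
// Sketch only: classifyClose is a hypothetical logging helper.
// Code meanings follow RFC 6455, section 7.4.
function classifyClose(code) {
  if (code === 1000) {
    return 'normal closure';
  }
  if (code === 1006) {
    // Reported locally when the connection ends without a close frame,
    // which is typical when a proxy or the network drops the connection.
    return 'abnormal closure (no close frame received)';
  }
  if (code >= 4000 && code <= 4999) {
    // Private-use range: the code was chosen by the application on the other end.
    return 'application-defined code';
  }
  return 'other';
}

// Hypothetical wiring for a plain WebSocket client:
// const openedAt = Date.now();
// socket.onclose = (event) => {
//   console.log(Date.now() - openedAt, event.code, event.reason, classifyClose(event.code));
// };
```

A 1006 code would point towards an intermediary such as the load balancer, while an application-defined code would suggest the server closed the socket deliberately.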

psk001 commented 1 year ago

Thanks for the quick response @jondubois. I really appreciate it. But even if I disable the ping timeout, that is

    let agOptions = { pingTimeoutDisabled: true };

it still gets disconnected. The other thing is that it's quite inconsistent. For some clients it's 40s, for others it's 80-90s. Also, it's not an issue on the load balancer side, as the sticky session timeout has been set to more than 1000 seconds. Please suggest the right configuration for the load balancer if required.

jondubois commented 1 year ago

Another possibility is that your client and server major versions are different (especially those before v14 versus v15+). This can lead to timeout issues because the ping and pong messages are different. In the older versions, the protocol used #1 and #2 messages for ping and pong, but newer versions use empty strings (to save bandwidth).

If you want to use new clients with old servers, you can run the latest client in compatibility mode. See https://github.com/SocketCluster/socketcluster#compatibility-mode
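
To illustrate the ping/pong difference described above for a plain WebSocket client, here is a minimal sketch. It assumes the server sends the ping and expects the client to answer with the matching pong, and it assumes "protocol version 1" means the older `#1`/`#2` format and "version 2" the newer empty-string format. `pongFor` is a hypothetical helper, not part of any SocketCluster package.

```javascript
// Sketch only: pongFor is a hypothetical helper, not a SocketCluster API.
// Assumption: protocol 1 pings with '#1' and expects '#2';
// protocol 2 pings with an empty string and expects an empty string.
function pongFor(message, protocolVersion) {
  if (protocolVersion === 1) {
    return message === '#1' ? '#2' : null;
  }
  return message === '' ? '' : null;
}

// Hypothetical wiring for a plain WebSocket client:
// socket.onmessage = (event) => {
//   const pong = pongFor(event.data, 2);
//   if (pong !== null) {
//     socket.send(pong); // answer the protocol-level ping
//     return;
//   }
//   // ...handle regular messages such as "#publish" here
// };
```

Under these assumptions, an application-level ping sent through `#publish` would not count as a protocol-level pong, so the raw client would still need to answer the server's own ping frames.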