deepstreamIO / deepstream.io

deepstream.io server
https://deepstreamio.github.io
MIT License

Speed up the logout notification when the client power is cut off #1091

Closed. msmarks closed this issue 3 years ago.

msmarks commented 3 years ago

In our project, the Presence API is used to detect client connectivity. We set the heartbeat interval to 60000ms in the js client library. If we exit the client normally, the Presence API immediately delivers a logout notification. However, if the client is powered off directly, we have to wait about 20 minutes before the Presence API notifies us of the logout. We need configuration suggestions: when the client is powered off directly, how can we get the logout notification as soon as possible?
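Roughly, our setup looks like the following minimal sketch (the import and option names are assumed from the @deepstream/client v5 API and may differ slightly by version):

    import { DeepstreamClient } from '@deepstream/client'

    // 60s heartbeat, as described above (option name assumed from the client docs).
    const client = new DeepstreamClient('localhost:6020', {
        heartbeatInterval: 60000
    })

    async function watchPresence () {
        await client.login({ username: 'monitor' })

        // Notified when any other authenticated client logs in or out.
        client.presence.subscribe((username, isLoggedIn) => {
            console.log(`${username} is now ${isLoggedIn ? 'online' : 'offline'}`)
        })
    }

    watchPresence()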

yasserf commented 3 years ago

What version of the server and client are you using?

yasserf commented 3 years ago

Also, what type of http service are you using, uWS or ws? uWS deals with heartbeats (heartbeatInterval * 2). With the text protocol we use individual timers on each websocket, but going through the code it appears the ws binary implementation is missing that. Can you confirm that's what you're using?

msmarks commented 3 years ago

Server Version: 5.1.3 Client Version: 5.1.10

httpServer:
  type: default
  options:
    # url path for http health-checks, GET requests to this path will return 200 if deepstream is alive
    healthCheckPath: /health-check
    # -- CORS --
    # if disabled, only requests with an 'Origin' header matching one specified under 'origins'
    # below will be permitted and the 'Access-Control-Allow-Credentials' response header will be
    # enabled
    allowAllOrigins: true
    # a list of allowed origins
    origins:
      - 'https://example.com'
    # Options required to create an ssl app
    # ssl:
    #   key: fileLoad(ssl/key.pem)
    #   cert: fileLoad(ssl/cert.pem)
    #   ca: ...

yasserf commented 3 years ago

Yeah, that would probably be the issue then. I'll try to get the fix in shortly.

yasserf commented 3 years ago

This should now be fixed in 5.1.6, in exchange for a timeout being set up for each socket. If performance is your ultimate endgame I would recommend trying uWS, although it has its own mini quirks (with our implementation at least).

TravisCI is super slow nowadays, so let me know if the release doesn't work for you as I need to switch context. Thanks!

yasserf commented 3 years ago

Ended up publishing the macOS and Windows builds myself.

msmarks commented 3 years ago

The heartbeat mechanism works. When I disconnect the network cable, I can quickly see the client disconnection message in the log. However, the server just keeps printing disconnection messages continuously; other clients that subscribe to presence do not receive logout notifications.

INCOMING_CONNECTION | from undefined (192.168.11.210)
AUTH_SUCCESSFUL | C-hl-ecd68a5c5603

!!!!! disconnect the network cable !!!!

heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
CLIENT_DISCONNECTED | C-hl-ecd68a5c5603
heartbeat issues
....... forever print "heartbeat issues"

yasserf commented 3 years ago

Yeah, so it looks like I was calling close instead of destroy. Close performs the side effects of a socket being closed (cleaning up subscriptions and whatnot), but destroy actually forces the underlying socket closed (which should then trigger those side effects).

Anyways, I tagged 5.1.7 now.

Also, for anyone seeing this, the server side doesn't actually use a continuous PING/PONG heartbeat mechanism. It just makes sure a packet was received within the heartbeat period, which simplifies the code greatly. The server expects a message every heartbeatInterval * 2; if it doesn't receive one, it terminates the connection. If it does receive a PING it responds with a PONG (but that doesn't go anywhere near the heartbeat verification code).

      if (message.topic === TOPIC.CONNECTION && message.action === CONNECTION_ACTION.PING) {
        // respond to PING message
        socketWrapper.sendMessage({ topic: TOPIC.CONNECTION, action: CONNECTION_ACTION.PONG })
        continue
      }
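
For the timeout side, here is a minimal sketch of the kind of per-socket check (illustrative only, not the actual deepstream source; it assumes a node ws socket and the configured heartbeatInterval in scope):

    // Illustrative only: terminate the underlying socket if nothing arrives
    // within heartbeatInterval * 2, so the normal close handling
    // (presence / subscription cleanup) gets triggered.
    let lastReceivedAt = Date.now()

    socket.on('message', () => { lastReceivedAt = Date.now() })

    const heartbeatCheck = setInterval(() => {
        if (Date.now() - lastReceivedAt > heartbeatInterval * 2) {
            clearInterval(heartbeatCheck)
            socket.terminate() // destroy rather than close, as noted above
        }
    }, heartbeatInterval)

    socket.on('close', () => clearInterval(heartbeatCheck))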

On the client side it checks whether a message was received within a heartbeatInterval. If not, it sends a ping. If it hasn't received any message within heartbeatInterval * 2 it assumes it has lost the connection to the server.

So you need to make sure the heartbeat intervals are the same. If development picks up again it would probably be good for the server to send the heartbeat config to the client via an INITIALISE action on the connection, which could update the client with the correct values (or at least verify they are correct). Since it's a stateless action (it doesn't have to happen) it's not a breaking change.

This is mostly useful for production deployments where you don't want to roll out new deployments just to tweak heartbeats or maintain a separate config location.
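
Purely as a hypothetical sketch of that idea (HEARTBEAT_CONFIG is an invented action name and the message shape is assumed; this is not part of the current protocol):

    // Hypothetical, not in the current protocol. Server side: right after the
    // connection is accepted, push the server's heartbeat configuration.
    socketWrapper.sendMessage({
        topic: TOPIC.CONNECTION,
        action: 'HEARTBEAT_CONFIG' as any,
        data: JSON.stringify({ heartbeatInterval })
    })

    // Client side: adopt the server's value (or at least warn if it differs).
    if (message.topic === TOPIC.CONNECTION && (message.action as any) === 'HEARTBEAT_CONFIG') {
        heartBeatInterval = JSON.parse(message.data).heartbeatInterval
    }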

Anyways, hope the fix works this time. Apologies, I don't have a running environment so I'm relying on tests and TypeScript (e2e and unit tests), which work really well at catching all sorts of edge cases but not so much at simulating connection loss.

msmarks commented 3 years ago

Thank you for your quick response. Could you help me compile a Windows build? It seems that the automatic build process failed.

yasserf commented 3 years ago

It's available in the AppVeyor artifacts:

https://ci.appveyor.com/project/yasserf/deepstream-io/builds/36504444/artifacts

msmarks commented 3 years ago

On the client side it checks whether a message was received within a heartbeatInterval. If not, it sends a ping. If it hasn't received any message within heartbeatInterval * 2 it assumes it has lost the connection to the server.

If the client only subscribes to server data changes, and the interval between those changes is shorter than heartBeatInterval, the client will never send a PING. The server then receives no data at all from the client and considers it disconnected.

In the client-side socket-factory.ts, version 5.1.10:

    socket.onopen = () => {
        pingInterval = setInterval(() => {
            if (Date.now() - lastRecievedMessageTimestamp > heartBeatInterval) {
                try {
                    socket.send(pingMessage)
                } catch (e) {
                    clearTimeout(pingInterval!)
                }
            }
        }, heartBeatInterval) as never as number
        socket.onopened()
    }

Maybe it should compare the current time with the timestamp of the last sent message.

Please help me sync 5.1.7 to Docker Hub.

yasserf commented 3 years ago

I'm afraid CI is down and I haven't used docker in a long time so I don't have the environment running.

You can create the docker image via sh scripts/docker.sh, but it may need a tiny bit of tweaking to push it to your own repo for the time being.

yasserf commented 3 years ago

Also, good point about the client. Feel free to raise a PR if you have a moment and I'll publish that!

msmarks commented 3 years ago

Also, good point about the client. Feel free to raise a PR if you have a moment and I'll publish that!

I'm not familiar with TypeScript, but I'll try it.

msmarks commented 3 years ago

PR raised. Please check if there is any problem: https://github.com/deepstreamIO/deepstream.io-client-js/pull/549

yasserf commented 3 years ago

I think I should have fixed the failing CI issue. It still takes forever; hopefully v5.1.8 will have all the artefacts.

Once you add a changelog entry to your fix I'm happy to publish that as well.

msmarks commented 3 years ago

That's great! The changelog has been added.

yasserf commented 3 years ago

Can you confirm it works as expected so I can close this issue?

Thanks!

msmarks commented 3 years ago

There may be some problems. I set the heartbeatInterval to 10s, but there was still a disconnection. The log is as follows. I will continue to investigate this.

Log from the client:

2020-12-03T23:26:30.629Z send heatbeat
2020-12-03T23:26:40.640Z send heatbeat
2020-12-03T23:26:50.648Z send heatbeat
2020-12-03T23:27:00.655Z send heatbeat
2020-12-03T23:27:10.662Z send heatbeat
2020-12-03T23:27:20.677Z send heatbeat
2020-12-03T23:27:30.684Z send heatbeat
2020-12-03T23:27:40.688Z send heatbeat
2020-12-03T23:28:00.702Z send heatbeat

The last line of the log shows a 20s gap since the previous heartbeat, not 10s.

Then the server log:

INFO | Heartbeat missing on websocket, terminating connection

msmarks commented 3 years ago

I found the problem, in the client-side socket-factory.ts:

socket.onopen = function () {
    pingInterval = setInterval(function () {
        if (Date.now() - lastSentMessageTimestamp  > heartBeatInterval) {
            try {
                socket.send(pingMessage);
                lastSentMessageTimestamp = Date.now();
            }
            catch (e) {
                clearTimeout(pingInterval);
            }
        }
    }, heartBeatInterval);
    socket.onopened();
};

There is a timing issue between lastSentMessageTimestamp and heartBeatInterval.

Specifically, if heartBeatInterval is set to 1000ms, the setInterval callback should fire at 1000ms, 2000ms, 3000ms... However, JavaScript timers are not that accurate; it may fire at 1200ms, 2100ms, 3040ms... Suppose socket.send(pingMessage) takes 10ms. The situation becomes:

At 1200ms the first heartbeat is sent, so lastSentMessageTimestamp = 1210ms. At the second tick, at 2100ms, the check evaluates Date.now() - lastSentMessageTimestamp = 2100ms - 1210ms = 890ms. This is less than heartBeatInterval, so the if condition is false and a heartbeat is skipped.
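
A small self-contained walk-through of that sequence (the tick times and the 10ms send cost are the assumed values from above):

    // Illustrative only: reproduces the skipped-heartbeat scenario described above.
    const heartBeatInterval = 1000
    let lastSentMessageTimestamp = 0

    // Assumed late timer ticks (ms) and a ~10ms cost for socket.send(pingMessage).
    const tickTimes = [1200, 2100, 3040]
    for (const now of tickTimes) {
        if (now - lastSentMessageTimestamp > heartBeatInterval) {
            lastSentMessageTimestamp = now + 10 // PING sent at `now`, send takes ~10ms
            console.log(`${now}ms: PING sent`)
        } else {
            console.log(`${now}ms: PING skipped (${now - lastSentMessageTimestamp}ms since last send)`)
        }
    }
    // Prints: "1200ms: PING sent", "2100ms: PING skipped (890ms since last send)", "3040ms: PING sent"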

I will raise a PR to fix this problem.

msmarks commented 3 years ago

PR raised: https://github.com/deepstreamIO/deepstream.io-client-js/pull/550

msmarks commented 3 years ago

I found that heartbeat detection is bidirectional: the server checks whether the time since the last received message is greater than twice the heartbeatInterval, and the client does the same check on its side. Therefore, if a client only sends messages and does not subscribe to anything, it will never receive messages, so it will think the server has been disconnected. This means my previous changes are wrong.

I think the client should send heartbeat packets strictly at the configured interval, and not skip sending a heartbeat just because it has recently sent other messages.
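
A rough sketch of what I mean, reusing the names from the socket-factory.ts snippet quoted above (illustrative, not the actual PR):

    socket.onopen = () => {
        // Send a PING unconditionally every heartBeatInterval, regardless of
        // whatever else has been sent or received in the meantime.
        pingInterval = setInterval(() => {
            try {
                socket.send(pingMessage)
            } catch (e) {
                clearInterval(pingInterval!)
            }
        }, heartBeatInterval) as never as number
        socket.onopened()
    }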

yasserf commented 3 years ago

Makes sense, thanks!

jaime-ez commented 3 years ago

Hi everyone! Sorry for being late to the discussion, but since the 5.1.6 release solves the issue with node ws, there is no need to change the socket-factory in the client from the previous 5.1.10 version. That logic is correct.

I found that heartbeat detection is bidirectional: the server checks whether the time since the last received message is greater than twice the heartbeatInterval, and the client does the same check on its side. Therefore, if a client only sends messages and does not subscribe to anything, it will never receive messages, so it will think the server has been disconnected.

This is not correct. For every PING sent there is a PONG response, therefore the client does update its lastRecievedMessageTimestamp value every time it sends a PING. As @yasserf stated here.

I think the client should send heartbeat packets strictly at the configured interval, and not skip sending a heartbeat just because it has recently sent other messages.

This is unnecessary, as stated before; in fact, in high-payload situations it adds an extra PING message that serves no purpose.

This works only in scenarios where the client heartbeatInterval <= 2 x server heartbeatInterval + PingPongTime, as opposed to the existing logic in version 5.1.10, which works when the client heartbeatInterval <= server heartbeatInterval. Both versions fail when the client heartbeatInterval > 2 x server heartbeatInterval + PingPongTime.

In conclusion, I propose to revert socket-factory.ts to the previous 5.1.10 version, since it minimizes the number of messages sent in high payload scenarios.

Comments?

msmarks commented 3 years ago

This is not correct. For every PING sent there is a PONG response, therefore the client does update its lastRecievedMessageTimestamp value every time it sends a PING. As @yasserf stated here.

PING is sent by the client; the server returns a PONG and the client then updates lastRecievedMessageTimestamp. In version 5.1.10, if the client only subscribes to messages and does not publish anything, lastRecievedMessageTimestamp keeps updating from the subscription traffic, so the client may never send a PING. The server then receives no messages from the client and thinks the client is disconnected.

revert socket-factory.ts to the previous 5.1.10 version since it minimizes the number of messages sent in high payload scenarios.

So I don't think it's a good idea to go back to 5.1.10, but we can solve the problem of too many ping messages based on the current version.

jaime-ez commented 3 years ago

Ok, I finally understood your problem (being subscribed only AND receiving messages at an interval shorter than the heartbeat), although this is an issue only with the node http server. uWS has an idleTimeout option, set to the heartBeatInterval value, which prevents this from happening since it checks for both incoming and outgoing messages.

However, as I mentioned previously, the fix in client version 5.1.12 also breaks when the client heartbeatInterval > 2 x server heartbeatInterval + PingPongTime (try it with 3x).

An option that should solve the problem is to keep track of exchanged messages: store a lastMessageExchangedAt value per socket on the node server, update it whenever a message is sent or received, and terminate the connection when it gets too old.

This way, we can emulate the idleTimeout functionality of uWS on the node server and avoid sending unnecessary PING messages from the client. The tradeoff is that when using uWS we store an unnecessary value, but that shouldn't matter much.
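
A minimal sketch of that idea on the node ws server (illustrative only; lastMessageExchangedAt and the wiring around it are just a sketch, not the actual deepstream internals):

    import WebSocket from 'ws'

    // Illustrative only: emulate uWS's idleTimeout on the node ws server by
    // tracking the last time a message was exchanged in either direction.
    function watchIdleSocket (socket: WebSocket, heartbeatInterval: number) {
        let lastMessageExchangedAt = Date.now()

        // Incoming traffic counts as activity.
        socket.on('message', () => { lastMessageExchangedAt = Date.now() })

        // Outgoing traffic counts as activity too.
        const originalSend = socket.send.bind(socket)
        socket.send = ((data: any) => {
            lastMessageExchangedAt = Date.now()
            originalSend(data)
        }) as typeof socket.send

        // Terminate (not close) idle sockets so subscription/presence cleanup runs.
        const check = setInterval(() => {
            if (Date.now() - lastMessageExchangedAt > heartbeatInterval * 2) {
                socket.terminate()
            }
        }, heartbeatInterval)
        socket.on('close', () => clearInterval(check))
    }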

What do you think?

yasserf commented 3 years ago

We use different socket wrappers so the code can theoretically stay separate.

TBH given we are sending pings every 10 or 20 seconds and they should be aggregated within the socket wrapper bulk (so if there are other messages they get bundled in one websocket frame), I'm wondering if there's any harm in keeping this in until someone asks to make deepstream faster, which hasn't really happened yet 🙈. It's def something we can improve on, but I'm not sure it's a high priority vs getting CI systems working and possibly spitting out some react hooks integrations to see if they tease any adoption 😁

jaime-ez commented 3 years ago

Hi, I agree the CI is priority, but this is easier for me now :)

Ok, maybe the ping is not relevant, but the connection still gets closed when the client heartbeat interval is bigger than 2 x the server one when using the node http server, and I think that's an issue. What I proposed in the last comment should fix it regardless of the client code.

That way we also avoid syncing heartbeats, as proposed previously.

yasserf commented 3 years ago

I'm afraid I'm not fully following yet. lastMessageExchangedAt means either sent or received, but you can send a message to a client that is disconnected and it just disappears into the void (which is one of the reasons heartbeats exist).

The point of syncing heartbeats is that the server and client need a single source of truth. If I set a heartbeat of a thousand on the server and a million on the client, then, assuming there is a connection loss that doesn't include the correct close handshake, the server will close but the client will wait for at least a million milliseconds. How exactly does any scenario fix that without heartbeat synchronization?

jaime-ez commented 3 years ago

Ok, I just had time to come back to this. I was wrong all along; unplugging devices made me realize it :)

uWS's idleTimeout also prevents the connection from being closed at the server when the client is unplugged. Therefore the complete solution is to keep the client as is, implement the heartbeat synchronization between server and client, implement the heartbeat verification in the uWS service as was done in the node service, and disable the idleTimeout. I'll break it into different issues.

One more thing, @yasserf, your comments please:

TBH given we are sending pings every 10 or 20 seconds and they should be aggregated within the socket wrapper bulk (so if there are other messages they get bundled in one websocket frame)

Where's the socket wrapper bulk? I couldn't find it in the client code... I guess it should be in src/connection/connection.ts, but I don't see anything that could implement that. All I've found is a bulk subscription utility.