SocketCluster / socketcluster-server

Minimal server module for SocketCluster
http://socketcluster.io/
MIT License

How do websockets handle disconnects? #3

Closed (mattkrick closed this issue 8 years ago)

mattkrick commented 8 years ago

I couldn't find this in the docs or SC. Could you help me out? A server has a pingTimeout of 20 seconds. A user connects, authenticates, and subscribes to foo at time = 0. That user loses internet at time = 5. A message is published to foo at time = 10. If the user reconnects at time = 15, will they get the message? Is it guaranteed? If so, where is the queue? If not, are they aware they missed a message?

jondubois commented 8 years ago

@mattkrick Sorry for the massive delay in response. You probably already figured this out, but for the sake of closing this issue:

In this situation, when you do a publish while disconnected, the SC client will try to reconnect. If the internet is still down, SC won't be able to reconnect, and the publish action will fail. If a callback was provided to the publish method, an error indicating the timeout will be passed to it.

agilethomas commented 8 years ago

@jondubois I'm not sure your response answers the question (as I read it). It sounds like you are saying that the user who publishes the message will have an error passed to the callback if the publish fails. But what happens if one user's publish succeeds and a second user's internet is the problem? Here's my understanding of the question (since I'm looking for the answer myself):

Users A and B both subscribe to channel "foo" at time = 0
At time = 5, user A loses internet
At time = 10, user B publishes a message to "foo"
At time = 15, user A reconnects

Will user A get the message that was published to the "foo" channel?

mattkrick commented 8 years ago

As I understand it, no. The message was pushed to A, we didn't know A was disconnected because it was just a hiccup between heartbeats, and there's no callback to ensure durability. To ensure A got it, the server would need to wait for a callback from A for every message, and a queue (Redis?) would have to persist the message until every connected user had received it. Not impossible, but at scale, I can think of very few sites that use durable messaging without employing something like RabbitMQ.
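To make the "wait for a callback and persist until received" idea concrete, here is a minimal in-memory sketch. It is not SocketCluster code: the `DurableChannel` class and its methods are hypothetical names, and a real system would keep the queue in something like Redis or RabbitMQ rather than in process memory.

```javascript
// Hypothetical in-memory sketch; none of these names come from SocketCluster.
class DurableChannel {
  constructor() {
    this.nextId = 1;
    this.pending = new Map(); // subscriberId -> Map(messageId -> data)
  }

  subscribe(subscriberId) {
    if (!this.pending.has(subscriberId)) {
      this.pending.set(subscriberId, new Map());
    }
  }

  // Queue the message for every subscriber until each one acknowledges it.
  publish(data) {
    const id = this.nextId++;
    for (const queue of this.pending.values()) {
      queue.set(id, data);
    }
    return id;
  }

  // Called when a subscriber confirms receipt of a message.
  ack(subscriberId, messageId) {
    const queue = this.pending.get(subscriberId);
    return queue ? queue.delete(messageId) : false;
  }

  // Messages still owed to a subscriber, oldest first (for redelivery).
  unacked(subscriberId) {
    const queue = this.pending.get(subscriberId) || new Map();
    return [...queue.entries()].map(([id, data]) => ({ id, data }));
  }
}

const channel = new DurableChannel();
channel.subscribe('A');
channel.subscribe('B');
const messageId = channel.publish('hello');
channel.ack('B', messageId);        // B was online and confirmed receipt
console.log(channel.unacked('A')); // A never acked, so the message is still queued
console.log(channel.unacked('B')); // nothing left to deliver to B
```

The cost this illustrates is the one described above: the server has to hold one pending entry per message per subscriber until an acknowledgement arrives.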

jondubois commented 8 years ago

@agilethomas As @mattkrick mentioned, in the case where the connection is lost for only a short amount of time, the outage is not long enough for the heartbeat/ping timeout to occur (this depends on the value of your pingTimeout option). As a result, the same underlying WebSocket/TCP connection will be maintained by SC (not closed).

Because TCP guarantees in-order message delivery (over a single session/connection), client A would still receive the message in your scenario. If multiple messages were 'missed', they will all arrive at client A in quick succession once the TCP session is recovered (once internet connectivity comes back).

If, however, user A's connection is lost for a longer amount of time (longer than whatever your pingTimeout option is), the underlying WebSocket/TCP connection will have timed out and been closed down. In this scenario, when the internet comes back for user A, the SC client will reconnect using a new underlying WebSocket connection, so client A will have missed all messages published while it was offline.

So basically, SC guarantees in-order, exactly-once message delivery for minor (short-duration) network failures. However, if a client is disconnected for long enough to time out, then you may have to recover missed messages by getting a new snapshot of your data/message log from a database/datastore.
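The two cases can be sketched roughly as follows. This is a simplified model rather than SC's actual heartbeat logic: it assumes the connection survives whenever the outage is shorter than pingTimeout, and the `messageLog` array, its `seq` field, and both function names are hypothetical stand-ins for whatever log or datastore you keep yourself.

```javascript
// Simplified model, not SocketCluster code: the connection survives only if
// the outage is shorter than pingTimeout (all times in seconds).
function connectionSurvives(outageStart, outageEnd, pingTimeout) {
  return outageEnd - outageStart < pingTimeout;
}

// Hypothetical message log, e.g. rows read from your own datastore.
const messageLog = [
  { seq: 1, channel: 'foo', data: 'first' },
  { seq: 2, channel: 'foo', data: 'second' },
  { seq: 3, channel: 'foo', data: 'third' },
];

// After a real reconnect, fetch everything newer than the last message seen.
function missedMessages(log, channel, lastSeenSeq) {
  return log.filter((m) => m.channel === channel && m.seq > lastSeenSeq);
}

// Scenario from this thread: offline from t = 5 to t = 15, pingTimeout = 20.
console.log(connectionSurvives(5, 15, 20)); // true: same socket, nothing lost

// A longer outage (t = 5 to t = 40) forces a new socket, so catch up from the log.
console.log(connectionSurvives(5, 40, 20)); // false
console.log(missedMessages(messageLog, 'foo', 1).map((m) => m.seq)); // [ 2, 3 ]
```

In practice the client would remember the last sequence number it processed and ask for the missing range (or a full snapshot) each time it reconnects.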

Note that SC exposes MIDDLEWARE_PUBLISH_IN and MIDDLEWARE_PUBLISH_OUT middleware lines which allow you to define custom functions to capture inbound messages (coming from a single client and going to the server) and outbound messages (coming from the server and going out to multiple clients), so you can add logic to persist messages yourself.
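As a rough illustration of persisting inbound publishes, the sketch below uses a stand-in server object so that it runs on its own. The `FakeServer` class only imitates an `addMiddleware(type, fn)` call with an `fn(req, next)` handler; the `req.channel` and `req.data` fields and the `savedMessages` array are assumptions for illustration, so check the middleware docs for the real request shape before relying on it.

```javascript
// Stand-in for the SC server so the sketch runs on its own; it only imitates
// the addMiddleware(type, fn) / fn(req, next) shape and is NOT the real class.
class FakeServer {
  constructor() {
    this.MIDDLEWARE_PUBLISH_IN = 'publishIn';
    this.middleware = { publishIn: [] };
  }

  addMiddleware(type, fn) {
    this.middleware[type].push(fn);
  }

  // Run each middleware in order; stop if one passes an error to next().
  handlePublish(req, done) {
    const chain = this.middleware.publishIn;
    const run = (i) => {
      if (i === chain.length) return done(null);
      chain[i](req, (err) => (err ? done(err) : run(i + 1)));
    };
    run(0);
  }
}

const savedMessages = []; // hypothetical store; swap in a real database
const server = new FakeServer();

server.addMiddleware(server.MIDDLEWARE_PUBLISH_IN, (req, next) => {
  // Record the message before letting the publish continue.
  savedMessages.push({ channel: req.channel, data: req.data, at: Date.now() });
  next();
});

server.handlePublish({ channel: 'foo', data: 'hello' }, (err) => {
  console.log(err ? 'blocked' : 'published', savedMessages.length);
});
```

A log written this way is what a reconnecting client could later read from to recover messages it missed.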

agilethomas commented 8 years ago

Thank you very much for the detailed explanation. That's good to know and I think will serve our needs well. We'll just have to code our app so that any time it has to reconnect, we'll request a snapshot of the current state. We're currently using socket.io in our application, and it appears that we are losing messages when the network connection is poor, but I can't be sure if it's a socket.io, TCP, or application issue. But in any case, once I stumbled upon this framework, I realized it will be so much easier to work with than socket.io, since the nature of our app is a pub/sub system. That means very minimal code to route messages between clients.

Another quick question (not sure where else to ask this): if a client subscribes to a channel, and then publishes to that same channel, will the client that published the message also receive the same message that it sent? I wouldn't expect that to be the behavior but that's how it is working for me, so I wasn't sure if I was doing something wrong or not.

jondubois commented 8 years ago

@agilethomas Yes, if the client is subscribed to a channel and publishes to it, that client will in fact receive its own message. See https://github.com/SocketCluster/socketcluster/issues/54

Also, you may want to read about the PUBLISH_OUT middleware here: http://socketcluster.io/#!/docs/middleware-and-authorization
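If the echo is unwanted, one client-side option is to tag each message with a sender id and ignore your own. The sketch below is a tiny in-memory pub/sub for illustration only, not SocketCluster's API; the `Channel` class, the `senderId` field, and the id values are all hypothetical.

```javascript
// Tiny in-memory pub/sub used only for illustration (not SocketCluster).
class Channel {
  constructor() {
    this.handlers = [];
  }
  watch(handler) {
    this.handlers.push(handler);
  }
  // Every watcher gets the message, including the one that published it.
  publish(message) {
    this.handlers.forEach((handler) => handler(message));
  }
}

const foo = new Channel();
const myId = 'client-1'; // hypothetical id chosen by the application
const received = [];

foo.watch((message) => {
  if (message.senderId === myId) return; // drop the echo of our own publish
  received.push(message.text);
});

foo.publish({ senderId: myId, text: 'from me' });
foo.publish({ senderId: 'client-2', text: 'from someone else' });
console.log(received); // [ 'from someone else' ]
```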