centrifugal / centrifugo

Scalable real-time messaging server in a language-agnostic way. Self-hosted alternative to Pubnub, Pusher, Ably. Set up once and forever.
https://centrifugal.dev
Apache License 2.0

Graceful restart & zero downtime deploy #134

Closed by hn0pw 7 years ago

hn0pw commented 7 years ago

Another interesting feature would be graceful termination and restart of the server, so that an update does not drop all connections. With the library below it's possible to run kill -USR2 <PID> to restart the server.

https://github.com/facebookgo/grace

I haven't used it yet, but I plan to implement it in our server.

FZambia commented 7 years ago

I have not used the grace library (or anything similar) before, so at the moment I can't say whether it's possible in our case: we work with websockets and a SockJS server, i.e. we don't just use the standard library HTTP server, and we work with long-lived connections. I need to investigate. If you have more info about this, please share.

Btw, at the moment Centrifugo handles SIGTERM: it returns 503 from HTTP handlers and gracefully closes active connections with reconnect advice, so clients will reconnect to another Centrifugo instance, or to this instance when it becomes available again.
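For illustration, here is a minimal Go sketch of the SIGTERM behaviour described above. It is not Centrifugo's actual code: the shutdownGate type, the :8000 address and the 10 second timeout are all assumptions made for the example, and closing websocket/SockJS connections with reconnect advice is only marked as a comment.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// shutdownGate is a hypothetical helper (not part of Centrifugo) that
// records whether shutdown has started.
type shutdownGate struct{ closing atomic.Bool }

// Begin marks the server as shutting down.
func (g *shutdownGate) Begin() { g.closing.Store(true) }

// Closing reports whether shutdown has started.
func (g *shutdownGate) Closing() bool { return g.closing.Load() }

// Wrap returns a handler that answers 503 once shutdown has begun,
// and otherwise delegates to next.
func (g *shutdownGate) Wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if g.Closing() {
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	gate := &shutdownGate{}
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	srv := &http.Server{Addr: ":8000", Handler: gate.Wrap(mux)} // address is an assumption
	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Block until the process receives SIGTERM.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM)
	<-sig

	// Step 1: start answering 503 on HTTP handlers.
	gate.Begin()

	// Step 2 (not shown): a real server would now close each long-lived
	// connection with reconnect advice. http.Server.Shutdown does not do
	// this for hijacked connections such as websockets.

	// Step 3: stop the listener and wait for ordinary requests to finish.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Print(err)
	}
}
```

Keeping the flag in a small wrapper means every HTTP handler sees the shutdown state without each handler having to check it individually.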

FZambia commented 7 years ago

I investigated this question a bit. As I said, things are a bit different for long-lived connections like websockets.

Grace starts a new server and waits for active connections to finish on the old instance. In our case clients won't finish their connections for days, so eventually we will be forced to close them. And we already do this! The only thing we don't do at the moment is draining connections, i.e. sending all messages from client queues before closing the connection. This was suggested by @klauspost a long time ago, but I decided it was overkill at that moment, and there was a way to lose some messages during reconnect anyway, so we put only the parts I described in the previous comment into the shutdown process. Now, with the message recovery mechanism, connection draining on shutdown makes much more sense, so I'll try to think about this.

Here are some useful links on this topic that I found while searching for approaches:

http://stackoverflow.com/questions/38194137/graceful-restart-of-a-server-with-active-websocket-connections-in-go

https://github.com/golang/go/issues/17721

So it looks like we already make almost the best possible effort here...

hn0pw commented 7 years ago

Thanks for investigating this so quickly. After your explanation it's clear that this is not the same as a "normal" HTTP server with short-lived connections, and not that easy to implement. With the SIGTERM handling, I think a good alternative is already implemented, and this issue can be closed.