0xProject / 0x-mesh

A peer-to-peer network for sharing 0x orders
https://0x-org.gitbook.io/mesh/

Shutdown gracefully #161

Open opaolini opened 5 years ago

opaolini commented 5 years ago

About

Mesh should attempt to shut down gracefully upon receiving a SIGTERM: drain active connections and stop its services before exiting. In the future it should also report itself as unhealthy/unready when probed for health or readiness.
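The health/readiness part could look roughly like this minimal Go sketch. It assumes Go 1.19+ (for `atomic.Bool`) and a readiness endpoint that Mesh does not currently have; the `readiness` type and its methods are hypothetical names used only for illustration.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// readiness records whether this process should still receive new traffic.
// Hypothetical type for illustration; it is not part of Mesh.
type readiness struct {
	shuttingDown atomic.Bool // zero value means "not shutting down"
}

// MarkShuttingDown flips the probe to "unready". It would be called first
// thing in the SIGTERM handler, before draining begins.
func (r *readiness) MarkShuttingDown() { r.shuttingDown.Store(true) }

// status is the HTTP status code a readiness probe should observe.
func (r *readiness) status() int {
	if r.shuttingDown.Load() {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// ServeHTTP lets the type be mounted as a probe endpoint, e.g. on "/ready".
func (r *readiness) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(r.status())
}
```

Failing the probe (503) as soon as the signal arrives lets the orchestrator stop routing new traffic while in-flight work is still being drained.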

A graceful shutdown would also benefit operators running Mesh on Kubernetes (and probably other orchestration tools): when a pod is updated, moved, or evicted, the kubelet sends a SIGTERM to the main process and waits for a grace period (30 seconds by default) before sending a SIGKILL.

Below is a rough sketch of the implementation. If there is interest, I should be able to deliver a working, tested PR in the coming days!

Rough implementation

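```go
// Sketch of the proposed main() flow. coreConfig, config and listenRPC are
// defined elsewhere; app.Shutdown does not exist yet and is the method this
// issue proposes. Imports assumed: context, os, os/signal, syscall, time,
// github.com/0xProject/0x-mesh/core, and log "github.com/sirupsen/logrus".

// Buffered so a signal sent before we start waiting is not dropped.
sigs := make(chan os.Signal, 1)
// os.Interrupt is SIGINT on Unix, so listing syscall.SIGINT too is redundant.
signal.Notify(sigs, os.Interrupt, syscall.SIGTERM)

// Start core.App.
app, err := core.New(coreConfig)
if err != nil {
	log.WithField("error", err.Error()).Fatal("could not initialize app")
}
if err := app.Start(); err != nil {
	log.WithField("error", err.Error()).Fatal("fatal error while starting app")
}

// Start RPC server.
go func() {
	if err := listenRPC(app, config); err != nil {
		// log.Fatal calls os.Exit, which skips deferred calls, so close explicitly.
		app.Close()
		log.WithField("error", err.Error()).Fatal("RPC server returned error")
	}
}()

// Mesh is running at this point; block until SIGINT or SIGTERM arrives.
<-sigs

// Keep the drain budget well below the orchestrator's grace period
// (30 seconds by default on Kubernetes) so SIGKILL never cuts it short.
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Stop services and drain connections. Shutdown supersedes app.Close here.
if err := app.Shutdown(ctx); err != nil {
	log.WithField("error", err.Error()).Fatal("mesh shutdown failed")
}
log.Info("mesh stopped")
```
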
albrow commented 5 years ago

Definitely want to do this at some point. We might need to address #96 first.

albrow commented 5 years ago

#27 is also potentially related.

fabioberger commented 4 years ago

Another part of this will be revisiting all uses of context.Background() and making sure we actually pass the top-level application context down, so that all ongoing internal requests are aborted when it is cancelled.