Closed zhiqiangxu closed 5 years ago
I think the key of using tableflip is to replace net.Listen with upg.Fds.Listen, so that when Upgrade the listening socket will be inherited by child.
That sounds about right.
Can you share the snippet that does the listen and then serve? Looks like you're using net/http
, are you calling ListenAndServe directly?
I'm listening with upg.Fds.Listen
and then pass the net.Listener
to serve
function, I do the same for both http server and my private tcp server:
...
upg, err := tableflip.New(tableflip.Options{PIDFile: pidFile})
if err != nil {
panic(err)
}
defer upg.Stop()
....
httpServer := &http.Server{Addr: httpAddr}
ln, err := upg.Fds.Listen("tcp", httpAddr)
if err != nil {
instance.Logger().Errorln("Listen httpAddr err", err)
return
}
var g run.Group
g.Add(func() error {
// upg.Fds.Listen is passed to qserver as func, and called inside ListenAndServe
err := qserver.ListenAndServe()
if err != nil {
instance.Logger().Errorln("ListenAndServe err", err)
}
return err
}, func(err error) {
shutdownQRPC(qserver)
instance.Logger().Infoln("shutdownQRPC err", err)
})
g.Add(func() error {
...apis omited...
err = httpServer.Serve(ln)
if err != nil {
instance.Logger().Errorln("httpServer.Serve err", err)
}
return err
}, func(err error) {
shutdownHTTP(httpServer)
instance.Logger().Infoln("httpServer.Shutdown", err)
})
go func() {
sig := make(chan os.Signal, 1)
signal.Notify(sig, os.Interrupt, os.Kill, syscall.SIGTERM, syscall.SIGUSR1, syscall.SIGQUIT)
for range sig {
// upgrade on signal
instance.Logger().Errorln("GracefulUpgrade", config.Load().GracefulUpgrade)
instance.Logger().Errorln("upgrade start")
err := upg.Upgrade()
instance.Logger().Errorln("upgrade end", err)
if err != nil {
instance.Logger().Errorln("upgrade failed:", err)
}
}
}()
groupDoneCh := make(chan struct{})
go func() {
err := g.Run()
if err != nil {
instance.Logger().Errorln("Group.Run err", err)
}
close(groupDoneCh)
}()
if err := upg.Ready(); err != nil {
instance.Logger().Errorln("upg.Ready err", err)
return
}
instance.Logger().Infoln("child ready to serve")
// ready to exit
select {
case <-upg.Exit():
case <-groupDoneCh:
}
instance.Logger().Errorln("parent prepare for exit")
shutdownQRPC(qserver)
shutdownHTTP(httpServer)
And here's the code for qserver.ListenAndServe
:
// ListenAndServe starts listening on all bindings
func (srv *Server) ListenAndServe() error {
var g run.Group
for i := range srv.bindings {
idx := i
binding := srv.bindings[i]
g.Add(func() error {
return srv.listenAndServe(idx, binding)
}, func(err error) {
serr := srv.Shutdown()
LogError("err", err, "serr", serr)
})
}
return g.Run()
}
func (srv *Server) listenAndServe(idx int, binding ServerBinding) (err error) {
var (
ln net.Listener
)
if binding.ListenFunc != nil {
ln, err = binding.ListenFunc("tcp", binding.Addr)
} else {
ln, err = net.Listen("tcp", binding.Addr)
}
if err != nil {
return
}
err = srv.Serve(ln.(*net.TCPListener), idx)
return
}
Here's the complete code:
https://github.com/zhiqiangxu/qchat/blob/master/app/server/common.go
Ah, I think the problem is that you have a race condition between calling Ready
in main and Listen
in Server.ListenAndServe. Ready closes all inherited-but-unused Fds. The following ordering is causing your problem:
Here, binding.ListenFunc won't be able to use the inherited Fd, because it was already closed by Ready. It then tries to do a regular net.Listen, which fails because the old process is still around and holding onto the listening socket.
To fix this I recommend creating all listeners in main()
and only then calling Ready.
Nice catch! I'll try it out asap:)
I think the key of using tableflip is to replace
net.Listen
withupg.Fds.Listen
, so that whenUpgrade
the listening socket will be inherited by child.But I've got errors like below when
Upgrade
after the application has run for quite long in production environment.It seems that the parent has opened a listener on port
8902
but the child doesn't inherit that listener. Any possible reason?