Closed marwan-at-work closed 2 years ago
CC @neild @bradfitz
This proposal passes a Context and a graceful shutdown timeout to Server.ListenAndServeContext
. The Context bounds the time until the server starts graceful shutdown, and the timeout bounds how long the graceful shutdown period lasts.
The existing Server.Shutdown
method takes a Context
to bound the graceful shutdown period.
Under this proposal, therefore, we'd have one API that takes a Context
to bound the server lifetime but not the shutdown period, and a different and incompatible API which takes a Context
to bound the graceful shutdown period but not the server lifetime. That seems confusing. It also means the ListenAndServeContext
function would only be usable in cases where there is no graceful shutdown period or the grace period is strictly time-bounded.
I think this proposal is addressing two issues at the same time:
http.Server
's lifetime using a Context
without spinning up a goroutine and manually watching for context cancellation. I don't know how much of an issue this is in practice; how often do people want to stop a server using a Context
rather than by calling Server.Shutdown
?Server.Shutdown
and Server.ListenAndServe
are somewhat sharp-edged: ListenAndServe
returns immediately with an error, when returning nil
after shutdown has completed might be friendlier in most cases.Perhaps it might be simpler to address the second point without introducing a context to ListenAndServe
.
how often do people want to stop a server using a Context rather than by calling Server.Shutdown?
Basically always? Context is the canonical way to control the lifetime of a call. This is why signal.NotifyContext was added to the standard library, because it was so common to want to hook these two things up.
I wonder if the solution is to add graceful shutdown semantics right into context.Context, and then you'd only need ListenAndServeCtx, and the context would fully contain graceful exit semantics as well as simple cancel semantics.
And this would also then give everyone who uses contexts a one-stop shop for graceful shutdowns.
I think we steal the idea from Shutdown(ctx) - implement a separate context that only controls the lifetime of Shutdown logic. It'll be a context within a context, but if we can word things well, I think it'll be ok.
// ContextForShutdown returns the Context that controls the lifetime of the logic that
// runs when ctx's Done channel is closed.
func ContextForShutdown(ctx Context) Context
// WithShutdownTimeout returns a context that puts a timeout on the context returned by
// ContextForShutdown, and a CancelFunc that closes the done channel on the shutdown context.
func WithShutdownTimeout(parent Context, timeout time.Duration) (ctx Context, cancelShutdown CancelFunc)
// WithShutdownCancel returns a context and a CancelFunc that will close the done
// channel on the context returned by ContextForShutdown for ctx.
func WithShutdownCancel(parent Context) (ctx Context, cancelShutdown CancelFunc)
I use Shutdown here to try to avoid the confusion of reusing "Cancel". Shutdown being the logic that runs after the original context is marked done.
// in user code, this is all we'd need for graceful shutdown of an application
ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
defer stop()
ctx, cancelShutdown := context.WithShutdownTimeout(ctx, timeout)
defer cancelShutdown()
return server.ListenAndServeCtx(ctx)
And then once the original context gets closed by the Signal, http.Server would call context.ContextForShutdown on the context it was passed by ListenAndServeCtx, and use that to pass into Shutdown.
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
I'm agnostic about the right API here, but I do think this is a common need and a bit of a PITA to code correctly.
Maybe it accepts a callback to run when ctx1 is cancelled and that callback returns the context for the shutdown?
In the event of a nil
Server.BaseContext
, would the provided Context
be used?
It sounds like having some kind of graceful shutdown is a common need that might be worth filling. People seem less enthusiastic about these specific API details. Does anyone want to propose a simpler way?
I'm all ears (or eyes I guess) for proposing a simpler way 👍🏼
One alternative we can do is have server.Server
take a context as an optional field (in addition to context timeout) and this way *Server.ListenAndServe
(and friends) can check if the context field is not nil and it would launch a goroutine under the hood that manages the graceful shutdown. This would mean no new additional methods but only two additional fields.
Something like this:
type Server struct {
...
ShutdownTimeout time.Duration
Context context.Context
...
}
and you would just call server.ListenAndServer()
as you normally do. People managing ListenAndServe manually won't be broken because the Context will be nil. And if people decide to add a Context, then they are explicitly choosing the new API and would not need to manage graceful shutdown anymore.
Personally I still prefer the Go convention of explicitly passing context as a method/function argument but there's precedent here with net/http.Request
storing the context as a field. So I'm happy either way
I think having http.Server
itself support context based shutdowns isn't enough, often there are other background tasks, connection upgrades, or even multiple http.Server
s running which all need to be coordinated and accounted for in a graceful shutdown.
I would rather see something like a version of github.com/oklog/run
that can support both context-based and call-based cancellation, allowing existing code to plug in to coordinated shutdowns.
I think that any proposal here needs to start with a clear goal.
The original post here includes a function which shows how to run an http.Server
with a context-bounded lifetime. Lightly modified:
func ListenAndServeTLS(ctx context.Context, server *http.Server, shutdownTimeout time.Duration, certFile, keyFile string) error {
serverErr := make(chan error, 1)
go func() {
serverErr <- server.ListenAndServeTLS(certFile, keyFile)
}()
select {
case <-ctx.Done():
ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
defer cancel()
err = s.Shutdown(ctx)
case err := <-serverErr:
return err
}
}
That doesn't look too bad to me.
The least elegant part to my eyes is the handling of shutdownTimeout
, which creates a new background context to apply the graceful shutdown timeout. However, this lack of elegance is, I think, implicit in using contexts to bound the lifetime of operations: We cannot both bound the server lifetime with a context and have a graceful shutdown, because graceful shutdown requires extending the server lifetime past the point the server context expires.
This argues to me that a context is not the right way to bound the lifetime of an http.Server
, and that the API we have today with Server.Close
and Server.Shutdown
explicitly terminating a server's lifetime is the better approach.
There are already ListenAndServe, ListenAndServeTLS, Serve, ServeTLS. Adding more context variant methods seems bad, so adding struct fields seems like lesser evil. The approach of using callbacks is pretty flexible. Maybe BeforeFunc func(*http.Server) error (not sure if this is a good name).
server := &http.Server{
Addr: addr,
Handler: handler,
BeforeFunc: func(s *http.Server) {
log.Print("starting up")
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
<- c
log.Print("shutting down")
ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
defer cancel()
return s.Shutdown(ctx)
},
}
if err := server.ListenAndServeContext(ctx, time.Second); err != nil {
log.Fatal(err)
}
This argues to me that a context is not the right way to bound the lifetime of an
http.Server
, and that the API we have today withServer.Close
andServer.Shutdown
explicitly terminating a server's lifetime is the better approach.
I concur. My mental model of context.Context
is that cancellation means “the caller is no longer interested in the results of this call”, and through that lens passing a Context
to ListenAndServeContext
would mean “the caller is no longer interested in the output of the server”. That sounds to me like a hard shutdown, not a graceful one.
Is there anything to learn from https://github.com/facebookarchive/grace? I don't remember much about it, only that it was an early graceful shutdown contender.
It sounds like we are still waiting for the right approach.
Nit: regarding graceful shutdown, tableflip
looks more supported and maintained https://github.com/cloudflare/tableflip
I'm not sure how to move this discussion forward. Does anyone want to propose an alternative, simpler API?
Alternately, http.Server already has Shutdown and Close. It's a few lines of code to connect those to whatever contexts or timeouts that might be needed in a given situation. And servers don't get started and stopped at high frequency, so an extra goroutine or two wouldn't matter. Do we really need more than those? Is there something else they don't provide that we should be making available? That is, maybe Shutdown and Close are the alternative, simpler API.
I think the existing Shutdown
and Close
methods are fine.
I do not think we should provide a version of ListenAndServe
that takes a Context
that triggers server shutdown. As I mentioned above, the fact that the server needs to run for a grace period after shutdown has begun argues that a context isn't the right way to bound a server's lifetime. If you do want to shut down a server when a context expires, you can do so easily enough by calling Shutdown
.
It would be nice if ListenAndServe
didn't return until after the graceful shutdown period, and returned nil
after a graceful shutdown. This would mean you could write something like this:
func main() {
// ...
if err := server.ListenAndServe(); err != nil {
log.Fatal(err)
}
}
Instead of this:
func main() {
// ...
if err := server.ListenAndServe(); err != http.ErrServerClosed {
log.Fatal(err)
}
select{} // whatever called Shutdown needs to call os.Exit
}
Perhaps we could add a Server.Wait
method, which waits for shutdown to complete and returns any non-ErrServerClosed
error from ListenAndServe
:
func main() {
server.ListenAndServe() // ignore error, Wait will return it
if err := server.Wait(); err != nil {
log.Fatal(err)
}
}
I'm not sure if I understand the suggestion above, so I'd love to see a full end to end example of how the code would look like from the caller's side while handling all scenarios such as pre-run errors like port-already-in-use, and post-run errors such as shutdown-timeout.
Also, telling people to ignore the error from ListenAndServe and to instead use Wait seems a little odd to me. I think I'd rather deprecate a method entirely in favor of a newer one as that will feel more clean cut than deprecating only the return part of an existing method signature.
Also I'd love to mention that my reasoning behind this issue (and not particularly the proposal itself) is the fact that today if you want to "properly" run a server in production, you need to handle graceful shutdowns and therefore you need to write this code (or a variation of it) every single time: https://github.com/marwan-at-work/serverctx/blob/main/serverctx.go
The code itself is not particularly long, but it is subtle and in retrospect it would have been great if the net/http library only exposed a single way of running/closing a server that was always graceful.
Therefore, I am mostly interested in ways can steer the community to always consider graceful shutdowns when running production servers.
Of course, adding new APIs can do just that but if there isn't a simple/clean way, maybe the solution is to just properly document/encourage people of this best practice?
For example
http.ListenAndServer
that it does not handle graceful shutdowns. Maybe we can even deprecate it in favor of doing whatever Example Test suggests.go vet
can do to help in that regard. That said, I'm definitely happy to see new API suggestions like the one above and how they would work.
Thanks!
I'm not sure about changing ListenAndServe to ever return nil (lots of code does log.Fatal(http.ListenAndServe(...)) and it would be weird to get log.Fatal(nil) out of that). But we could certainly have a distinguished ErrGracefulShutdown (bad name) or something like that. In any event, that sounds like a different proposal.
Sounds like new API as described in this issue is a likely decline.
Based on the discussion above, this proposal seems like a likely decline. — rsc for the proposal review group
Proposal
I propose adding two new methods to the net/http *Server struct:
ListenAndServeContext
andListenAndServeTLSContext
that will handle gracefully shutting down the server upon a context cancellation signal.To accompany these two methods, we need a graceful shutdown timeout which can either live as a field in the Server struct, or an argument to the new methods.
The signature can either be:
OR
It might also be worth noting that there are two more methods (Serve, and ServeTLS) that can arguably take a context as well. I rarely ever encounter explicit net.Listeners being passed in when starting an http server so I'll leave that up for discussion.
The examples below will assume option 2 above as it makes it more explicit for the user to pass a Shutdown timeout instead of it being tucked away as a struct field.
Code Examples
The code from the caller's perspective will look something like this:
Implementation
The underlying implementation would abstract the subtle details to handle graceful shutdowns:
Notably, the implementation above would return a nil error when a server is gracefully and successfully shutdown without hitting the timeout. We could also explicitly return the ctx.Error() or a new error variable if we never want to return a nil error. But I think a nil error is a fine signal for saying a server was successfully terminated.
Finally, there's a question of what a
0
duration would mean:s.Shutdown
will immediately be cancelled.s.Shutdown(context.Background()
.The first option seems more rational though the second might be nice for convenience but it would have to be explicitly documented.
Rationale
Many people today run Go servers while not implementing graceful shutdown in the first place (for example calling the global http.ListenAndServe function) or implementing it but missing edge cases such as the following:
Misusing shutdown.
Ignoring legitimate ListenAndServe errors:
And probably other subtleties. I have myself made those mistakes many items and ended up abstracting the logic into its own context: https://github.com/marwan-at-work/serverctx/blob/main/serverctx.go
Looking at private code within my company I also noticed each team does a flavor of the above which seems like a good case for code re-use.
Furthermore, looking at the standard library we already have similar patterns such as
exec.Command
andexec.CommandContext
which could make this a natural fit for Go developers already familiar with these APIs.