golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.46k stars 17.59k forks source link

net/http/httptrace: add ServerTrace hooks #18997

Open joeshaw opened 7 years ago

joeshaw commented 7 years ago

Problem

A very common pattern in Go HTTP servers is to implement an http.ResponseWriter that wraps another http.ResponseWriter and captures the status code. This is often used for logging and metrics collection.

For example,

type statusCaptureWriter struct {
    http.ResponseWriter
    status int
}

func (scw *statusCaptureWriter) WriteHeader(status int) {
    scw.status = status
    scw.ResponseWriter.WriteHeader(status)
}

type loggedHandler struct {
    handler http.Handler
    logger SomeLogger
}

func (h *loggedHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    scw := &statusCaptureWriter{ResponseWriter: w}
    h.handler.ServeHTTP(scw, r)
    h.logger.Logf("status=%d ...", scw.status)
}

I've written something like this a bunch of times. You can find examples in nearly every Go web framework out there. One example is https://github.com/urfave/negroni/blob/master/response_writer.go#L13-L26.

There are some issues with this approach. For instance, my statusCaptureWriter doesn't implement other interfaces like http.Flusher, http.CloseNotifier or http.Pusher. I can't determine at compile time whether the underlying http.ResponseWriter implementation implements any of these interfaces, so if I choose to implement them I might lie to callers at higher levels of the stack and inadvertently break things. (This particularly a problem with CloseNotifier.)

Proposal (rejected, see below)

I'd like to propose an additional interface, http.Statuser (better name welcome) that exposes the status code within a http.ResponseWriter implementation. The internal http.(*response) implementation already tracks the status code written, so this can just be exposed and it will automatically implement this interface.

Software could avoid wrapping the http.ResponseWriter by instead type asserting it to http.Statuser and getting the status as needed there. (And it could optionally continue to wrap the ResponseWriter as needed until this is widely deployed.)

type loggedHandler struct {
    handler http.Handler
    logger SomeLogger
}

func (h *loggedHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    // Could check for http.Statuser implementation here and wrap http.ResponseWriter 
    // if necessary, but omitting for brevity

    h.handler.ServeHTTP(w, r)

    status := 0
    if s, ok := w.(http.Statuser); ok {
        status = s.Status()
    }

    h.logger.Logf("status=%d ...", status)
}

Alternative proposal

Implement an httptrace.ServerTrace struct that is analogous to the ClientTrace already there. See https://github.com/golang/go/issues/18997#issuecomment-279611928 for more info.

tombergan commented 7 years ago

That is a nice summary. I think all six of your listed use cases could be handled by httptrace.ServerTrace. For 4 & 5, we'd pass through the status code (and possibly headers) from the ResponseWriter to the ServerTrace. 6 should "just work" since ServerTrace doesn't wrap the ResponseWriter.

rhysh commented 7 years ago

I think all six of your listed use cases could be handled by httptrace.ServerTrace It could do any of them. Where should it focus? Where should the discussion focus?

httptrace.ClientTrace doesn't allow getting the status code; that's trivial to do in a http.RoundTripper. But it also doesn't allow getting the sizes of the request body or the response body—both possible, but inconvenient, to get from a http.RoundTripper wrapper.

ClientTrace focused on giving access to net/http internals that it alone could provide.

In the current implementation draft of ServerTrace, the GotBadRequest hook gives access to information that would not otherwise be available to http.Handler authors. This looks like a good way to give access to that information.

The WroteHeader and WroteBodyChunk callbacks give access to data that http.Handler authors already have. It's not very convenient access because of the pattern of adding secret bonus methods on http.ResponseWriters, but it's access.

Should ServerTrace have a callback for when Flush is called and for when http/2 push is used? If sendfile(2) is activated, should a hook be called at the beginning/middle/end of that transfer?

I think it would be helpful to split the discussion into 1) information present in the internals of net/http that is not exposed to users, and 2) information that users have, but need better access to.


Use of the http.Server.UpdateRequestContext field looks clunky. It's called before the http.Request is available, meaning that users would need to coordinate between the code that runs during that hook and a top-level http.Handler in order to match the ServerTrace hooks to an http.Request.

Since it's attached to the http.Server, it can't be used by middlewares. It's set once at the top level by the owner of the service, making it inaccessible to the maintainers of middleware packages. They're likely to continue wrapping http.ResponseWriters, since that would be the only way the code they control is able to access the data they need.

I have a proposal for an alternate design, if we split the functionality more or less into the two types of data I identified above:

1) A new field on http.Server points to a struct that contains the GotBadRequest hook (and maybe some others). Consider this logically as a structured extension of the http.Server.ErrorLog field.

2) A function in net/http that takes the Context value from a http.Request (or a *http.Request directly) and something like a httptrace.ServerTrace struct, registering it for callbacks as the request is served. Any layer of http.Handler middleware could call the function, just as any layer of http.RoundTripper middleware can with ClientTrace.

(The function wouldn't return a Context like the ClientTrace equivalent does; the http.Server implementation would use a private context key to store and access a function of type func(httptrace.ServerTrace). The net/http package's function would extract and call it.)

tombergan commented 7 years ago

[UpdateRequestContext is] called before the http.Request is available, meaning that users would need to coordinate between the code that runs during that hook and a top-level http.Handler in order to match the ServerTrace hooks to an http.Request.

Not sure I followed? The hooks will be available via http.Request.Context(). A pattern like the following should work:

s := &http.Server{
  UpdateRequestContext: func(ctx context.Context) context.Context {
    sh := newServerHooks()  // a user-defined struct
    ctx = context.WithValue(ctx, key, sh)
    return httptrace.WithServerTrace(ctx, &httptrace.ServerTrace{
      GotRequest: sh.GotRequest,
      ...
    })
}

The handler can then use ctx.Value(key) to extract sh.

Since [UpdateRequestContext is] attached to the http.Server, it can't be used by middlewares. It's set once at the top level by the owner of the service, making it inaccessible to the maintainers of middleware packages.

The middleware needs some entry point in order to apply its http.ResponseWriter wrapper. Couldn't the middleware use a similar entry point to install its UpdateRequestContext wrapper? I am admittedly not familiar with many such libraries (aside from the few I've written myself for internal use), so I am very interested in hearing about libraries where UpdateRequestContext will be difficult to use.

A function in net/http that takes the Context value from a http.Request ...

Can you explain this with a pseudocode example? I did not follow.

A new field on http.Server points to a struct that contains the GotBadRequest hook ... I think it would be helpful to split the discussion into 1) information present in the internals of net/http that is not exposed to users, and 2) information that users have, but need better access to.

That is a useful distinction for this discussion, however, my main concern is API clutter. I would like to minimize the API surface and avoid cluttering http.Server with callback fields, if possible. I fear we are heading towards API clutter unless the new functions/callbacks/fields we add are sufficiently general.

I am promoting two APIs: (1) Wrapping http.ResponseWriter. This will be the canonical way to affect how the response is produced. (2) UpdateRequestContext. This will be the canonical way to get a read-only view of each request for logging. There may need to be a long list of read-only hooks, but at least those hooks will be hidden in net/http/httptrace rather than cluttered in net/http.

rhysh commented 7 years ago

Considering the case where a user wants to log the status code of all responses, here are three possible approaches:

1) No changes to net/http. Logging package authors wrap the http.ResponseWriter, providing a function with signature func(inner http.Handler) http.Handler. Application owners can call it any time they have a http.Handler, sometime before setting their http.Server.Handler field.

2) Add a field to http.Server, UpdateRequestContext. Logging package authors write two pieces of code: their usual middleware (to access the URI path from the http.Request), and a function to add to the UpdateRequestContext field (to access the status code). Application owners call one of the functions from their http.Handler, and the other where they set up the http.Server (which may be in a different file or package).

3) Add a function in http designed to be called from an http.Handler. Logging package authors convert their http.ResponseWriter wrappers to use the new function with no change in the API they provide to users. Applications owners can call it any time they have a http.Handler, sometime before setting their http.Server.Handler field.

The shortcoming of (1) is that it blocks access to additional methods on the http.ResponseWriter. The shortcoming of (2) is that it requires users to coordinate between two different parts of the net/http API. Users who only have access to an http.Handler cannot use it (and would be forced to use the existing option 1).

Code for each follows. Does example 2 match the API design you have in mind? Can you give a more concise demonstration of correct use of the new field?

1) No changes to net/http:

type statusRecorder struct {
    status int
    w      http.ResponseWriter
}

func (srw *statusRecorder) Header() Header {
    return srw.w.Header()
}

func (srw *statusRecorder) Write(p []byte) (int, error) {
    if srw.status == 0 {
        srw.status = http.StatusOK
    }
    return srw.w.Write(p)
}

func (srw *statusRecorder) WriteHeader(status int) {
    srw.status = status
    srw.w.WriteHeader(status)
}

func logStatusMiddleware_ResponseWriter(inner http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        srw := &statusRecorder{w: w}

        defer func() {
            log.Printf("path=%q status=%d", r.URL.Path, srw.status)
        }()

        inner.ServeHTTP(srw, r)
    })
}

2) Add a field to http.Server, UpdateRequestContext:

func logStatusMiddleware_AndServerUpdate(inner http.Handler, srv *http.Server) http.Handler {
    ctxKey := new(int)

    prevFn := srv.UpdateRequestContext
    newFn := func(ctx context.Context) context.Context {
        if prevFn != nil {
            ctx = prevFn(ctx)
        }
        var (
            mu   sync.Mutex
            info httptrace.WroteHeaderInfo
        )
        ctx = httptrace.WithServerTrace(ctx, &httptrace.ServerTrace{
            WroteHeader: func(i httptrace.WroteHeaderInfo) {
                mu.Lock()
                defer mu.Unlock()
                info = i
            },
        })
        ctx = context.WithValue(ctx, ctxKey, func() httptrace.WroteHeaderInfo {
            mu.Lock()
            defer mu.Unlock()
            return info
        })
    }
    srv.UpdateRequestContext = newFn

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        defer func() {
            info := r.Context().Value(ctxKey).(func() httptrace.WroteHeaderInfo)()
            log.Printf("path=%q status=%d", r.URL.Path, info.StatusCode)
        }()

        inner.ServeHTTP(w, r)
    })
}

3) Add a function in http designed to be called from an http.Handler:

func logStatusMiddleware(inner http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        http.AddServerTrace(r.Context(), &httptrace.ServerTrace{
            WroteHeader: func(info httptrace.WroteHeaderInfo) {
                log.Printf("path=%q status=%d", r.URL.Path, info.StatusCode)
            },
        })

        inner.ServeHTTP(w, r)
    })
}
urandom commented 7 years ago

If I'm reading this correctly, ServerTrace proposal doesn't allow modification of the response body prior to writing. Therefore it will not allow things like compressing the response body prior to it being sent, for example.

Presumably, the ServerTrace struct will need to define another handler: WillWriteBody(body []byte, header http.Header) []byte

Such a handler would need to receive the body before it's being written, as well as the headers, if there's a need to modify them. And it should return the new body for actual writing. I assume that the request headers are already somehow available from within that handler already, since they also usually play a part here.

rhysh commented 7 years ago

You make a good point @urandom , that packages like https://github.com/NYTimes/gziphandler are currently implemented by wrapping the ResponseWriter with special behavior for Write and Flush. If packages like that one continue to wrap the ResponseWriter, new features exposed as methods will continue to be shadowed.

That package also has special behavior for the Push method: it ensures that the synthetic request will have the "Accept-Encoding: gzip" header. And, the handler disables connection hijacking. It needs to intercept several methods in order to change their behavior.

Will the ServerTrace struct expand to allow modifying all additional ResponseWriter methods? When new behaviors are added to the ResponseWriter, will they be added to the ServerTrace struct as well? That seems like a lot of API duplication.

bradfitz commented 6 years ago

Looks like this didn't happen before the Go 1.10 freeze. Bumping to Go 1.11.

stapelberg commented 6 years ago

Another data point of a package which has to deal with the combinatorial explosion of optional interfaces: https://github.com/prometheus/client_golang/blob/1cdba8fdde81f444626530cce66e138d2f68ae1c/prometheus/promhttp/delegator.go#L100

keegancsmith commented 6 years ago

I have an alternative proposal for avoiding the combinatorial explosion of optional interfaces. It is inspired by the causer interface from github.com/pkg/errors (interface{ Cause() error }). The stdlib documents that if you wrap a http.ResponseWriter you should also implement this interface

type Wrapper interface {
    Wrap() http.ResponseWriter
}

Then for example, if we want to get at the http.Hijacker we can use code like this:

func GetHijacker(w http.ResponseWriter) http.Hijacker {
    for w != nil {
        if h, ok := w.(http.Hijacker); ok {
            return h
        }
        wrapper, ok := w.(http.Wrapper)
        if !ok {
            break
        }
        w = wrapper.Wrap()
    }
    return nil
}

This doesn't solve the problem of getting at additional server event data that ServerTrace would, but would help with the original problem proposed in this issue and the issues with wrapping ResponseWriter in general.