proposal: net/http, net/http/httptrace: add mechanism for tracing request serving

CAFxX commented 4 years ago

For the purpose of accurate request tracing when serving HTTP requests (for both HTTP/1.x and HTTP/2), it would be ideal if http.Handlers could obtain a time.Time representing the time of first byte of the request.

In this context, time of first byte would be the best estimate of the instant when the go process reads the first byte of the HTTP request line from the socket (i.e. ignoring the time spent in hardware queues and in kernel buffers).

Between the time of first byte and the time the first http.Handler is invoked for that request quite some time can pass since in that interval all headers must be received (and parsed) and the request body must start. If the client is slow then this interval can be extremely long.

Furthermore, when using the middleware pattern for tracing, the order of middleware can influence the recorded time at which the request starts to be handled. This can skew metrics and the accuracy of tracing, leading to hard-to-debug performance issues.

With this proposal implemented, all of the above issues would be solved:

request parsing would be correctly accounted for
slow clients can be correctly accounted for
request start time can be reliably detected irrespective of middleware order

This information could be included either in the http.Request struct, returned by a method on http.Request, or attached to the request context.Context (perhaps via something like the httptrace.ServerTrace suggestion below by @odeke-em).

update: rephrased according to @odeke-em's suggestions

odeke-em commented 4 years ago

Thank you for filing this issue @CAFxX!

So in regards to Transports and HTTP clients, have you perhaps encountered httptrace.ClientTrace https://golang.org/pkg/net/http/httptrace/#ClientTrace

It provides hooks for a bunch of events like: a) WroteRequest b) WroteHeaders c) GotFirstResponseByte

Direction

You talk about

In this context, time of first byte would be the best estimate of the time at which the go process reads the first byte of the HTTP request line (i.e. ignoring the time spent in hardware queues and the socket buffers). ... Between the first byte to when the first http.Handler is invoked quite some time can pass, since in that interval all headers must be received (and parsed) and the request body must start. If the client is slow then this interval can be extremely long. ... This information could be included either in the http.Request struct, returned by a method on http.Request, or attached to the context.Context.

It seems to me that this issue is a feature request for http.Server to attach timing information for processing of requests? If so, let’s modify the title and the wording to make this clear and that’ll perhaps aid in proposal review.

Proposal

Building off your suggestions and comments, perhaps: This issue perhaps could be solved if net/http/httptrace had: a) func ContextServerTrace(context.Context) *ServerTrace b) A net/http/httptrace.ServerTrace struct

type ServerTrace struct {
       TimeToFirstRequestByte time.Duration
       BufferSize int64
       TimeToFirstHeader time.Duration
       ...
}

If perhaps this suits the requirements, we can make the issue title “proposal: net/http, net/http/httptrace: add mechanism for tracing request serving”

seankhliao commented 4 years ago

Earlier httptrace ServerTrace proposal #18997

CAFxX commented 4 years ago

@odeke-em yes, your interpretation is absolutely correct (sorry, I jotted it down in a haste, and clarity was definitely lacking). I changed the title and tried to clarify things a bit.

I also incorporated a pointer to your suggested interface. I just have a couple of question about it though:

what would the semantics of the time.Duration in this case? What would be the start and end instants of these two durations? Put otherwise: how can I obtain a time.Time representing the instant in time when the corresponding events occurred?
what's the BufferSize for?

odeke-em commented 4 years ago

what would the semantics of the time.Duration in this case? What would be the start and end instants of these two durations? Put otherwise: how can I obtain a time.Time representing the instant in time when the corresponding events occurred?

Great question, so I was thinking that each timing would be an event offset from an origin time aka time the request was first received. We could have the origin time e.g. ReceivedRequestAt time.Time or RequestStartTime time.Time, instead of having both ReceivedRequestAt time.Time and TimeToFirstRequestByte time.Time. Offsets from a single origin time keep the data in the form of an event log and reduce unnecessary computations needed by users to detail the story of the request.

what's the BufferSize for?

BufferSize is meant to capture how much data was needed to be buffered before a full response could be read and the handler was initiated, for example it could be the length of the initial data stream for HTTP/2.

Earlier httptrace ServerTrace proposal #18997

Thank you @seankhliao for finding that ServerTrace proposal. @CAFxX the proposal that @seankhliao referred to has been open for a while and has even more discussion and context. Perhaps let's close this one and transfer the discussion there?

CAFxX commented 4 years ago

@CAFxX the proposal that @seankhliao referred to has been open for a while and has even more discussion and context. Perhaps let's close this one and transfer the discussion there?

After going through #18997 I am not sure about whether that's sensible: the proposal scope in #18997 is significantly wider, and as a result the design proposed there has run into problems (circular dependencies between net/http and net/http/httputil) that seemingly caused that proposal to get hopelessly stuck and see no significant activity for the last 2.5 years.

Most likely this proposal could be implemented on the framework of #18997, but as that proposal seem to be not going anywhere due to scope, I don't think increasing the scope further will be of benefit to either of them.

Note that this proposal does not necessarily require the specific solution of #18997 (httputil.ServerTrace): there are other pragmatic solutions mentioned above.

So, if possible, I'd vote to keep the two proposals separate. I can definitely mention the use case in #18997, but merging seems unnecessary.

gopherbot commented 4 years ago

Change https://golang.org/cl/227438 mentions this issue: net/http: prototype ServerTraceState to capture state of a request being handled

odeke-em commented 4 years ago

Gotcha, thank you! I couldn't sleep so I instead prototyped something with CL https://go-review.googlesource.com/c/go/+/227438, please take a look. From the ideas I was able to spin up, we'll have code such as:

Code

func main() {
        cst := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                ts := r.ServerTraceState()
                fmt.Printf("Request was received at: %s.\n", ts.ReceivedAt)
                fmt.Printf("Header processing started after: %s.\nFinished after: %s.\nTotal time to parse headers: %s.\n\n",
                        ts.ParsedHdrsStartAt.Sub(ts.ReceivedAt),
                        ts.ParsedHdrsEndAt.Sub(ts.ReceivedAt),
                        ts.ParsedHdrsEndAt.Sub(ts.ParsedHdrsStartAt))
                fmt.Printf("TLS handshaking started after: %s.\nTime for TLS handshake: %s\n\n",
                        ts.TLSHandshakeStartAt.Sub(ts.ReceivedAt),
                        ts.TLSHandshakeEndAt.Sub(ts.TLSHandshakeStartAt))
                fmt.Printf("Reading the first byte started after: %s.\nTime to read first byte: %s.\n\n",
                        ts.FirstByteStartAt.Sub(ts.ReceivedAt),
                        ts.FirstByteEndAt.Sub(ts.FirstByteStartAt))
                fmt.Printf("Composing the request before passing it into this handler took: %s.\n",
                        ts.ParsedRequestAt.Sub(ts.ReceivedAt))
        }))
        defer cst.Close()

        req, _ := http.NewRequest("GET", cst.URL, nil)
        res, err := cst.Client().Do(req)
        if err != nil {
                panic(err)
        }
        _, _ = ioutil.ReadAll(res.Body)
        _ = res.Body.Close()
}

Output

Request was received at: 2020-04-07 03:20:46.320637 -0700 PDT m=+0.001848575.
Header processing started after: 1.331644ms.
Finished after: 1.337667ms.
Total time to parse headers: 6.023µs.

TLS handshaking started after: 14.448µs.
Time for TLS handshake: 1.274575ms

Reading the first byte started after: 1.310963ms.
Time to read first byte: 13.904µs.

Composing the request before passing it into this handler took: 1.344004ms.

It is meant to show how this can be done and as a starting point to further the proposal and discussions.

CAFxX commented 4 years ago

Functionally (w.r.t. the problem described in this issue) LGTM. This is quite more than I would expect from an MVP. :D

While I'm not sure whether we need all that granularity right away, looking at a full-blown prototype like this is very useful though:

I don't have strong feelings about whether this should be always-on/opt-in/opt-out, even though maybe opt-out would be best (rationale: good functional default that allows users that don't care about tracing to not pay the extra cost).
As a future extension, since IIRC the request body can be read asynchronously in the background, it may make sense to have also the timestamp of when the last byte of the request body is received (absolutely out of scope for now, also because it would require synchronization with the handler)
This is not a particularly strong argument for instants instead of durations, but anyway: instants have a better zero value to signify "this event did not happen", e.g. TLS handshake for non-TLS connections (while unlikely, a duration of 0 could mean both that the event did not happen, or that it happened very quickly)
Given points (2) and (3), maybe it would be more forward-looking to provide "getters" instead of exposing the timestamps/durations directly? (this would allow e.g. to safely implement synchronization) -- update: ah maybe this is not required since ServerTraceState() is already getter-like

odeke-em commented 4 years ago

Thank you for the feedback, and for taking a look, @CAFxX!

In response to your points:

A setting on http.Server before creation perhaps, but also they are just timers so not sure about the overhead except via allocations
The last byte of the request is impractical because these is telemetry that will be available right before, and within the handler i.e. request bodies can still be streaming until the very last byte is read, and at that point this mechanism will be out of scope
Right, so that's what I deduced too as I was thinking of a design. Another problem with durations is that when you try to calculate offsets since each other e.g. when TLS handshaking finished, you have to add ts.ReceivedAt.Add(ts.TLSHandshakeStartAt + ts.TLSHandshakeEndAt)
I had initially thought about making it an interface which is why you'll traceState and ServerTraceState but really given that it'll be available for use in a handler and won't be modified anymore, the use of getters just unnecessarily complicates it

CAFxX commented 4 years ago

The last byte of the request is impractical because these is telemetry that will be available right before, and within the handler i.e. request bodies can still be streaming until the very last byte is read, and at that point this mechanism will be out of scope

I was thinking about this in the context of a tracing middleware, so you could do something like

func TracingHandler(next http.Handler) http.Handler {
    return http.HandlerFunc(func (w http.ResponseWriter, r *http.Request) {
        w, r, span := traceRequest(w, r)
        defer span.Finish()
        next.ServeHTTP(w, r)
    })
}

by the time span.Finish() runs r is still valid and we should be in one of these cases:

the handler chain has completed processing the request, and processed the request body (if any)
the handler chain has completed processing the request, but ignored the request body (if any)
the handler chain aborted

In case 1, the time of last byte most likely has already occurred.

In the other cases it's hard to say whether the body has already been fully read, but knowing the time of last byte (if it has not occurred yet) likely does not matter.

If we really wanted to cover all of these cases cleanly, we would probably need a Context-like Done channel of some sort (or a waitForCompletion argument to ServerTraceState) that would be unblocked when net/http has completed handling that request (this way, span.Finish could start a goroutine that would block until the request is complete, and then get the complete TraceState). Would be nice to leave the option open for further extensions, but we definitely don't need all of this right now.

rsc commented 4 years ago

Talked with @bradfitz about this.

It does seem like we should provide an httptrace.ServerTrace analogous to httptrace.ClientTrace and provide a way to hook it to the server.

ClientTrace is a special context value on the client side. Now that http.Server has BaseContext and ConnContext, the context could be queried for a ServerTrace.

The ServerTrace should be like ClientTrace, though: instead of holding fields about one connection, it should hold callback funcs.

Does anyone want to propose a good, small, starting set of ServerTrace callbacks?

CAFxX commented 4 years ago

I am trying to picture how that would compose cleanly but coming up empty. What if two middlewares/libraries want to install the same callback? Would multiple ServerTrace be supported?

seankhliao commented 4 years ago

Is that common though? It could work the same way http handler chaining works, you take the existing callback and call it before/after your callback

s := &ServerTrace{
        TracePoint1: someCallback,
}
s.TracePoint1 = yourCallback(s.TracePoint1)

func yourCallback(existingCallback func()) func() {
        return func(){
                // do stuff
                if existingCallback != nil {
                        existingCallback()
                }
        }
}

CAFxX commented 4 years ago

That would be way less than ideal if what you care about is timing specific events (the original goal of this proposal) since with this approach timing depends on the order of the callbacks (what we want to avoid):

Furthermore, when using the middleware pattern for tracing, the order of middleware can influence the recorded time at which the request starts to be handled. This can skew metrics and the accuracy of tracing, leading to hard-to-debug performance issues.

I absolutely understand the ServerTrace approach would enable many more use cases; what I would like to avoid is not covering the timing use case. On this point, it was already mentioned that this proposal should maybe be merged in the ServerTrace one, but as pointed out at the time none of the two proposals (as they stand) covers all use cases of the other.

rsc commented 4 years ago

If you are middleware, I am assuming you implement http.Handler. Given a w, r you should be able to do:

r = r.WithContext(httptrace.WithServerTrace(ctx, trace))
dispatch(w, r)

Today, httptrace.WithClientTrace takes care of hooking different tracers together. Presumably WithServerTrace would do the same.

It sounds like what this proposal needs now is just a preliminary definition for httptrace.ServerTrace, analogous to httptrace.ClientTrace, the bare minimum to get started.

rsc commented 4 years ago

Still sounds like what this proposal needs now is just a preliminary definition for httptrace.ServerTrace, analogous to httptrace.ClientTrace, the bare minimum to get started.

Is anyone interested in sketching that out? Should we put the proposal on hold until someone can do that?

rsc commented 4 years ago

Putting proposal on hold for a sketch of httptrace.ServerTrace.

rhysh commented 4 years ago

GotFirstRequestByte or similar will be an important tracing point. Connecting that event to the forthcoming http.Request value looks like one of the main API challenges. There's a start at that via http.Server.ConnState, but it relies on HTTP/1.x (and no pipelining). And, would instrumentation that acts only at the http.Handler level have no way to access GotFirstRequestByte events after the fact?

Middleware that works at the http.Handler level should have a way to subscribe to future events (such as WroteStatus(code int)). If the code to do that is r = r.WithContext(httptrace.WithServerTrace(ctx, trace)), and if we cannot require that dispatch(w, r) examine r.Context, then it looks like that pair of Context-changing calls would need to have side-effects. This is the other significant API challenge I see.

Here's what the struct might look like, provided we're able to address those challenges:

package httptrace // import net/http/httptrace

type ServerTrace struct {
  GotFirstRequestByte func(req *ServerTraceRequestKey)
  GotHeaders func(req *ServerTraceRequestKey)
  GotRequestBody func(req *ServerTraceRequestKey, length int64)
  GotTrailers func(req *ServerTraceRequestKey)
  WroteStatus func(req *ServerTraceRequestKey, code int)
  WroteHeaders func(req *ServerTraceRequestKey)
  WroteBody func(req *ServerTraceRequestKey, length int64)
  WroteTrailers func(req *ServerTraceRequestKey)
}

type ServerTraceRequestKey {
  // Maybe this struct doesn't need to hold any information,
  // and only needs to have a unique address.
  _ byte

  // Or maybe it could hold uniqueness in its fields, such as
  // the remote Addr or net.Conn plus HTTP/2 stream ID.
}

Plus a way to access the ServerTraceRequestKey once the Request is available:

package http // import net/http

// RequestKeyContextKey is a context key.
// The associated value will be of type *httptrace.ServerTraceRequestKey
var RequestKeyContextKey = &contextKey{"request-key"}

maxmoehl commented 5 months ago

Another hook I'd be interested in is to access the outcome of a TLS handshake. I was briefly considering a callback in the tls.Config since there are already some callbacks in there. But after reading this and the preceding proposal it probably makes more sense to solve this as part of a httptrace.ServerTrace since the tls package gives access to all data and the issue only stems from the fact that the http.Server handles such errors internally (but in it's defence: there isn't anything else it could do).

I'd propose to add:

    // TLSHandshakeStart is called when the TLS handshake is started. When
    TLSHandshakeStart func()

    // TLSHandshakeDone is called after the TLS handshake with the handshake's
    // connection state and a non-nil error on handshake failure.
    TLSHandshakeDone func(tls.ConnectionState, error)

to the trace (based on the callbacks from the ClientTrace with the same name). Note the difference in the doc string for the HandshakeDone callback: it always provides the connection state as opposed to only on success.

@rhysh do you think it is really necessary to provide a key like ServerTraceRequestKey? I do see the challenge of correlating the data but for a server we could use the request itself which is created here (for HTTP/1 at least). It seems like this function is also doing the first read so it should be good for GotFirstRequestByte as well.

golang / go