Customizing HTTP/2 abort response

JamesNK commented 5 years ago

In gRPC there is the concept of a deadline. The deadline is sent by the client to the server with a gRPC call in a header. It is the client saying to the server "you have X seconds to finish this call, after-which give up".

If the server finished the call in less than X sends then the response completes as usual.
If the server exceeds the X seconds then it immediately finishes the HTTP/2 response, sending a RST_STREAM. The user's gRPC method code will likely still be executing after this. If they are good then it will be using a cancellation token, and they stop whatever they're doing quickly. But they might not, and continue doing stuff after the response is complete. This is fine. It is also fine if an error eventually gets thrown because they tried to do an HTTP thing.

Today the gRPC server is finishing the connection when the deadline is hit by calling HttpContext.Abort(). This immediately sends RST_STREAM with an INTERNAL_ERROR error code. The request delegate with the user's code is still running. Using Abort is not ideal, but its response seems to be acceptable to the various gRPC clients of the world.

Today that response on Abort looks very hard-coded in Http2Stream:

protected override void ApplicationAbort()
{
    var abortReason = new ConnectionAbortedException(CoreStrings.ConnectionAbortedByApplication);
    ResetAndAbort(abortReason, Http2ErrorCode.INTERNAL_ERROR);
}

I think the current abort behavior is fine for gRPC to use with deadline, the one thing that could be improved is the abort response over HTTP/2. We're sending INTERNAL_ERROR, and no status trailers. Another gRPC implementation I looked at immediately ends the response, but it sends RST_STREAM (NO_ERROR), along with some HTTP/2 trailing headers.

It would be nice to have a hook in ASP.NET Core to modify the response when an HTTP/2 abort occurs, such as changing the Http2ErrorCode used with RST_STREAM, or appending trailers.

Thoughts around the area? Or alternatives?

Tratcher commented 5 years ago

it sends RST_STREAM (NO_ERROR), along with some HTTP/2 trailing headers.

How does it do that? It should only work if they send the trailers first as trailers after RST would be invalid.

It would be nice to have a hook in ASP.NET Core to modify the response when an HTTP/2 abort occurs, such as changing the Http2ErrorCode used with RST_STREAM, or appending trailers.

I could image having an HTTP Feature and response extension for providing a Reset method with a custom error code (and another one for GoAway). Also including trailers would be complicated.

You still have a multi-threading race condition here though. Only one thread is supposed to touch HttpContext, so calling Reset from a timer thread would not be safe.

JamesNK commented 5 years ago

How does it do that? It should only work if they send the trailers first as trailers after RST would be invalid.

HEADER frame first, then RST_STREAM

Only one thread is supposed to touch HttpContext

Does that include HttpContext.Abort? I don't mind if regular ASP.NET Core thread throws an error because abort has been called. How would you suggest implementing deadline functionality that I've described without another thread?

Tratcher commented 5 years ago

Does that include HttpContext.Abort? I don't mind if regular ASP.NET Core thread throws an error because abort has been called. How would you suggest implementing deadline functionality that I've described without another thread?

Yes, right now it does. Let's discuss the thread safety issue over on https://github.com/aspnet/AspNetCore/issues/9239#issuecomment-499164735.

halter73 commented 5 years ago

It would be nice to have a hook in ASP.NET Core to modify the response when an HTTP/2 abort occurs, such as changing the Http2ErrorCode used with RST_STREAM, or appending trailers.

Could we just add a new HTTP/2-specific feature that allows you to both abort the request and configure RST_STREAM all at once? Basically it would be a specially HTTP/2 version of support with extra parameters. Since you already control the timer callback that calls IHttpRequestLifetimeFeature.Abort(), I don't see the need for a callback-based API.

JamesNK commented 5 years ago

IHttp2AbortThingyFeature.Abort(Http2ErrorCode) could be enough. I want to double check whether returning trailers with a gRPC deadline is a nice to have or not.

Tratcher commented 5 years ago

RE: Trailers, sending them would be a protocol violation if the response had a content-length and you hadn't already finished the body.

JamesNK commented 5 years ago

Ok, I've refactored gRPC deadline logic so that it no longer uses a Timer plus HttpContext.Abort

PR here - https://github.com/grpc/grpc-dotnet/pull/301/files

There is now a separate path the server uses when there is a deadline. It produces a better response than HttpContext.Abort, but it allocates a lot more.

Before there was just a Timer. Allocations were 520B.
With refactor there is a CTS, Task.Delay, Task.WhenAny. Allocations are 1.09KB

I do not know whether this a good idea. It would be great to have input from experts in this area.

// @davidfowl @Tratcher @halter73

JamesNK commented 5 years ago

I don't think my refactor in gRPC will work. It results in holding onto the HttpContext/Http2Stream after the request pipeline has finished. Calling HttpContext.Abort is not a solution because trailers need to be written and they aren't when the request is aborted.

I think the solution is some changes in Kestrel and this issue's original request is the best way forward I can think of:

Add an option to customize the Http2ErrorCode that is sent on Abort
Add an option to write trailers on Abort (or maybe always writing trailers on abort makes sense in HTTP/2)

Good idea? Bad idea? Simple to do? Hard?

JamesNK commented 5 years ago

Alternatively, an option could be to use Response.BodyWriter.Complete() to send the trailers and RST_STREAM

https://github.com/aspnet/AspNetCore/issues/7370

halter73 commented 5 years ago

If BodyWriter.Complete() was used to send the trailers, that would result in an a frame with an end-of-stream flag. I'm not sure it's valid to send an RST stream after that.

Add an option to customize the Http2ErrorCode that is sent on Abort

This seems necessary.

Add an option to write trailers on Abort (or maybe always writing trailers on abort makes sense in HTTP/2)

@Tratcher and I were discussing this yesterday. This seems weird to us. What's the other gRPC implementation that does this?

JamesNK commented 5 years ago

If BodyWriter.Complete() was used to send the trailers, that would result in an a frame with an end-of-stream flag. I'm not sure it's valid to send an RST stream after that.

It is valid.

An HTTP response is complete after the server sends — or the client receives — a frame with the END_STREAM flag set (including any CONTINUATION frames needed to complete a header block). A server can send a complete response prior to the client sending an entire request if the response does not depend on any portion of the request that has not been sent and received. When this is true, a server MAY request that the client abort transmission of a request without error by sending a RST_STREAM with an error code of NO_ERROR after sending a complete response (i.e., a frame with the END_STREAM flag). Clients MUST NOT discard responses as a result of receiving such a RST_STREAM, though clients can always discard responses at their discretion for other reasons.

https://httpwg.org/specs/rfc7540.html#rfc.section.8.1

What's the other gRPC implementation that does this?

The primary gRPC implementation (C-core) does this. In this Wireshark trace of the deadline response from C-core you can see trailers followed by a RST_STREAM

JamesNK commented 5 years ago

Here are 3 scenarios around deadline, and what C-core does in each:

Server exceeds deadline without writing headers. Initial state is nothing has been written. On deadline C-core will send trailers and headers back together in one HEADERS frame. The frame has END_STREAM=true. RST_STREAM NO_ERROR frame is then sent.
Server exceeds deadline after writing headers. Initial state is HEADERS has been written. On deadline C-core will send trailers in a new HEADERS frame. The frame has END_STREAM=true. RST_STREAM NO_ERROR frame is then sent.
Server exceeds deadline after writing headers+message. Initial state is HEADERS and a DATA with the message have been written. On deadline C-core will send trailers in a new HEADERS frame. The frame has END_STREAM=true. RST_STREAM NO_ERROR frame is then sent.

Note that gRPC never uses content-length. content-length is not a problem for us but I understand that it something you will need to validate for other users.

JamesNK commented 5 years ago

Proposal (high-level, name and shape TBD):

CompleteAsync() - Like FlushAsync() but will force trailers to be sent. It will also cause the last frame to have END_STREAM=true set on it. The current Response.BodyWriter.Complete is not enough because we need to wait for it to finish. After CompleteAsync is finished then abort would be called to send RST_STREAM.
ResetHttp2Stream(Http2ErrorCode) - Like HttpContext.Abort but lets you specify the error code sent with RST_STREAM frame.

How gRPC would use these in the scenarios mentioned above:

gRPC would add the status trailers to the headers collection, call CompleteAsync (this would cause the HEADERS frame to have END_STREAM=true set on it), then call abort. User code would be awaited.
gRPC would add the status trailers to the trailers collection, call CompleteAsync (this would cause the HEADERS frame to have END_STREAM=true set on it), then call abort. User code would be awaited.
Same as 2.

private Timer DeadlineTimer = new System.Threading.Timer(DeadlineExceeded, DeadlineDuration);

public async Task DeadlineExceeded()
{
    lock (DeadlineLock) // SemaphoreSlim?
    {
        // Set DeadlineExpired status
        HttpContext.Response.ConsolidateTrailers(this);

        // Send remaining content, END_STREAM=true. No further content can be written
        // Returns once content and trailers have been sent
        await HttpContext.Feature.Get<ICompleteThingFeature>().CompleteAsync();

        // Send RST_STREAM (NO_ERROR)
        HttpContext.Feature.Get<IResetThingFeature>().ResetHttp2Stream(Http2ErrorCode.NoError);    
    }
}

Tratcher commented 5 years ago

Something else that came up when James and I discussed this is that the GRPC handler will manage the synchronization such that the timeout code path and the normal response code path are never trying to write headers, trailers or body at the same time. These are all discrete operations that happen within the handler and not out in user code.

Tratcher commented 5 years ago

Note: CompleteAsync isn't HTTP/2 specific, it could equally apply for HTTP/1.1 chunked responses with or without trailers.

For Content-Length responses it would throw if the exact content-length was not already sent. (HTTP/1 and 2)
Assuming we provide an HttpResponse extension method wrapper for this feature, what should it do if the server does not implement CompleteAsync? FlushAsync makes a poor fallback. Throw NotSupportedException?

What do the GRPC components do in general if run on a server other than Kestrel?

davidfowl commented 5 years ago

Assuming we provide an HttpResponse extension method wrapper for this feature, what should it do if the server does not implement CompleteAsync? FlushAsync makes a poor fallback. Throw NotSupportedException?

That question makes it unfit to go onto HttpResponse.

Tratcher commented 5 years ago

That question makes it unfit to go onto HttpResponse.

Perhaps, it depends on the answer.

JamesNK commented 5 years ago

What do the GRPC components do in general if run on a server other than Kestrel?

Functional tests run in TestServer. When gRPC uses an API that isn't properly supported in TestServer, e.g. IResponseTrailersFeature, then I improve TestServer. I'll do the same with the new feature that comes out of this issue.

halter73 commented 5 years ago

Assuming we provide an HttpResponse extension method wrapper for this feature, what should it do if the server does not implement CompleteAsync? FlushAsync makes a poor fallback. Throw NotSupportedException?

That question makes it unfit to go onto HttpResponse.

You could fall back to an HttpResponse.CompleteAsync() that just calls FlushAsync() and then Complete() on the response body PipeWriter. It would still result in unnecessary empty END_STREAM frames, but it's not exactly wrong.

But what if instead we added an API similar to the proposed AbortHttp2Stream, but instead of aborting the stream immediately, it buffered the RST frame after the response body and trailers if any?

Tratcher commented 5 years ago

@JamesNK I was referring more to IIS or HttpSys. Does gRPC fail or fall back gracefully to other mechanisms?

JamesNK commented 5 years ago

Do they support HTTP/2 and trailers today? We haven't tested much outside of Kestrel/TestServer. I don't know.

For this feature, what I could do is check to see whether the feature is in HttpContext.Features. If it is then CompleteAsync/AbortHttp2Stream could be used. If it is not then the gRPC would fallback to HttpContext.Abort.

Tratcher commented 5 years ago

@JamesNK they do support HTTP/2 but not trailers.

JamesNK commented 5 years ago

gRPC would fail on IIS when it calls HttpResponse.AppendTrailer (unless IIS has a dummy IResponseTrailersFeature that no-ops). That's fine. gRPC doesn't work without trailers.

JamesNK commented 5 years ago

What is the next step here? I can organize a meeting to come up with a design, or you could guys could informally talk about it and come up with one that I double check.

Up to you 🤷‍♂

davidfowl commented 5 years ago

@JamesNK setup a design meeting so we can flesh this out this week?

Tratcher commented 5 years ago

@anurse We went over the high level design and think this qualifies for preview7. I have some cycles and think I can make incremental progress on it.

Two new features/APIs: IHttpResponseCompletionFeature.CompleteAsync(); Flushes the response body and trailers. This is considered a graceful end of the request. For HTTP/1.1 Chunked this would send the terminator. For Content-Length (HTTP/1 or 2) it throws if the exact content-length has not yet been written. For HTTP/2 it sends END_STREAM on the final data or headers frame. If the response has not started / there is no body or trailers, that final END_STREAM may be on the initial headers frame. Future writes will fail silently (?). We could scope this to HTTP/2 for time reasons. ResponseCompression would need to wrap this feature in order to flush the compression stream.

IHttp2ResetFeature.Reset(int errorCode); Aborts the request and response and sends a RESET frame with the given error code. This is functionally the same as Abort but with a specific code to send to the client.

From the discussion with @JamesNK, the lack of CompleteAsync is causing interop issues with at least one client. The Reset behavior is recommended by spec but doesn't appear to be causing iterop issues at the moment. It's the easy feature but is also optional.

analogrelay commented 5 years ago

IHttp2ResetFeature.Reset(int errorCode);

Rather than the Http2 in the name, could it just not be present in HTTP/1.1 feature collections? I mean, in theory it may also apply to HTTP/3 right? Then it could be IHttpResetFeature.

We could scope this to HTTP/2 for time reasons.

Again, if we do scope it (and I think at least starting with HTTP/2 only makes sense even if we think we can get it in for both in 3.0) we should just not install the feature in an HTTP/1.1 collection.

Tratcher commented 5 years ago

HTTP/3 doesn't have Reset itself, that's a QUIC feature. I could just call it IResetFeature.

halter73 commented 5 years ago

IHttp2ResetFeature.Reset(int errorCode)

What the int errorCode parameter means is specified by the HTTP/2 spec. I much prefer keeping "Http2" in the feature name.

Tratcher commented 5 years ago

@halter73 except it's an open set, you could send custom error codes, or quick error codes. It's only HTTP/2 specific if we provide an enum with the spec values.

halter73 commented 5 years ago

I guess the question is then do we expect to use this feature for any other protocol that happens to support integer error codes in some sort of reset frame/message. I don't find this super likely, and I like the somewhat self-documenting nature of including "Http2" in the feature interface name.

Just removing it from the HTTP/1 feature collection doesn't really help. It doesn't affect the discoverability of the feature at all unless you happen to be looking at the feature collection in a debugger, so I think having a name indicating this is a feature for HTTP/2 will help avoid confusion.

Tratcher commented 5 years ago

CompleteAsync is turning out much easier than I was expecting. I should have a PR by the end of the day. One side effect that we did not discuss: RequestAborted will no longer fire after calling CompleteAsync.

This logic was already in place for things like ContentLength responses so you wouldn't get aborted for graceful disconnects after you've sent the full response. CompleteAsync is another way of saying you've sent the whole response.

This conflicts with the gRPC scenario that wants to complete the response and then abort. Abort will still send a reset to the client, but RequestAborted does not fire. Any ongoing request body reads fail with OperationCancelledException.

Option A) Do nothing in Kestrel. gRPC should add their own chained token to notify the application when gRPC is aborting the response. This would integrate reasonably well with the timeout. Option B) Keep the cancellation token alive until the end of the Request AND Response bodies. This will be complicated and still might not be enough for gRPC in some modes.

halter73 commented 5 years ago

Option A) Do nothing in Kestrel. gRPC should add their own chained token to notify the application when gRPC is aborting the response. This would integrate reasonably well with the timeout. Option B) Keep the cancellation token alive until the end of the Request AND Response bodies. This will be complicated and still might not be enough for gRPC in some modes.

I also think I prefer option A. After all, from Kestrel's perspective, if you call CompleteAsync() was the request really aborted? I agree this shouldn't be treated differently than writing out the full content-length response.

JamesNK commented 5 years ago

With option A there will be a performance penalty in gRPC because we'll have to create our own CTS rather than reusing Kestrel's.

I don't think this is a big deal. It is pay-for-play: it will only affect perf when using a deadline.

Tratcher commented 5 years ago

You already have the overhead of a timer, you could replace that with the CTS and likely not add any additional overhead.

JamesNK commented 5 years ago

Maybe. We need to be able to distinguish between the request being aborted and deadline being exceeded. Might be able to do that with a single CTS - I'll investigate when switching to use CompleteAsync.

Tratcher commented 5 years ago

Picking up from https://github.com/aspnet/AspNetCore/issues/10886#issuecomment-501811008

Looking closer at QUIC, it defines the RST_STREAM frame, but not the individual error code values.

the management of application error codes are left to application protocols.

The actual error codes are defined in the HTTP/3 spec. https://tools.ietf.org/html/draft-ietf-quic-http-20#section-8.1

In this case a more generic reset makes sense, the values are disconnected from the functionality.

halter73 commented 5 years ago

If you think it's generic enough, I'm fine with IHttpResetFeature instead of IHttp2ResetFeature

analogrelay commented 5 years ago

Is this done now that #11193 and #11300 are merged?

Tratcher commented 5 years ago

I want to add the same features to TestServer quick.

Tratcher commented 5 years ago

Closing this out. The TestServer work is going to be more involved than I'd thought.

JamesNK commented 5 years ago

Is there an issue for test server implementation?

Tratcher commented 5 years ago

@JamesNK https://github.com/aspnet/AspNetCore/issues/11598

I'm marking this as Accepted based on gRPC's consumption.

dotnet / aspnetcore

Customizing HTTP/2 abort response #10886