Implement trailing headers support on SocketsHttpHandler

caesar-chen commented 5 years ago

gRPC (Google RPC) protocol uses HTTP trailers to send/receive metadata. On the client side, SocketsHttpHandler needs to implement support for trailing headers to receive metadata.

Client:
GET /index.html
TE: trailers (Indicates that the client is willing to accept trailer fields in a chunked transfer coding)

Server:
HTTP/1.1 200 OK 
Content-Type: text/plain 
Transfer-Encoding: chunked
Trailer: Expires (Indicates what will be the additional fields)

7\r\n 
Network\r\n 
0\r\n 
Expires: Wed, 21 Oct 2015 07:28:00 GMT\r\n (Here is the trailing header)
\r\n

API Proposal

To support trailing headers come with the response message, we need a new Property in HttpResponseMessage class.

public HttpResponseHeaders TrailingHeaders
{
    get
    {
        if (_trailingHeaders == null)
        {
              _trailingHeaders = new HttpResponseHeaders();
        }
        return _trailingHeaders;
    }
}

Usage

Trailers are part of the response body (chunked message), the information stored in TrailingHeaders field will be ready for developer after the underneath response stream is read to the EOF.

By default, the HttpCompletionOption for response is ResponseContentRead, which indicates HttpClient operations (Get/Send) will be considered completed after reading the entire response message including the content. Therefore, we will always have the trailing headers information ready in this case.

However, if developer wants to get response as soon as it's available (HttpCompletionOption.ResponseHeadersRead), he needs to issue an explicit read on the response stream to the EOF to get the trailing headers. If the TrailingHeaders is not ready, we will return an empty collection.

Sample Code

// Response contents:

// Headers.
// Content-Type: text/plain
// Transfer-Encoding: chunked
// Trailer: MyCoolTrailerHeader, Hello

// Body.
// Microsoft
// gRPC

// Trailing Headers.
// MyCoolTrailerHeader: info
// Hello: World

HttpClientHandler handler = new SocketsHttpHandler();
var client = new HttpClient(handler);

// CASE 1: Default case.
Task<HttpResponseMessage> getResponseTask = client.GetAsync(url);
using (HttpResponseMessage response = await getResponseTask)
{
    // Headers will always be available for a response message.
    Assert.Contains("text/plain", response.Headers.GetValues("Content-Type"));
    Assert.Contains("chunked", response.Headers.GetValues("Transfer-Encoding"));
    Assert.Contains("MyCoolTrailerHeader", response.Headers.GetValues("Trailer"));
    Assert.Contains("Hello", response.Headers.GetValues("Trailer"));

    // The HttpClient.GetAsync() by default passed in HttpCompletionOption.ResponseContentRead.
    // `TrailingHeaders` info will be ready when response is returned.
    Assert.Contains("info", response.TrailingHeaders.GetValues("MyCoolTrailerHeader"));
    Assert.Contains("World", response.TrailingHeaders.GetValues("Hello"));

    // Trailers should not be part of the content data.
    string data = await response.Content.ReadAsStringAsync();
    Assert.Contains("Microsoft", data);
    Assert.DoesNotContain("MyCoolTrailerHeader", data);
    Assert.DoesNotContain("amazingtrailer", data);
}

// CASE 2: If HttpCompletionOption.ResponseHeadersRead is specified.
Task<HttpResponseMessage> getResponseTask = client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
using (HttpResponseMessage response = await getResponseTask)
{
    // Headers will always be available for a response message.
    Assert.Contains("text/plain", response.Headers.GetValues("Content-Type"));
    Assert.Contains("chunked", response.Headers.GetValues("Transfer-Encoding"));
    Assert.Contains("MyCoolTrailerHeader", response.Headers.GetValues("Trailer"));
    Assert.Contains("Hello", response.Headers.GetValues("Trailer"));

    // Nothing in `TrailingHeaders` since we haven't read the body yet.
    Assert.Equals(string.Empty, response.TrailingHeaders.toString());

    // Read the body content: ReadAsStringAsync().
    // Trailers should not be part of the content data.
    string data = await response.Content.ReadAsStringAsync();
    Assert.Contains("Microsoft", data);
    Assert.DoesNotContain("MyCoolTrailerHeader", data);
    Assert.DoesNotContain("amazingtrailer", data);

    // Now `TrailingHeaders` info is ready.
    Assert.Contains("info", response.TrailingHeaders.GetValues("MyCoolTrailerHeader"));
    Assert.Contains("World", response.TrailingHeaders.GetValues("Hello"));
}

HTTP2

The TE header on the request can only contains trailers value. (RFC 7540)

The only exception to this is the TE header field, which MAY be present in an HTTP/2 request; when it is, it MUST NOT contain any value other than "trailers".

No additional API is needed for HTTP2.

cc: @dotnet/ncl @geoffkizer @stephentoub

davidfowl commented 5 years ago

cc @Tratcher

rmkerr commented 5 years ago

+@shirhatti

Tratcher commented 5 years ago

@JunTaoLuo isn't GRPC primarily interested in HTTP/2 trailers?

Comparing Case 1 and Case 2 calls out a hidden state field devs may need access to: weather or not a given instance of HttpContent has been read/buffered. If I examine TrailingHeaders and find it empty it's ambiguous to me if that's because the response is incomplete or if there were no trailers. HttpContent needs an API like bool IsComplete { get; }.

HttpResponseMessage.TrailingHeaders vs HttpContent.TrailingHeaders? Trailers are more often descriptions of the content which is why they can't be generated in advance. Consider if I'm writing APIs that consume the HttpContent and then need to verify a hash from the trailers like Content-MD5? There's no back-reference from HttpContent to HttpResponseMessage.

I anticipate a similar scenario for generating request trailers based on the content. We should have an idea for how request trailers would work even if we don't implement them at the same time. I'd expect the contract to be that request trailers must be provided before HttpContent.SerializeToStreamAsync completes, and will be sent immediately afterwards.

JunTaoLuo commented 5 years ago

@Tratcher yes we need HTTP/2 trailers in the response for gRPC, as mentioned by @caesar1995.

caesar-chen commented 5 years ago

I can work on HTTP/2 implementation first. It's similar - the HEADERS frame starting the trailers header block is sent after DATA frames have been sent.

HttpResponseMessage.TrailingHeaders vs HttpContent.TrailingHeaders?

According to RFC 7230:

When a chunked message containing a non-empty trailer is received, the recipient MAY process the fields (aside from those forbidden above) as if they were appended to the message's header section.

I think HttpResponseMessage.TrailingHeaders is more appropriate. But your example is interesting, I'd like to hear more people's feedback on this.

I anticipate a similar scenario for generating request trailers based on the content.

Let me double check the gRPC spec to make sure if we need this on client side.

JunTaoLuo commented 5 years ago

we don't need request trailers for the work that's currently planned for gRPC.

karelz commented 5 years ago

That's good to know for scoping, but if possible, I'd like to see API design for both request and response. They should be consistent. Then we can decide to implement just half of it (the response trailing headers) in 3.0.

caesar-chen commented 5 years ago

Per offline discussion today:

We should use HttpResponseMessage.TrailingHeaders. In order to get HttpContent, developer will always get HttpResponseMessage first.
Request trailing header support doesn't has customer report nor required by gRPC, and API design is different. The decision is that we will only implement response trailing headers in 3.0.

The API review is scheduled for next Tuesday. In the meantime, I will follow up with WinHttp team about stream buffering in HTTP2 - to explore the possibility that always have TrailingHeaders information ready before returning the HttpResponseMessage.

rmkerr commented 5 years ago

I took a look at the relevant code, and I think I can actually answer your question about stream buffering. We currently maintain a per-stream response buffer in Http2Stream: https://github.com/dotnet/corefx/blob/ee57e77c81a18ced855685f519b521e6a34d9821/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Stream.cs#L50

This keeps individual streams from being able to block the whole connection if the consumer isn't actively reading data. We won't block the whole connection until we hit the window size limit. If we get to that point, we expect a well behaved server to stop sending data. If the server doesn't respect that limit, we'll respond with a fatal error and close the connection: https://github.com/dotnet/corefx/blob/ee57e77c81a18ced855685f519b521e6a34d9821/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/Http2Stream.cs#L174-L182

Tratcher commented 5 years ago

to explore the possibility that always have TrailingHeaders information ready before returning the HttpResponseMessage.

@caesar1995 That's a non-starter for streaming scenarios. Http2 requires some per-stream buffering to multiplex, but not full buffering.

Tratcher commented 5 years ago

What about bool IsComplete { get; }?

rmkerr commented 5 years ago

Agreed with @Tratcher on that one. Our per-stream buffering means that it is possible for us to have read the trailing headers before the client has read the content on the stream, but in the case of a large response that fills the window the user will always have to read some of the content before we can receive the trailing headers.

What about bool IsComplete { get; }?

I feel like the bool IsComplete { get; } API would encourage developers to just sit on a thread checking the value until it is true, and I don't think that's a good pattern.

Tratcher commented 5 years ago

@rmkerr how do you differentiate between "the trailers haven't arrived yet" and "there are no trailers"? Or do you assume the component that consumes the content is always going to be the one that reads the trailers?

You are right to worry about the trailers showing up early. At a minimum there's a concurrency issue if you try to populate them on the background connection thread and check for them on the app thread. That lazy initalizer could get you in trouble.

rmkerr commented 5 years ago

That's a good question, and I'm not actually sure what the answer is.

@caesar1995, do you know how we would handle a server sending the Trailers header, and then not actually including any trailers? Is that even a realistic concern?

Tratcher commented 5 years ago

The Trailers response header is only a hint, there's nothing stopping the server from sending any trailers it wants.

caesar-chen commented 5 years ago

@caesar1995, do you know how we would handle a server sending the Trailers header, and then not actually including any trailers? Is that even a realistic concern?

Server sends Trailer header to inform the recipient to prepare for the metadata. If it sends the Trailer header and doesn't including any trailers, we should treat the response as malformed and TrailingHeaders field should just return an empty HttpResponseHeaders collection.

caesar-chen commented 5 years ago

The Trailers response header is only a hint, there's nothing stopping the server from sending any trailers it wants.

I think in this case, we should ignore any unexpected trailing headers - just like ignoring all disallowed trailers. https://tools.ietf.org/html/rfc7230#section-4.1.2

caesar-chen commented 5 years ago

What about bool IsComplete { get; }? how do you differentiate between "the trailers haven't arrived yet" and "there are no trailers"? Or do you assume the component that consumes the content is always going to be the one that reads the trailers?

The component that consumes/buffers the response content is always the one reads(knows) about the trailers information. I don't think we need to differentiate these two cases - by default we return the HttpResponseMessage after parsing & storing the trailing headers. If developers want to do that themselves (by setting completion option to ResponseHeadersRead), and they decided to dispose the content before fully read it, the TrailingHeaders field should just be empty. - We definitely need to document this behavior.

Tratcher commented 5 years ago

The Trailer header is only a SHOULD.

The client should be loose on accepting trailers other than those explicitly prohibited, regardless of the contents of the Trailer header. Cross checking with the Trailers header would be more work on your part, not less.

If the server sends a Trailer header and does not send the associated trailer, oh well, the app can deal with the missing value.

caesar-chen commented 5 years ago

The Trailer header is only a SHOULD. The client should be loose on accepting trailers other than those explicitly prohibited, regardless of the contents of the Trailer header.

This makes sense. Also the algorithm proposed by the RFC inline with your approach as well. (https://tools.ietf.org/html/rfc7230#section-4.1.3) I will use this approach in implementation. Thanks!

If the server sends a Trailer header and does not send the associated trailer, oh well, the app can deal with the missing value.

Agree.

geoffkizer commented 5 years ago

A few thoughts:

It seems weird that accessing TrailingHeaders before they are available (in the ResponseHeadersRead case) will just return no headers. This makes it easier for a user to be confused and assume there are no trailing headers, when really we just haven't read the whole response yet. Seems like we should throw in this case instead.

As mentioned above, we need to be super clear about when trailers are actually available. I think that trailers should not be available until the user has read the entire response body -- even if we've actually buffered the response body internally and read the trailers. Otherwise, this could lead to inconsistent behavior, where sometimes you get lucky and the trailers become available due to how buffering happened, and sometimes you don't.

Note that this only matters for ResponseHeadersRead; with ResponseContentRead (the default), trailers will always be immediately available.

Similarly, if we add IsComplete or something like that, IsComplete should only become true when the user has completely read the response body, not simply when we have buffered it internally.

That said, I am not convinced that IsComplete is really needed. I think the common pattern here will be:

(1) Issue request and await response (2) Read (non trailing) response headers as needed (3) Process entire response body, to EOF (4) Read trailing response headers as needed

So it's not clear to me what value IsComplete would add. If you rely on IsComplete, you're probably doing something wrong.

We should have an idea for how request trailers would work even if we don't implement them at the same time.

I agree, we should define the API for request headers and ensure that there is some consistency between request and response here. It's fine to not implement the request support for 3.0.

Tratcher commented 5 years ago

@geoffkizer if TrailerHeaders throws then IsCompleted becomes essential. You can't assume all of the response processing happens in the same place or that all of the context flows. Consider this method that does not have all of the context for the response consumption state.

public void LogResponse(HttpResponseMessage response)
{
  // Log status code
  // foreach header in response.Headers, log
  // foreach header in response.Content.Headers, log
  try
  {
    // foreach header in response.TrailerHeaders, log
  }
  catch (InvalidOperationException) { } // The trailers aren't available yet
}

vs:

public void LogResponse(HttpResponseMessage response)
{
  // Log status code
  // foreach header in response.Headers, log
  // foreach header in response.Content.Headers, log
  if (response.Content.IsCompleted)
  {
    // foreach header in response.TrailerHeaders, log
  }
}

If TrailerHeaders didn't throw then the worst case would be this method failing to log them. If it does throw then this method has to deal with that one way or the other. There should always be an alternative to wrapping an API in a try/catch.

geoffkizer commented 5 years ago

@Tratcher Yeah, that makes sense. I agree that we need a way to determine whether the trailers are available yet, or not, without try/catch.

That said, "IsComplete" isn't a great name for this. Perhaps "TrailingHeadersAvailable"? Or maybe change the API here from a property to bool TryGetTrailingHeaders(out HttpHeaders)?

Tratcher commented 5 years ago

"TrailersAvailable"?

TryGet is overkill for the common case where you already know they're available and use the Property. The state also only ever changes from negative to positive so atomicity isn't a concern.

caesar-chen commented 5 years ago

It seems weird that accessing TrailingHeaders before they are available (in the ResponseHeadersRead case) will just return no headers. we need a way to determine whether the trailers are available yet, or not, without try/catch.

TrailersAvailable sounds good to me as well. Added to the description for API review.

I think that trailers should not be available until the user has read the entire response body -- even if we've actually buffered the response body internally and read the trailers.

Yes, I plan to implement this way. If developers set completion option to ResponseHeadersRead, and they decided to dispose the content before fully read it, the TrailingHeaders field should just be empty.

geoffkizer commented 5 years ago

@caesar1995 You quoted my comment re throwing when trailers aren't available yet, but you didn't really respond to it -- do you think this should throw or not?

caesar-chen commented 5 years ago

I think TrailingHeaders should throw because:

Otherwise there is no way we can provide developers enough information about why there is no promised trailers.
We will add this new property field TrailersAvailable to guard/avoid exception try-catch: we can document this pattern in case developers choose to read response body themselves.

The throw will not affect the most common (which is the default ResponseContentRead, it will never throw), but it's very helpful to provide additional information in the scenarios I described above. @davidsh Do you have any push back on the throw?

davidsh commented 5 years ago

@davidsh Do you have any push back on the throw?

I don't think it should throw. It seems to be a violation of standard .NET design guidelines for good API design.

See: https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/property

AVOID throwing exceptions from property getters. Property getters should be simple operations and should not have any preconditions. If a getter can throw an exception, it should probably be redesigned to be a method.

Are there other examples of a property getter that return a collection that throw an exception?

caesar-chen commented 5 years ago

Are there other examples of a property getter that return a collection that throw an exception?

I'm not sure. I will do some research sometime later.

For now, I will cancel the review for this API tomorrow, since we haven't reached agreement yet. We can have more time discuss the design in the Wednesday meeting.

caesar-chen commented 5 years ago

The summary of today's discussion:

We will introduce only one new property TrailingHeaders.
In the case where we know in advance that there is no trailing headers, an empty static readonly HttpResponseHeaders type should be returned to reduce allocation.
The TrailingHeaders will be available only after the underneath response body stream has reached EOF.
TrailingHeaders can return null - this means trailing headers are not ready yet (stream hasn't fully read).

I will bring this for the API review next Tuesday. cc: @dotnet/ncl @Tratcher @stephentoub

stephentoub commented 5 years ago

In the case where we know in advance that there is no trailing headers

Not exactly. At some point we need to set the field that backs the TrailingHeaders property, and the idea is we set that to either a collection containing all the read headers or to a singleton read-only collection if there were none. It's not really a case of knowing in advance. In pseudo-code just mocking up the general flow, you'd have:

TrailingHeadersCollection c = null;
foreach (string line in ...)
{
    if (c == null) c = new TrailingHeadersCollection();
    c.Add(Parse(line));
}
_trailingHeaders = c ?? TrailingHeadersCollection.Empty;

JamesNK commented 5 years ago

What will code look like if the caller wants to wait until trailers are available? Read the stream until it is complete?

stephentoub commented 5 years ago

Read the stream until it is complete?

If they've used ResponseHeadersRead rather than ResponseContentRead, yes, e.g. await stream.CopyToAsync(Stream.Null).

GSPP commented 5 years ago

we should treat the response as malformed and TrailingHeaders field should just return an empty HttpResponseHeaders collection.

Is it really wise to just suppress protocol errors? The RFC allows for suppressing and for treating it as an error. Making it an error helps with debugging.

Normally, client code should know by the structure of the code whether the response has been read already or not. I don't see why someone would want to test whether we have arrived at the trailer section in the stream or not.

Accessing trailers at a point where it is logically impossible to have them (before the entire response has been read) is an API usage error and should throw. The property should not return null (or even worse an empty collection), IMO.

If you could test for trailers present what would you do if the test returned false? You have no way to mitigate that condition. You must structure the code so that trailers are available at the point of access by construction.

Normally, code would look like this:

Get response message and access headers
Read and process all content
Access trailers

Simply by following this sequence it is clear when it is permissible to access trailers.

caesar-chen commented 5 years ago

Is it really wise to just suppress protocol errors?

Forgot to update this thread, we will throw for protocol errors. For the most recent discussion on the issue, please see the in progress PR here: dotnet/corefx#35337.

Accessing trailers at a point where it is logically impossible to have them (before the entire response has been read) is an API usage error and should throw.

We don't throw for property getters, it's a violation of standard .NET design guidelines for good API design: https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/property

If you could test for trailers present what would you do if the test returned false? You have no way to mitigate that condition.

Are you asking the no trailers present case? If client doesn't expect any trailers, then it will not need to access the TrailingHeaders. If it does, the TrailingHeaders is lazy initialized when you access it, an empty collection will be returned.

Simply by following this sequence it is clear when it is permissible to access trailers.

Of course. But developer can set HttpCompletionOption.ResponseHeadersRead, and take over the control for step 2 & 3. We explore many approaches here and dotnet/corefx#35337.

Here is the new summary:

Only one new property: TrailingHeaders is needed. a. This property does NOT have a setter.
TrailingHeaders is lazy initialized, similar approach with Headers property. a. Need to check disposed to avoid connection is drained/returned to the pool before fully read the response. b. When TrailingHeaders is not ready, we will return an empty collection – not Null.
We disallow concurrent access to the HttpResponseMessage, and it’s associated objects in general. a. Document this.

GSPP commented 5 years ago

When TrailingHeaders is not ready, we will return an empty collection – not Null.

Properties should rather throw than provide incorrect values. Returning an empty collection is an incorrect value that will lead to incorrect further computation. It's a nasty pitfall for developers.

Returning a default value does not solve any problem. Rather it obscures a usage error by turning of the warning (the exception).

X AVOID throwing exceptions from property getters.

This is a sensible guideline but it's just a guideline. Many framework properties throw if they cannot do the job that is being asked (e.g. Stream.Length. Should we rather make Stream.Length return 0 or -1? Of course not. We must throw. Or, return (int?)null.). Sometimes throwing is the best overall design.

karelz commented 5 years ago

This way we do not have to expose another property TrailersAvailable as proposed earlier in https://github.com/dotnet/corefx/issues/34912#issuecomment-460509668. Using try-catch to detect if trailers are available would have perf impact and it would violate general design of exceptions to be exceptional situations.

Given that TrailingHeaders feature is corner case feature almost nobody uses anyway (except gRPC) and given that we will throw/return empty collection, only if user uses specific approach via HttpCompletionOption.ResponseHeadersRead (corner case of a corner case), we believe this design is ok. And it has a nice side-effect of following API guidelines by not throwing from property getters :)

GSPP commented 5 years ago

I understand the problems with the alternative designs. But returning a wrong result is the worst of all possible choices. In my opinion, this is an egregious design error that we should not add to the Framework (and keep it there forever for compat).

I have no skin in the game here and I certainly respect your right to make this decision. But I will make one more attempt to convince the team:

Returning an incorrect result is unacceptable.
Do we need to allow callers to detect if trailers are available? No, because the caller knows this from the structure of his own code. There is no need to test for availability.
Throwing in a property is entirely appropriate and acceptable if the property can not do it's job (see Stream.Length and many others).
If we want to allow detection of availability we can't return an empty collection anyway because the trailing headers might be empty as sent by the server. We must add a new property or return null.
I recommend returning null. It allows for detection and any usage error will be noticed quickly.
"corner case feature..." I see your point but this seems like an unforced error.
"following API guidelines" These guidelines are simply wrong and should be updated. Surely, few people believe that silently returning wrong results is an official recommendation. I will open an issue for this. On what GitHub repository shall I do that?

If this does not convince you I give up :laughing:.

davidsh commented 5 years ago

@GSPP Thank you for your feedback on the API design. Many of the issues you cite above were discussed in the API review meeting. I think those meetings are recorded and publicly available so you can view our discussions.

Throwing in a property is entirely appropriate and acceptable if the property can not do it's job (see Stream.Length and many others). I recommend returning null. It allows for detection and any usage error will be noticed quickly.

We did not want to throw an exception nor return null. The reason is that the HttpClient API family (including HttpRequestMessage and HttpResponseMessage) is agnostic with respect to the handler HTTP stack (i.e. SocketsHttpHandler for example). We allow server frameworks to consume, use and populate HttpResponseMessage objects for example. Throwing an exception or returning null from doing a get of the HttpResponseMessage.TrailingHeaders property would prevent server-side software from constructing the TrailingHeaders collection. It would also prevent mocking frameworks (i.e. using dependency injection) as well.

GSPP commented 5 years ago

@davidsh OK, that reason makes sense. I guess this overrides the concerns that I had. Thanks for responding.

JamesNK commented 5 years ago

Will HttpResponse.TrailingHeaders be in .NET Standard 2.1?

davidfowl commented 5 years ago

cc @terrajobst

karelz commented 5 years ago

It would force all independent implementers to implement it ... not triavial thing to ask for IMO. AFAIK the right process is to file an issue in dotnet/standard repo ...

Tratcher commented 5 years ago

@karelz They'd only need to provide the API which is trivial. It would still be optional for the the various handlers to implement the functionality. It wasn't implemented here for Curl or WinHttp.

karelz commented 5 years ago

Feels weird to expose something that is not supported though. We got danged for Standard APIs not working everywhere by community, it is confusing for them. I let @terrajobst guide us here - it is more about philosophy of standard, not about these specific APIs. Please create the request on standard repo. We should have the discussion there.

dotnet / runtime