Closed qfiard closed 4 months ago
I have a tentative fix in https://github.com/envoyproxy/envoy/pull/33228 to fix the handling of empty trailers to terminate the stream instead of sending an empty HEADERS frame.
An alternative would be to skip the call to encodeTrailers when headers are empty. That would require broader changes to Envoy and may break the design of filters if we expect empty trailers processed by a first filter to be handled by a subsequent ones. This change is more narrowly scoped but it would be good for a maintainer to confirm that this is the right fix.
cc @RyanTheOptimist @danzh2010
Hi Quentin, thanks for reporting the issue and proposing the fix! Could you elaborate more about why an empty trailer in response could cause the stream hanging but an empty body with FIN over HTTP/3 won't? Supposedly there is some gRPC contract that HTTP/3 is violating and causing the hang? How does HTTP/2 codec handle this situation?
Interesting. It looks like the HTTP/2 codec also skips the encoding of empty trailers, and instead sends empty data. How odd. Well, if this is the Envoy way of doing this, then it probably makes sense for both the H2 and H3 codecs to behave similarly. But we'd likely need a runtime flag to guard this new (hopefully correct) behavior.
https://github.com/envoyproxy/envoy/blob/main/source/common/http/http2/codec_impl.cc#L630-L640
@danzh2010 raises a good question that I don't think I understand the answer to. It sounds like if the client talks directly to a gRPC-web server, then the empty trailers do not cause a hang. But when the client talks to a Envoy and Envoy talks to the server, then the request hangs. Do we understand what the difference here is? To confirm, is the client sending empty trailers? Or is the gRPC-Web filter creating them even when the client does not?
Thank you both for looking into this.
It sounds like if the client talks directly to a gRPC-web server, then the empty trailers do not cause a hang.
From https://github.com/grpc/grpc-web, gRPC Web isn't meant to be directly exposed by a web server, so you won't usually find a gRPC-web server that a client can talk to directly. Instead, gRPC Web clients connect to a regular gRPC server with a proxy doing the translation, and Envoy is the default implementation of this.
gRPC-web clients connect to gRPC services via a special proxy; by default,
gRPC-web uses [Envoy](https://www.envoyproxy.io/).
gRPC uses trailers to communicate the status of the RPC, but gRPC Web doesn't as trailers aren't exposed by web browsers (https://stackoverflow.com/questions/46373013/access-to-http-2-trailers-in-a-browser). So Envoy and the grpc_web filter is responsible for translating the trailers into a body chunk.
When using QUIC, this is happening correctly but Envoy appends an empty HEADERS frame which seems to block Chrome from considering the stream complete. Do you know if the later is expected? It is possible that this is a Chrome issue if a HEADERS frame at the end of a stream is expected to be handled gracefully (from the screenshot above, the HEADERS frame does have FIN=true). But in the meantime removing the HEADERS frame when empty should fix the issue.
Should I go ahead and add the runtime flag to my pull request, or do you prefer to take this over?
Envoy and the grpc_web filter is responsible for translating the trailers into a body chunk.
When using QUIC, this is happening correctly but Envoy appends an empty HEADERS frame which seems to block Chrome from considering the stream complete. Do you know if the later is expected?
Did grpc_web filter fail to translate an empty trailer but passed it down somehow? Or did it translate a non-empty trailer into body and appended an empty one?
Thank you both for looking into this.
It sounds like if the client talks directly to a gRPC-web server, then the empty trailers do not cause a hang.
From https://github.com/grpc/grpc-web, gRPC Web isn't meant to be directly exposed by a web server, so you won't usually find a gRPC-web server that a client can talk to directly. Instead, gRPC Web clients connect to a regular gRPC server with a proxy doing the translation, and Envoy is the default implementation of this.
gRPC-web clients connect to gRPC services via a special proxy; by default, gRPC-web uses [Envoy](https://www.envoyproxy.io/).
gRPC uses trailers to communicate the status of the RPC, but gRPC Web doesn't as trailers aren't exposed by web browsers (https://stackoverflow.com/questions/46373013/access-to-http-2-trailers-in-a-browser). So Envoy and the grpc_web filter is responsible for translating the trailers into a body chunk.
When using QUIC, this is happening correctly but Envoy appends an empty HEADERS frame which seems to block Chrome from considering the stream complete. Do you know if the later is expected? It is possible that this is a Chrome issue if a HEADERS frame at the end of a stream is expected to be handled gracefully (from the screenshot above, the HEADERS frame does have FIN=true). But in the meantime removing the HEADERS frame when empty should fix the issue.
Yes, this does sound a like a Chrome bug to me, yes. At the QUIC layer, the stream is closed because a QUIC STREAM frame was received with the fin bit set. There's nothing "wrong" at the HTTP/3 layer, as far as I know, with an empty HEADERS frame. But it makes me wonder if Chrome is somehow treating empty headers as incomplete headers? Have you worked with Chrome team on this at all?
Should I go ahead and add the runtime flag to my pull request, or do you prefer to take this over?
If you could add a flag, that'd be great. I think I'd like to understand the problem a bit more clearly before we land anything, but adding a flag is a good idea.
Thanks for the offline discussion this morning. The server sends non-empty trailers to Envoy, and then the gRPC filters mutate the trailers (by removing the status field) which leaves the trailers empty at the end of the filter chain. It's a bit odd that Envoy chooses to call EncodeTrailers at that point, but it's not crazy :) So let's go ahead and move forward with your PR, subject to runtime-flag protection.
In addition, the Chrome behavior of having a request hang when empty HTTP/3 trailers are received is clearly a bug. I filed https://issues.chromium.org/issues/332587381, and fixed the issue with this commit https://chromium-review.googlesource.com/c/chromium/src/+/5421968 Chrome Canary should pick this up in the next day or so if you would like to confirm.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
Title: gRPC-Web requests hang over QUIC
Description: gRPC-Web requests hang over QUIC:
It appears to be due to an empty HEADERS frame being sent by Envoy at the end of the stream instead terminating it:
The bug appears to be that trailers are encoded into a QUIC frame even when they are empty, which is the case with the gRPC-Web filters which clears and converts trailers into a body chunk.
Repro steps: