envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
25.1k stars 4.82k forks source link

RESPONSE_CODE_DETAILS in access log does not show the reason for Upstream Protocol Error #26691

Open robertpanzer opened 1 year ago

robertpanzer commented 1 year ago

Title: RESPONSE_CODE_DETAILS does not show the reason for Upstream Protocol Error

Description: When I proxy a HTTP request through Envoy, and the upstream returns too many headers, the %RESPONSE_CODE_DETAILS% in the access log shows upstream_reset_before_response_started{protocol_error} which does not give any more information than the flag UPE. According to the Envoy docs I would expect something that contains the string http1.too_many_headers.

Repro steps: I created a reproducer at https://github.com/robertpanzer/envoy-access-log In short, when a response is rejected by Envoy due to too many headers, or too many header bytes, the access log shows RESPONSE_CODE_DETAILS like this:

[2023-04-12T07:20:50.890Z] "GET /toomanyheaders HTTP/1.1" 502 UPE upstream_reset_before_response_started{protocol_error} - "-" 0 87 92 - "-" "curl/7.87.0" "10316164-77c6-4ea4-a7a1-4371dbb1aa24" "localhost:8887" "-" "-" "-" "- " "-" "192.168.0.225:8888" inbound 172.17.0.3:40112 172.17.0.3:8887 172.17.0.1:37802 - -
[2023-04-12T07:21:07.746Z] "GET /toomanyheaderbytes HTTP/1.1" 502 UPE upstream_reset_before_response_started{protocol_error} - "-" 0 87 721 - "-" "curl/7.87.0" "b51c8970-30c8-42bc-aa3b-26de4daeec4b" "localhost:8887" "-" "-" "-" "- " "-" "192.168.0.225:8888" inbound 172.17.0.3:48298 172.17.0.3:8887 172.17.0.1:48050 - -

This log makes it harder to spot the reason for why the response was rejected.

Tested on the latest Envoy 1.25.4, as well as Istio Proxy 1.10 and 1.15.

Config: Config is available in the reproducer repo at https://github.com/robertpanzer/envoy-access-log

Logs:

[2023-04-12T07:20:50.890Z] "GET /toomanyheaders HTTP/1.1" 502 UPE upstream_reset_before_response_started{protocol_error} - "-" 0 87 92 - "-" "curl/7.87.0" "10316164-77c6-4ea4-a7a1-4371dbb1aa24" "localhost:8887" "-" "-" "-" "- " "-" "192.168.0.225:8888" inbound 172.17.0.3:40112 172.17.0.3:8887 172.17.0.1:37802 - -
[2023-04-12T07:21:07.746Z] "GET /toomanyheaderbytes HTTP/1.1" 502 UPE upstream_reset_before_response_started{protocol_error} - "-" 0 87 721 - "-" "curl/7.87.0" "b51c8970-30c8-42bc-aa3b-26de4daeec4b" "localhost:8887" "-" "-" "-" "- " "-" "192.168.0.225:8888" inbound 172.17.0.3:48298 172.17.0.3:8887 172.17.0.1:48050 - -
mattklein123 commented 1 year ago

cc @ohadvano @kyessenov who have been in the weeds on this recently.

ohadvano commented 1 year ago

@robertpanzer is there a default maximum headers count or did you configure this somehow? sorry that I am not familiar with how the limit is enforced by Envoy.

robertpanzer commented 1 year ago

The default maximum header count should be 100 as documented at https://www.envoyproxy.io/docs/envoy/v1.25.5/api-v3/config/core/v3/protocol.proto.html#config-core-v3-httpprotocoloptions. The actual check for this limit happens here for http1 if I am not wrong: https://github.com/envoyproxy/envoy/blob/main/source/common/http/http1/codec_impl.cc#L565

That's not the point of this issue though, the issue is that the access log does not show that too many headers are responsible for the Upstream Protocol Error.

kyessenov commented 1 year ago

FWIW the error code detail http1.too_many_headers is about the request headers on the downstream. What looks like is happening is that it gets wrapped into "upstream protocol error" so this detail is lost. I think we should be able to unwrap it as "protocol_error{http1.too_many_headers}" since the cardinality is bounded.

kyessenov commented 1 year ago

The interesting issue here is that protocol errors corrupt the entire upstream HTTP connection. I'm not sure if it's OK to propagate a failure reason from one request to another in the multiplexed connections. The general flow is that any request can corrupt the upstream HTTP connection, and they all get notified about UPE, since we can't easily attribute the failure to a request in every protocol error.

zhangbo1882 commented 1 year ago

I met similar issue that the access log only show "protocol error". No details about why the protocol is in incorrect. Does envoy access log support that or it is just a bug ?

kyessenov commented 1 year ago

The error detail for the codec protocol error is not plumbed through - I'm looking into fixing it.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

p4ali commented 1 year ago

Another usecase:

When upstream response 204 with non-empty body (which is invalid per RFC7230), envoy shows the similar vague log with message of upstream connect error or disconnect/reset before headers. reset reason: protocol error

"POST /api/someapi HTTP/1.1" 502 UPE "-" "-" 75 87 1726 - "11.16.170.0" "curl/7.87.0" "24597139-ddb1-4826-b601-a5043ae27b54" "api.example.com" "-" "-" "-" "-" "-" "127.0.0.1:8080" inbound|8080|| 127.0.0.1:47392 11.16.120.36:8080 11.16.170.0:0 - default

If turn on the debug flag, the log actually shows:

400 2023-05-17T22:00:30.410106Z     debug   envoy client    [C574] Error dispatching received data: http/1.1 protocol error: content length not allowed in 1xx or 204
401 2023-05-17T22:00:30.410176Z     debug   envoy connection        [C574] closing data_to_write=0 type=1
402 2023-05-17T22:00:30.410188Z     debug   envoy connection        [C574] closing socket: 1
403 2023-05-17T22:00:30.410273Z     debug   envoy client    [C574] disconnect. resetting 1 pending requests
404 2023-05-17T22:00:30.410294Z     debug   envoy client    [C574] request reset
405 2023-05-17T22:00:30.410327Z     debug   envoy router    [C34][S14130670338815548715] upstream reset: reset reason: protocol error, transport failure reason:
406 2023-05-17T22:00:30.411524Z     debug   envoy http      [C34][S14130670338815548715] Sending local reply with details upstream_reset_before_response_started{protocol_error}

I wish envoy could return the more accurate reason content length not allowed in 1xx or 204 instead of reset reason: protocol error.