envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.74k stars 4.75k forks source link

net: downstream_local_disconnect documentation is vague on cause #24283

Closed jmarantz closed 1 year ago

jmarantz commented 1 year ago

Description:

We have seen customers that have long-running http requests close unexpectedly, with a RST sent to the upstream. This is often associated with the error-code "downstream_local_disconnect" in the access log. The doc for this error is "“The client connection was locally closed for an unspecified reason.”

That description is not very actionable to end-users. This bug tracks efforts to add more logging detail around this error. I think this involves finding the code-paths where Network::ConnectionEvent::LocalClose is populated, and walking back the call-stack to understand why. A superficial glance tells us things like HttpUpstream::onResetStream was called or doneReading() was called, or HttpUpstream was destructed with resetencoder non-null. However we probably need another few levels of causality there to relate it to something understandable from outside the details of Envoy.

Per suggestion from @alyssawilk I added

ASSERT(event == Network::ConnectionEvent::RemoteClose, absl::StrCat("event=", event));

to ConnectionManagerImpl::onEvent and ran tests. Lots of them fail; I think most or all were integration tests. This needs deeper investigation.

@mattklein123 @KBaichoo FYI

jmarantz commented 1 year ago

@krajshiva FYI

KBaichoo commented 1 year ago

3 points: 1) Do we want to change the data model of the response code details? For example in network connection failure we actually append. See: https://github.com/envoyproxy/envoy/blob/058d2089efa94100beab8fb7b77da8d561f3bf65/source/common/network/connection_impl.cc#L560 2) Looking all tests I see the following clobbering occurring in tests. The via_upstream, etc. might be originally set when we first got response code headers. This isn't a guarantee that this is the response code that the client will receive though.

bazel-testlogs/test/extensions/filters/http/compressor/compressor_integration_tests/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/extensions/filters/http/dynamic_forward_proxy/proxy_filter_integration_test/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/extensions/filters/http/ext_proc/ext_proc_integration_test/shard_1_of_2/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/extensions/filters/http/ext_proc/ext_proc_integration_test/shard_2_of_2/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/extensions/filters/http/tap/tap_filter_integration_test/test.log:kbaichoo: clobbering :admin_filter_responsewith downstream_remote_disconnect
bazel-testlogs/test/extensions/stats_sinks/hystrix/hystrix_integration_test/test.log:kbaichoo: clobbering :admin_filter_responsewith downstream_remote_disconnect
bazel-testlogs/test/extensions/transport_sockets/tls/integration/ssl_integration_test/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/http2_flood_integration_test/shard_1_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/http2_flood_integration_test/shard_2_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/http2_flood_integration_test/shard_3_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/http2_flood_integration_test/shard_4_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/integration_test/shard_1_of_2/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/integration_test/shard_2_of_2/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_integration_test/shard_1_of_8/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_integration_test/shard_2_of_8/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_integration_test/shard_3_of_8/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_integration_test/shard_4_of_8/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_integration_test/shard_5_of_8/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_integration_test/shard_6_of_8/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_integration_test/shard_7_of_8/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_integration_test/shard_8_of_8/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_upstream_integration_test/shard_1_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_upstream_integration_test/shard_2_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_upstream_integration_test/shard_3_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/multiplexed_upstream_integration_test/shard_4_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/quic_protocol_integration_test/shard_6_of_12/test.log:kbaichoo: clobbering :via_upstreamwith downstream_local_disconnect
bazel-testlogs/test/integration/quic_protocol_integration_test/shard_7_of_12/test.log:kbaichoo: clobbering :via_upstreamwith downstream_local_disconnect
bazel-testlogs/test/integration/quic_protocol_integration_test/shard_8_of_12/test.log:kbaichoo: clobbering :via_upstreamwith downstream_local_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_1_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_local_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_1_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_2_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_local_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_2_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_3_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_4_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_5_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_6_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_local_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_6_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_7_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_local_disconnect
bazel-testlogs/test/integration/tcp_tunneling_integration_test/shard_7_of_7/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/uds_integration_test/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/websocket_integration_test/shard_1_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/websocket_integration_test/shard_2_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/websocket_integration_test/shard_3_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect
bazel-testlogs/test/integration/websocket_integration_test/shard_4_of_4/test.log:kbaichoo: clobbering :via_upstreamwith downstream_remote_disconnect

There were some junk unit test results that clobbering is was omitted e.g.:

test/common/http/conn_manager_impl_test/shard_1_of_3/test.log:kbaichoo: clobbering :barwith downstream_remote_disconnect

3) I need to dig into this some more, but I think one way we could provide this information when we local close is having the HCM query the network connection for a local close reason, or similar. We could then get rid of "The client connection was locally closed for an unspecified reason."

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.