envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
25.01k stars 4.81k forks source link

Using Envoy XDS TTL causes envoy to crash #24271

Open bladedancer opened 1 year ago

bladedancer commented 1 year ago

(Already triaged by the security and ok to open an issue).

Title: Using Envoy XDS TTL causes envoy to crash

Description: Hi, I was just playing around to get an idea of what’s possible with XDSTTL. My config, in all recent versions (1.23->1.25), crashes Envoy when a ttl is reached (apparently). Though I only have the stack for 1.25 (full log attached).

Caught Segmentation fault, suspect faulting address 0x0 Backtrace (use tools/stack_decode.py to get line numbers): Envoy version: d362e791eb9e4efa8d87f6d878740e72dc8330ac/1.18.2/clean-getenvoy-76c310e-envoy/RELEASE/BoringSSL

0: __restore_rt [0x7f1e64aea420]->[0x29700fe2d420] ??:0

1: std::1::function::__func<>::operator()() [0x55ae568e53bd]->[0x1c283bd] ??:?

2: event_process_active_single_queue [0x55ae56c085b7]->[0x1f4b5b7] snapshot.cc:?

3: event_base_loop [0x55ae56c0720e]->[0x1f4a20e] snapshot.cc:?

4: Envoy::Server::InstanceImpl::run() [0x55ae5653d7e8]->[0x18807e8] ??:?

5: Envoy::MainCommonBase::run() [0x55ae54cbe9c4]->[0x19c4] ??:0

6: Envoy::MainCommon::main() [0x55ae54cbf2e6]->[0x22e6] ??:0

7: main [0x55ae54cbd13c]->[0x13c] ??:0

8: __libc_start_main [0x7f1e64908083]->[0x29700fc4b083] ??:0

The test harness is nothing special, though I do presume there’s user error in my usage of ttl.….but it shouldn’t be crashing. I’m just using the go control plane and setting the ResourceWithTTL with static ttls for the test: https://github.com/bladedancer/xds-test/blob/ttl/pkg/xds/worker.go#L164

log.txt

kyessenov commented 1 year ago

CC @snowp @adisuissa

adisuissa commented 1 year ago

Thanks for pointing this out. We can try to reproduce this in our integration tests, but need more information. Can you check which resource type causes this (LDS, CDS, etc.)? Can you also provide some details on how often does a resource update is sent from the server (or is it sent once)? Will this occur if no requests will be sent to Envoy?

bladedancer commented 1 year ago

Can you check which resource type causes this (LDS, CDS, etc.)? Seems to only be RDS. And only when RDS cleanup is triggered first. When the order is changed to CDS, LDS, RDS then it works as advertised.

Can you also provide some details on how often does a resource update is sent from the server (or is it sent once)? Sent once. The test tool (https://github.com/bladedancer/xds-test/blob/ttl/pkg/xds/worker.go#L532) I'm using constructs an adhoc XDS payload (via some simple menu driven interactions) and sends it to envoy on demand. To reproduce this I just sent the first snapshot and then sit back and wait. It is SotW and ADS if that matters. Bootstrap config: https://github.com/bladedancer/xds-test/blob/ttl/config/envoy.yaml

Will this occur if no requests will be sent to Envoy? Yes - if I send the config and just sit for the 30s then it still crashes.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

jaysonjdmello commented 1 year ago

Hi, can you try with this patch and see if the crash is still happening

diff --git a/source/common/config/grpc_mux_impl.cc b/source/common/config/grpc_mux_impl.cc
index 6f730dc950..4014073ab8 100644
--- a/source/common/config/grpc_mux_impl.cc
+++ b/source/common/config/grpc_mux_impl.cc
@@ -440,16 +440,20 @@ void GrpcMuxImpl::expiryCallback(absl::string_view type_url,

   // Note: We can blindly dereference the lookup here since the only time we call this is in a
   // callback that is created at the same time as we insert the ApiState for this type.
-  for (auto watch : api_state_.find(type_url)->second->watches_) {
-    Protobuf::RepeatedPtrField<std::string> found_resources_for_watch;
-
-    for (const auto& resource : expired) {
-      if (all_expired.find(resource) != all_expired.end()) {
-        found_resources_for_watch.Add(std::string(resource));
+  if (api_state_.find(type_url) != api_state_.end()) {
+    for (auto watch : api_state_.find(type_url)->second->watches_) {
+      Protobuf::RepeatedPtrField<std::string> found_resources_for_watch;
+
+      for (const auto& resource : expired) {
+        if (all_expired.find(resource) != all_expired.end()) {
+          found_resources_for_watch.Add(std::string(resource));
+        }
       }
-    }

-    watch->callbacks_.onConfigUpdate({}, found_resources_for_watch, "");
+      watch->callbacks_.onConfigUpdate({}, found_resources_for_watch, "");
+    }
+  } else {
+    ENVOY_LOG(error, "{} is not found in api_state", type_url);
   }
 }