bladedancer opened this issue 1 year ago
CC @snowp @adisuissa
Thanks for pointing this out. We can try to reproduce this in our integration tests, but we need more information. Can you check which resource type causes this (LDS, CDS, etc.)? Can you also provide some details on how often a resource update is sent from the server (or is it sent only once)? Will this occur if no requests are sent to Envoy?
Can you check which resource type causes this (LDS, CDS, etc.)? It seems to only be RDS, and only when RDS cleanup is triggered first. When the order is changed to CDS, LDS, RDS, it works as advertised.
Can you also provide some details on how often a resource update is sent from the server (or is it sent only once)? It is sent once. The test tool I'm using (https://github.com/bladedancer/xds-test/blob/ttl/pkg/xds/worker.go#L532) constructs an ad hoc xDS payload (via some simple menu-driven interactions) and sends it to Envoy on demand. To reproduce this I just send the first snapshot and then sit back and wait. It is SotW and ADS, if that matters. Bootstrap config: https://github.com/bladedancer/xds-test/blob/ttl/config/envoy.yaml
Will this occur if no requests are sent to Envoy? Yes: if I send the config and just sit for the 30s, it still crashes.
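For reference, this is roughly the shape of what the test tool sends: a minimal sketch assuming a recent go-control-plane v3 cache API, where the node ID, resource names, and the 30s TTL are illustrative and the gRPC/ADS server wiring is omitted (the real setup is in the linked worker.go).

```go
// Minimal sketch (not the actual test tool): a single SotW/ADS snapshot in which
// only the RDS resource carries a TTL. Assumes a recent go-control-plane v3 cache
// API; names, node ID, and the 30s TTL are illustrative.
package main

import (
	"context"
	"time"

	clusterv3 "github.com/envoyproxy/go-control-plane/envoy/config/cluster/v3"
	listenerv3 "github.com/envoyproxy/go-control-plane/envoy/config/listener/v3"
	routev3 "github.com/envoyproxy/go-control-plane/envoy/config/route/v3"
	"github.com/envoyproxy/go-control-plane/pkg/cache/types"
	cachev3 "github.com/envoyproxy/go-control-plane/pkg/cache/v3"
	resourcev3 "github.com/envoyproxy/go-control-plane/pkg/resource/v3"
)

func main() {
	ttl := 30 * time.Second

	// Only the route configuration expires; the CDS and LDS resources have no TTL.
	// (Resource fields are trimmed; a working config needs full listener/cluster setup.)
	snap, err := cachev3.NewSnapshotWithTTLs("v1", map[resourcev3.Type][]types.ResourceWithTTL{
		resourcev3.ClusterType:  {{Resource: &clusterv3.Cluster{Name: "backend"}}},
		resourcev3.ListenerType: {{Resource: &listenerv3.Listener{Name: "ingress"}}},
		resourcev3.RouteType: {
			{Resource: &routev3.RouteConfiguration{Name: "local_route"}, TTL: &ttl},
		},
	})
	if err != nil {
		panic(err)
	}

	// ADS-enabled SotW cache; the snapshot is set once and never updated afterwards.
	snapshotCache := cachev3.NewSnapshotCache(true, cachev3.IDHash{}, nil)
	if err := snapshotCache.SetSnapshot(context.Background(), "test-node", snap); err != nil {
		panic(err)
	}

	// The xDS gRPC/ADS server serving snapshotCache is omitted here; with it running,
	// Envoy fetches the config once and the crash shows up ~30s later when the RDS
	// TTL fires.
	select {}
}
```

With this shape Envoy receives CDS, LDS, and RDS once, then sits idle, and the crash lines up with the RDS TTL expiring about 30s later.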
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
Hi, can you try with this patch and see if the crash still happens?
diff --git a/source/common/config/grpc_mux_impl.cc b/source/common/config/grpc_mux_impl.cc
index 6f730dc950..4014073ab8 100644
--- a/source/common/config/grpc_mux_impl.cc
+++ b/source/common/config/grpc_mux_impl.cc
@@ -440,16 +440,20 @@ void GrpcMuxImpl::expiryCallback(absl::string_view type_url,
   // Note: We can blindly dereference the lookup here since the only time we call this is in a
   // callback that is created at the same time as we insert the ApiState for this type.
-  for (auto watch : api_state_.find(type_url)->second->watches_) {
-    Protobuf::RepeatedPtrField<std::string> found_resources_for_watch;
-
-    for (const auto& resource : expired) {
-      if (all_expired.find(resource) != all_expired.end()) {
-        found_resources_for_watch.Add(std::string(resource));
+  if (api_state_.find(type_url) != api_state_.end()) {
+    for (auto watch : api_state_.find(type_url)->second->watches_) {
+      Protobuf::RepeatedPtrField<std::string> found_resources_for_watch;
+
+      for (const auto& resource : expired) {
+        if (all_expired.find(resource) != all_expired.end()) {
+          found_resources_for_watch.Add(std::string(resource));
+        }
       }
-    }
-    watch->callbacks_.onConfigUpdate({}, found_resources_for_watch, "");
+      watch->callbacks_.onConfigUpdate({}, found_resources_for_watch, "");
+    }
+  } else {
+    ENVOY_LOG(error, "{} is not found in api_state", type_url);
   }
 }
(Already triaged by the security team and OK to open as a public issue.)
Title: Using Envoy XDS TTL causes Envoy to crash
Description: Hi, I was just playing around to get an idea of what's possible with XDS TTL. My config, in all recent versions (1.23->1.25), crashes Envoy when a TTL is reached (apparently), though I only have the stack trace for 1.25 (full log attached).
Caught Segmentation fault, suspect faulting address 0x0
Backtrace (use tools/stack_decode.py to get line numbers):
Envoy version: d362e791eb9e4efa8d87f6d878740e72dc8330ac/1.18.2/clean-getenvoy-76c310e-envoy/RELEASE/BoringSSL
0: __restore_rt [0x7f1e64aea420]->[0x29700fe2d420] ??:0
1: std::__1::function::__func<>::operator()() [0x55ae568e53bd]->[0x1c283bd] ??:?
2: event_process_active_single_queue [0x55ae56c085b7]->[0x1f4b5b7] snapshot.cc:?
3: event_base_loop [0x55ae56c0720e]->[0x1f4a20e] snapshot.cc:?
4: Envoy::Server::InstanceImpl::run() [0x55ae5653d7e8]->[0x18807e8] ??:?
5: Envoy::MainCommonBase::run() [0x55ae54cbe9c4]->[0x19c4] ??:0
6: Envoy::MainCommon::main() [0x55ae54cbf2e6]->[0x22e6] ??:0
7: main [0x55ae54cbd13c]->[0x13c] ??:0
8: __libc_start_main [0x7f1e64908083]->[0x29700fc4b083] ??:0
The test harness is nothing special, and while I do presume there's user error in my usage of TTLs, it shouldn't be crashing. I'm just using go-control-plane and setting ResourceWithTTL with static TTLs for the test: https://github.com/bladedancer/xds-test/blob/ttl/pkg/xds/worker.go#L164
log.txt