Closed sefaphlvn closed 2 weeks ago
Hey, I think you are encountering https://github.com/envoyproxy/envoy/issues/26749. This issue was addressed in Envoy v1.28, but the fix is gated behind a runtime flag, which was only switched on by default in v1.31. Can you activate the flag on your instances and confirm whether it fixes the problem? We have used it internally since v1.28 and it alleviated the issue. Before that we had to add deep hooks in the control plane to work around it, which were not upstreamed given their brittleness.
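For anyone landing here later, a minimal sketch of how such a flag could be switched on through the bootstrap `layered_runtime` section. The flag name `envoy.restart_features.use_eds_cache_for_ads` is not stated in this thread; it is what I believe the v1.28 release notes call it, so please verify it against the notes for your Envoy version. The helper `layeredRuntime` and the layer name `static_layer_0` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// layeredRuntime builds the layered_runtime section of an Envoy bootstrap
// with a single static layer that sets the given runtime flag to true.
// The function and the layer name are hypothetical, for illustration only.
func layeredRuntime(flag string) map[string]any {
	return map[string]any{
		"layered_runtime": map[string]any{
			"layers": []any{
				map[string]any{
					"name":         "static_layer_0",
					"static_layer": map[string]any{flag: true},
				},
			},
		},
	}
}

func main() {
	// Assumed flag name; check the Envoy release notes before relying on it.
	const flag = "envoy.restart_features.use_eds_cache_for_ads"

	out, err := json.MarshalIndent(layeredRuntime(flag), "", "  ")
	if err != nil {
		panic(err)
	}
	// Print a JSON fragment that can be merged into the bootstrap config.
	fmt.Println(string(out))
}
```

Flags under `envoy.restart_features` are, as far as I know, only read at startup, so the fragment has to be in the bootstrap and Envoy restarted for it to take effect.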
Thank you for your response! I activated the flag as you suggested, and it resolved the issue. The Envoy instance now correctly retains the cached ClusterLoadAssignment when the initial fetch times out, so the cluster members are maintained and do not disappear.
However, I still see the initial fetch timeout warning, which seems like it will continue until a permanent solution is found for this issue.
[2024-09-12 18:30:07.187][34060614][warning][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:130] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
[2024-09-12 18:30:07.187][34060614][debug][upstream] [source/extensions/clusters/eds/eds.cc:453] Did not receive EDS response on time, using cached ClusterLoadAssignment for cluster test
[2024-09-12 18:30:07.187][34060614][debug][upstream] [source/common/upstream/upstream_impl.cc:469] transport socket match, socket default selected for host with address 192.168.1.2:12
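Since it matters which xDS resource type the timeout refers to (EDS here, not RDS), a small sketch that pulls the type URL out of the warning line above. The function name `timedOutType` and the regular expression are my own and assume the log format shown in this comment.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// timeoutRe matches the "initial fetch timed out" warning and captures the type URL.
var timeoutRe = regexp.MustCompile(`initial fetch timed out for (type\.googleapis\.com/[A-Za-z0-9_.]+)`)

// timedOutType returns the xDS type URL named in an "initial fetch timed out"
// log line, or "" if the line is not such a warning.
func timedOutType(line string) string {
	m := timeoutRe.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	// Drop a trailing period in case the line was quoted inside a sentence.
	return strings.TrimSuffix(m[1], ".")
}

func main() {
	line := `[2024-09-12 18:30:07.187][34060614][warning][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:130] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment`
	fmt.Println(timedOutType(line))
	// prints: type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
}
```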
Thanks again for pointing me in the right direction!
It appears that there’s another case. In my setup, I have two distinct listeners, each with its own HTTP Connection Manager (HCM) filter. Both HCM filters are linked to an RDS named “v28rds.”
When I attempt to update the route for one of the HCM filters by changing the RDS configuration to use a new route configuration named “route_for_ccc”, the RDS definition of the relevant HCM in the config_dump reflects the change:
"rds": {
  "config_source": {
    "ads": {},
    "initial_fetch_timeout": "10s",
    "resource_api_version": "V3"
  },
  "route_config_name": "route_for_ccc"
},
However, the new route configuration does not appear in the config_dump.
This suggests that the updated route is never fetched or applied. Is this a known limitation, or are there steps I can take to make Envoy pick up and apply the new route configuration without a restart?
[2024-09-13 10:43:04.594][37022954][debug][main] [source/server/server.cc:237] flushing stats
[2024-09-13 10:43:09.292][37022954][debug][http2] [source/common/http/http2/codec_impl.cc:1803] [Tags: "ConnectionId":"0"] Http2Visitor::OnFrameHeader(1, 497, 0, 0)
[2024-09-13 10:43:09.292][37022954][debug][http2] [source/common/http/http2/codec_impl.cc:1855] [Tags: "ConnectionId":"0"] Http2Visitor::OnBeginDataForStream(1, 497)
[2024-09-13 10:43:09.292][37022954][debug][http2] [source/common/http/http2/codec_impl.cc:1867] [Tags: "ConnectionId":"0"] Http2Visitor: remaining data payload: 497, end_stream: false
[2024-09-13 10:43:09.292][37022954][debug][http2] [source/common/http/http2/codec_impl.cc:1896] [Tags: "ConnectionId":"0"] Http2Visitor dispatching DATA for stream 1
[2024-09-13 10:43:09.293][37022954][debug][config] [source/extensions/config_subscription/grpc/new_grpc_mux_impl.cc:143] Received DeltaDiscoveryResponse for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig at version 10
[2024-09-13 10:43:09.293][37022954][debug][filter] [source/common/filter/config_discovery_impl.cc:132] Updated filter config eeeeeeeoHADD-fcjyJPtD-filter accepted, posting to workers
[2024-09-13 10:43:09.293][37022954][debug][init] [source/common/init/manager_impl.cc:24] added target RdsRouteConfigSubscription RDS local-init-target route_for_ccc to init manager RDS local-init-manager route_for_ccc
[2024-09-13 10:43:09.293][37022954][debug][config] [./source/common/http/filter_chain_helper.h:111] http filter #0
[2024-09-13 10:43:09.293][37022954][debug][config] [./source/common/http/filter_chain_helper.h:173] dynamic filter name: http-filters-bgBMRm
[2024-09-13 10:43:09.293][37022954][debug][filter] [source/common/filter/config_discovery_impl.cc:146] Updated filter config eeeeeeeoHADD-fcjyJPtD-filter created, warming done
[2024-09-13 10:43:09.293][37022954][debug][config] [source/extensions/config_subscription/grpc/delta_subscription_state.cc:262] Delta config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig accepted with 1 resources added, 0 removed
[2024-09-13 10:43:09.300][37022954][debug][init] [source/common/init/watcher_impl.cc:31] init manager RDS local-init-manager v28rds destroyed
[2024-09-13 10:43:09.300][37022954][debug][init] [source/common/init/target_impl.cc:34] target RdsRouteConfigSubscription RDS local-init-target v28rds destroyed
[2024-09-13 10:43:09.300][37022954][debug][init] [source/common/init/watcher_impl.cc:31] RDS local-init-watcher v28rds destroyed
[2024-09-13 10:43:09.300][37022954][debug][init] [source/common/init/target_impl.cc:68] shared target RdsRouteConfigSubscription RDS init v28rds destroyed
[2024-09-13 10:43:09.300][37022954][debug][init] [source/common/init/target_impl.cc:34] target DynamicFilterConfigProviderImpl destroyed
[2024-09-13 10:43:09.300][37022954][debug][filter] [source/common/filter/config_discovery_impl.cc:181] Filter config eeeeeeeoHADD-fcjyJPtD-filter worker update complete
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
Hey, sorry for the delayed reply.
I am not as familiar with the RDS client code in Envoy, so I cannot answer for sure, but you may want to open an issue on Envoy. As the RDS code shares nothing with the CDS code, I expect the issue to be different from the one solved by the ADS cache in EDS.
You may also want to test the control-plane version from branch dd/main in this fork, as it contains multiple delta xDS fixes which have not been upstreamed yet.
When I attempt to update the route for one of the HCM filters by changing the RDS configuration to use a new route configuration named “route_for_ccc” the RDS definition of the relevant HCM in the config_dump reflects this change like that:
"rds": { "config_source": { "ads": {}, "initial_fetch_timeout": "10s", "resource_api_version": "V3" }, "route_config_name": "route_for_ccc" },
Your error is for an EDS response timeout:
initial fetch timed out for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment.
The configs should be changed for eds, not rds.
I asked two separate questions; the second one is about RDS. When the RDS route config name is updated, I add the new route configuration to the new snapshot, but Envoy does not fetch it until it is restarted.
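One thing worth ruling out on the control-plane side is that every route name referenced by a listener is actually present in the snapshot being pushed. Below is a minimal sketch of that check using plain strings instead of real xDS types. The function `missingRoutes` is hypothetical; go-control-plane's snapshot type has, as far as I know, a `Consistent()` method that performs a similar check on real resources, but please confirm that for the version you use.

```go
package main

import (
	"fmt"
	"sort"
)

// missingRoutes returns, sorted and de-duplicated, the route config names that
// listeners reference but that are absent from the snapshot's route set.
func missingRoutes(referenced []string, routes map[string]struct{}) []string {
	seen := map[string]struct{}{}
	var missing []string
	for _, name := range referenced {
		if _, ok := routes[name]; ok {
			continue // route is present in the snapshot
		}
		if _, dup := seen[name]; dup {
			continue // already reported
		}
		seen[name] = struct{}{}
		missing = append(missing, name)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Two listeners: one still on "v28rds", one switched to "route_for_ccc".
	referenced := []string{"v28rds", "route_for_ccc"}
	// The snapshot only carries the old route configuration.
	routes := map[string]struct{}{"v28rds": {}}

	fmt.Println(missingRoutes(referenced, routes))
	// prints: [route_for_ccc]
}
```

If this kind of check comes back empty and Envoy still does not request the new route, the problem is more likely in the delta xDS exchange itself than in the snapshot contents.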
I am using go-control-plane v13 with Delta ADS and snapshots. The initial snapshot works correctly, and Envoy successfully fetches all configurations when it starts. However, when I update the snapshot with changes specifically in the Cluster Discovery Service (CDS), I encounter the following error in Envoy:
envoy version: 1.28.0
Additional Information:
EDS Update Success: When I update only EDS directly, Envoy successfully fetches the updates without any timeout errors.
Issue Specific to CDS Update: The timeout issue appears only when CDS is updated in the snapshot, and Envoy attempts to fetch the updated EDS configurations afterward.
Minimal Change in CDS: The CDS update involves only a minor change, specifically updating the health check parameters. The EDS configuration within the cluster remains unchanged.
Please let me know if additional logs or information are needed. I am looking for guidance on whether this could be a bug in go-control-plane or if there are specific configurations or steps that I might be missing.