@ivitjuk this sounds like a useful addition for the ACK process if you want to make a proposal. cc @htuch
A couple of things:
1. If you sequence as per https://github.com/envoyproxy/data-plane-api/blob/master/XDS_PROTOCOL.md#eventual-consistency-considerations, as [CDS, EDS, RDS], and everything ACKs, the clusters should be warm by the time the RDS update arrives. This really requires ADS or cross-management-server coordination in practice.
2. We could make an extension to allow resource status to be queried. I think we would want to build this into the incremental xDS variant, since it has better support for fine-grained resource addressing; a limitation of the current state-of-the-world updates is that status such as ACK is at whole-update granularity, not per resource.
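To make the sequencing in point 1 concrete, here is a minimal Go sketch of the ordering a management server has to enforce. `adsStream`, `discoveryResponse`, and the method names are hypothetical stand-ins, not a real control-plane API:

```go
package controlplane

import "fmt"

// Hypothetical stand-ins for an ADS stream and a typed DiscoveryResponse;
// a real control plane would use the envoy.api.v2 protos over gRPC.
type discoveryResponse struct {
	typeURL string
	version string
}

type adsStream interface {
	Send(discoveryResponse) error     // push one response on the shared stream
	AwaitAck(discoveryResponse) error // block until the client ACKs (or NACKs) it
}

// pushOrderedUpdate enforces the [CDS, EDS, RDS] ordering from the
// eventual-consistency notes: each resource type is pushed only after the
// previous one is ACKed. Note that, as this thread goes on to establish,
// the CDS ACK alone does not prove warming finished (e.g. with LOGICAL_DNS).
func pushOrderedUpdate(s adsStream, cds, eds, rds discoveryResponse) error {
	for _, resp := range []discoveryResponse{cds, eds, rds} {
		if err := s.Send(resp); err != nil {
			return err
		}
		if err := s.AwaitAck(resp); err != nil {
			// NACK or stream failure: stop before pushing dependent resources.
			return fmt.Errorf("%s at version %s rejected: %w", resp.typeURL, resp.version, err)
		}
	}
	return nil
}
```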
@htuch in point 1, when you say "...the clusters should be warm by the time the RDS update arrives", is this because there is explicit control in Envoy that ensures this, or do we just opportunistically rely on the timing working in our favor?
The reason I'm asking is that we do have ADS which tries to send resources in order, and from what I see in the Envoy logs, order was correct. Nevertheless, it doesn't seem like Envoy ensures clusters are ready by the time it starts applying routes that depend on them.
Take a look at the following excerpt from an Envoy log. It's a single update cycle triggered by a route addition. The added route depends on a new cluster, so a CDS update was sent too. The order in the log seems correct (CDS, LDS, and finally RDS):
```
$ grep -E "starting warming|warming cluster.*complete|:status.*503|unknown cluster|Received gRPC message for type" /tmp/bad_envoy_log.txt | grep -A1 "2018-11-30 17:58:20" | cat -n
1  [2018-11-30 17:58:20.333][22][debug][upstream] source/common/config/grpc_mux_impl.cc:168] Received gRPC message for type.googleapis.com/envoy.api.v2.Cluster at version 280
2  [2018-11-30 17:58:20.337][22][info][upstream] source/common/upstream/cluster_manager_impl.cc:500] add/update cluster cds|egress|test_mesh|front_node|mid_node_b_2|http|8022|9 starting warming
3  [2018-11-30 17:58:20.338][22][debug][upstream] source/common/config/grpc_mux_impl.cc:168] Received gRPC message for type.googleapis.com/envoy.api.v2.Listener at version 280
4  [2018-11-30 17:58:20.350][22][debug][upstream] source/common/config/grpc_mux_impl.cc:168] Received gRPC message for type.googleapis.com/envoy.api.v2.RouteConfiguration at version 280
5  [2018-11-30 17:58:20.401][66][debug][router] source/common/router/router.cc:241] [C26410][S12949414697162957557] unknown cluster 'cds|egress|test_mesh|front_node|mid_node_b_2|http|8022|9'
6  ':status', '503'
7  [2018-11-30 17:58:20.494][22][info][upstream] source/common/upstream/cluster_manager_impl.cc:512] warming cluster cds|egress|test_mesh|front_node|mid_node_b_2|http|8022|9 complete
8  [2018-11-30 17:58:27.360][22][debug][upstream] source/common/config/grpc_mux_impl.cc:168] Received gRPC message for type.googleapis.com/envoy.api.v2.Cluster at version 281
```
@ivitjuk you show [CDS, LDS, RDS] in your log; where is EDS? You will need an EDS update for the warming to complete. I think you need [CDS, EDS, RDS] to make this work.
This specific setup doesn't use EDS; it uses LOGICAL_DNS. The following is the CDS config corresponding to the logs I previously pasted:
```
name: "cds|egress|test_mesh|front_node|mid_node_b_2|http|8022|9"
type: LOGICAL_DNS
connect_timeout {
  seconds: 1
}
http2_protocol_options {
}
load_assignment {
  cluster_name: "cds|egress|test_mesh|front_node|mid_node_b_2|http|8022|9"
  endpoints {
    lb_endpoints {
      endpoint {
        address {
          socket_address {
            address: "mid_b_2.test_mesh.local"
            port_value: 8022
          }
        }
      }
    }
  }
}
```
@ivitjuk yeah, logical DNS is going to take an arbitrary amount of time to warm, so sequencing alone won't cut it.
@ivitjuk do you want to try extending the incremental xDS design to accommodate this? See https://github.com/envoyproxy/envoy/blob/master/api/XDS_PROTOCOL.md#incremental-xds and https://github.com/envoyproxy/envoy/blob/7c5badba34e8e4e5115b81776bafc7bcaa3a5834/api/envoy/api/v2/discovery.proto#L99.
@htuch we need to fix the 503 issue we are seeing, and we will need a deeper dive into incremental xDS to understand whether it's a feasible approach for us or not. I would appreciate it if we could keep this issue open until we get this properly prioritized. I created a bug fix on our side in https://github.com/awslabs/aws-app-mesh-examples/issues/64
@ivitjuk Ack. Incremental xDS is going to play out over the next quarter or so, so I think you'll need an interim workaround. In general, we don't have a good story for better-than-eventual consistency if you use logical DNS rather than EDS as things stand today.
As a super hack, you could potentially poll the admin endpoint and look at /clusters to see what has been resolved, but I don't recommend this as a production solution; the admin endpoint is not designed to play a role as a sequencing step in the control plane.
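A minimal Go sketch of that hack, assuming the admin interface listens on 127.0.0.1:9901; the address, the polling cadence, and the assumption that a warming cluster is absent from the /clusters output are all things to verify against your deployment, and this remains the non-recommended hack described above:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// waitForCluster polls the admin /clusters endpoint until the named cluster
// shows up in the output, as a crude stand-in for real readiness feedback.
func waitForCluster(adminAddr, cluster string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := http.Get(adminAddr + "/clusters")
		if err == nil {
			body, _ := io.ReadAll(resp.Body)
			resp.Body.Close()
			// Assumption: a newly added cluster only appears in /clusters
			// output once the cluster manager has finished warming it.
			if strings.Contains(string(body), cluster+"::") {
				return nil
			}
		}
		time.Sleep(200 * time.Millisecond)
	}
	return fmt.Errorf("cluster %q not visible after %s", cluster, timeout)
}

func main() {
	err := waitForCluster("http://127.0.0.1:9901",
		"cds|egress|test_mesh|front_node|mid_node_b_2|http|8022|9", 5*time.Second)
	if err != nil {
		fmt.Println(err)
	}
}
```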
We can keep the issue open; the question really is what we want to do short term vs. building this into incremental xDS. Maybe we should have some service for this in any case? Our general approach so far, e.g. with ADS, has been to try to find low-cost solutions for better-than-eventual consistency, since Envoy is mostly geared towards eventual-consistency use cases today.
@htuch Before listing the ideas that came to my mind, let me first lay out the problem we are trying to address in a single sentence:
When it receives a new route, Envoy should only start matching the route once the cluster the route depends on becomes ready.
Solutions that I had in mind are:
1. Accept the RDS update, but only apply (or validate) routes once the clusters they depend on are ready.
2. Have Envoy defer its ACK of a CDS update until the clusters finish warming, so the management server knows when it is safe to push the dependent routes.
3. A new xDS mechanism for explicitly reporting resource readiness status back to the management server.
Option 1 sounds like the simplest approach to me. I can also see there is a similar validate_clusters option in the route config; however, it is only concerned with non-existent clusters. Could we extend it to non-ready clusters too? I think this new rule should only be applied when a route is initially added; a 503 should still be returned later if the cluster transitions from ready to non-ready.
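For reference, here is how the existing field is set today, as a minimal sketch in the same prototext form as the cluster config above (the route config name is illustrative). As noted, it only rejects routes naming unknown clusters, not warming ones:

```
name: "egress_routes"  # illustrative
# Rejects the whole config if any route references a cluster that doesn't exist.
validate_clusters {
  value: true
}
```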
Regarding your suggestion to extend the incremental xDS design, I believe that would require incorporating item 2 into the design in some way.
1) is "easy" if you are OK with the RDS update failing (with a hopefully useful error message). Meaning, I think we can pretty easily check whether a cluster is both ready and warm, and fail if not, with a new validate type. Is this acceptable to you?
2) I think this also does have value, and would not require incremental xDS. I could see general value in this, for listeners also.
3) Agree this is possible, but it seems overkill to me and I think either 1 or 2 would work (or both).
@htuch thoughts?
I think (1) is the best option here, if @ivitjuk is good with the existing definition of warm; specifically it doesn't say that we have a valid set of hosts loaded by EDS, simply that one EDS update has occurred. We might have a warm cluster with an empty set of hosts returned by the server, or a set of hosts that are all unhealthy. Does this work?
@mattklein123 in option 1, instead of returning an error message, I was actually thinking about accepting the RDS update and delaying matching until the cluster warms up. That makes more sense to me because a cluster being in a warming state makes its status uncertain (neither healthy nor unhealthy). So we wait for resolution of that uncertainty before we start matching the dependent RDS.
Regarding option 1 vs. option 2, would it make sense to implement 1 first and then work on 2? I think 1 would unblock us faster, since it would not require any changes on the ADS side. Then we could work on option 2.
@htuch That definition of warm is perfectly acceptable. In addition to EDS, would the same also apply to LOGICAL_DNS? LOGICAL_DNS led to this behavior in the first place.
> @mattklein123 in option 1, instead of returning an error message, I was actually thinking about accepting the RDS update and delaying matching until the cluster warms up. That makes more sense to me because a cluster being in a warming state makes its status uncertain (neither healthy nor unhealthy). So we wait for resolution of that uncertainty before we start matching the dependent RDS.
I think when you go to implement this you will realize this is going to be quite complicated, and I'm not sure the complexity is worth it. If this is the behavior you want, personally I would recommend doing (2). Granted, I haven't thought about (1) very much so it's possible I'm missing something, so feel free to sketch out an implementation if you think it's not that hard.
@ivitjuk yes, it applies to logical DNS as well, see https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/cluster_manager.html?highlight=warming#cluster-warming for the definition of cluster warming.
@mattklein123 I had a look at the route management code, and I see what you meant when you said that option (1), with accepting and holding RDS until CDS becomes warm, would not be simple. It would involve either weaving a `ConnectionManager` instance through the `RouteMatcher`, `VirtualHost*`, and `RouteEntry*` so they can check CDS warmth status, or changing the routing architecture to accept CDS warmth notifications from the `ConnectionManager` and update its internal routing data structures. Neither seems simple.
So your suggestion for option (1), returning an error code on an RDS update when CDS is not ready, makes sense.
The major difference between (1) and (2) is then how ADS is implemented. Option (1) would essentially be a polling mechanism, where ADS retries the RDS update until it succeeds or some retry limit is reached. Option (2) is a notification mechanism, where Envoy sends the ACK once CDS is warm, after which the RDS update follows.
I prefer the notification mechanism, hence option (2), but I would have to evaluate its implementation complexity before we make a final decision between (1) and (2).
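To illustrate the difference from the management-server side, a rough sketch reusing the hypothetical `adsStream`/`discoveryResponse` stand-ins (and `fmt`/`time` imports) from the sketch earlier in the thread; neither function is a real API:

```go
// Option (1): polling. Envoy rejects (NACKs) an RDS update that references a
// non-ready cluster, so the server retries the same push with a backoff.
func pushRDSWithRetry(s adsStream, rds discoveryResponse, attempts int, backoff time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = s.Send(rds); err != nil {
			return err
		}
		if err = s.AwaitAck(rds); err == nil {
			return nil
		}
		time.Sleep(backoff)
	}
	return fmt.Errorf("RDS update never accepted: %w", err)
}

// Option (2): notification. Envoy defers the CDS ACK until warming completes,
// so a single blocking wait is enough before pushing the dependent RDS.
func pushAfterWarm(s adsStream, cds, rds discoveryResponse) error {
	if err := s.Send(cds); err != nil {
		return err
	}
	if err := s.AwaitAck(cds); err != nil { // returns only once the clusters are warm
		return err
	}
	return s.Send(rds)
}
```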
I think you should just do (2). You don't really need to change the xDS wire protocol to do this; it's sufficiently underspecified, and the code is essentially all on the CDS side. Since we already have warming support for CDS and LDS, just deferring the ACK seems not too hard to add.
We have experienced the same issue. We have configured a custom health checker whose first-round health check completion time varies from 10 ms to 30 ms, depending on the endpoints' location relative to the Envoy host. For now, (2) seems promising and easy to implement for our customized Envoy. By the way, I'd like to ask: will this be designed as a fundamental, implied term of the xDS protocol, or as an optional, configurable feature in the cluster configuration in the future?
We decided to go with option (2) and will try to pull it to the next sprint.
@tony-shi It seems to me that we don't need a configuration option to enable this feature, as it should not break or significantly change the behavior of existing xDS server implementations. @htuch do you agree?
@ivitjuk yep; this should be transparent. Since Envoy has always had the freedom to delay an ACK for an arbitrary amount of time, correctly implemented xDS management servers should not be sensitive to this, and we're strengthening rather than weakening consistency.
There's also work in the MCP protocol, a sister protocol of xDS, that has first-class resource status reporting support. I actually think this would be a more systematic approach. CC @costinm @rshriram @ozevren (not sure what Jason's GH ID is..).
Do you guys mind attaching an MCP doc link to this issue? Option (2) seems to diverge from status reporting support.
@ayj (Jason)
Option 2 seems reasonable short term; we'll need to make small adjustments in Istio, since right now we don't wait for the ACK, we just use it for metrics. Any estimate on the schedule? I've been working on this but am not expecting to get it into Istio 1.1. It would be great to have a flag or some way to control it, even if it seems safe.
See the MCP Enhancement proposal for an overview. You'll need to join the istio-team-drive-access@ mailing list to access it.
FWIW, there is already a validate_clusters option in the route configuration, so you have option 1 implemented in Envoy (if I read this long thread properly). Option 2, to me, has some drawbacks: an EDS update having been received, or DNS having resolved, does not imply that it was successful or that one or more hosts were obtained.
I was able to start working on option (2) yesterday. So far I have a sketch implementation on which I would like to get early feedback before I continue pursuing this idea further.
I put the implementation and testing details in the commit message. Let me know if that makes sense.
@htuch can you check the POC I posted in the previous message? My concern with that approach is that pausing the whole of CDS might have a negative impact on resource update latency. If that's the case, it seems to me that `api_state_` in `GrpcMuxImpl` would need to become aware of the in-flight updates so we don't process them in `queueDiscoveryRequest()`.
@ivitjuk I had a look; what you are doing seems reasonable to me. I'm not sure whether modifying `GrpcMuxImpl` would change latency, but maybe I'm missing something. I'd be interested in seeing tests that validate things like two CDS updates back-to-back with only partial warming completed after the first one. So far, we have only used `pause`/`resume` synchronously, so using them asynchronously is going to make reasoning here a bit trickier.
@ivitjuk also, you might be interested in tracking where https://github.com/envoyproxy/envoy/issues/5168#issuecomment-457985329 is heading. It's not the same as this issue, but there is common ground on cluster warming and possible changes to `GrpcMuxImpl`. CC @ramaraochavali
Published a PR: #5887
I'm going to close this out based on #5887 merging. There is still the more general problem, but let's reopen when someone wants to drive this work.
We came across this problem with Ambassador. We're doing blue/green deploys by creating a new namespace for each deploy, which introduces a new `Cluster` each time.
Ambassador's Ambex ADS service feeds the clusters first, then the listeners. When Envoy receives the listeners, the corresponding cluster isn't warm yet, causing 503s from attempts to route requests to the unknown cluster.
For now, we're using weights to work around this. We create the `Service`s in the new namespace with a weight of 0, wait an arbitrary amount of time, then set the weight to 100. I'll give endpoint routing a shot if I have a chance.
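A sketch of that workaround in the same prototext form as the configs above; the cluster names and the total of 100 are illustrative. The new namespace's cluster starts at weight 0 and is bumped to 100 once we assume it has warmed:

```
route {
  weighted_clusters {
    clusters {
      name: "cluster_blue"   # existing deploy, takes all traffic initially
      weight {
        value: 100
      }
    }
    clusters {
      name: "cluster_green"  # new namespace, possibly still warming
      weight {
        value: 0
      }
    }
  }
}
```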
```
Jul 1 14:09:06 ambassador-pqz4d ambassador debug [config] [source/common/config/grpc_mux_impl.cc:120] Received gRPC message for type.googleapis.com/envoy.api.v2.Cluster at version v2490
Jul 1 14:09:06 ambassador-pqz4d ambassador debug [config] [source/common/config/grpc_mux_impl.cc:97] Pausing discovery requests for type.googleapis.com/envoy.api.v2.ClusterLoadAssignment
Jul 1 14:09:06 ambassador-pqz4d ambassador 2019/07/01 18:09:06 notify: sh /ambassador/post_watt.sh http://localhost:8002/snapshots/2491
Jul 1 14:09:06 ambassador-pqz4d ambassador debug [config] [source/common/config/grpc_mux_impl.cc:97] Pausing discovery requests for type.googleapis.com/envoy.api.v2.Cluster
Jul 1 14:09:06 ambassador-pqz4d ambassador debug [config] [source/common/config/grpc_mux_subscription_impl.cc:61] gRPC config for type.googleapis.com/envoy.api.v2.Cluster accepted with 8 resources with version v2490
Jul 1 14:09:06 ambassador-pqz4d ambassador debug [config] [source/common/config/grpc_mux_impl.cc:120] Received gRPC message for type.googleapis.com/envoy.api.v2.Listener at version v2490
Jul 1 14:09:06 ambassador-pqz4d ambassador debug [config] [source/common/config/grpc_mux_impl.cc:97] Pausing discovery requests for type.googleapis.com/envoy.api.v2.RouteConfiguration
Jul 1 14:09:06 ambassador-pqz4d ambassador warning [config] [source/common/config/grpc_mux_subscription_impl.cc:72] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) ambassador-listener-8443: route: unknown weighted cluster 'cluster_https___graphql_orinoco_05cd6e61-0'
Jul 1 14:09:06 ambassador-pqz4d ambassador debug [config] [source/common/config/grpc_mux_impl.cc:120] Received gRPC message for type.googleapis.com/envoy.api.v2.Listener at version v2490
Jul 1 14:09:06 ambassador-pqz4d ambassador debug [config] [source/common/config/grpc_mux_impl.cc:97] Pausing discovery requests for type.googleapis.com/envoy.api.v2.RouteConfiguration
Jul 1 14:09:06 ambassador-pqz4d ambassador warning [config] [source/common/config/grpc_mux_subscription_impl.cc:72] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) ambassador-listener-8443: route: unknown weighted cluster 'cluster_https___graphql_orinoco_05cd6e61-0'
```
Title: Support for resource readiness feedback in XDS.
Description: We are experiencing temporary 503 responses from Envoy when adding a new route. We managed to link these to a scenario where the new route also introduces a new cluster. When the new cluster is added, it enters the warming state. Since we send the routes immediately after sending the clusters, there is a short period of time during which requests are routed to a cluster that is still warming. As a consequence, 503 responses are returned.
We understand that we don't respect the following requirement from [1]: "management plane must ensure that clusters referenced by a route are in place, before pushing the updates for a route." However, it's not clear to us what the correct protocol is for determining whether "clusters referenced by a route are in place".
In our experiments, it looks like the xDS ACKs for cluster update messages were sent by Envoy before the cluster warming period was done, making them unsuitable for detecting cluster readiness. Can we get a confirmation/refutation of this behavior?
If this is indeed the case, we would like to propose an extension to the xDS protocol to support readiness feedback for resources that require warming (e.g. CDS and LDS).
[1] https://github.com/envoyproxy/data-plane-api/blob/master/XDS_PROTOCOL.md#eventual-consistency-considerations