Open tanujd11 opened 1 year ago
Hey @tanujd11, from a user perspective can you share what you'd like to happen on the data plane (from the gateway to multiple backend endpoints with different topology info)?
I understand this is very useful for optimizing East West traffic within a cluster, is that also the case for north South ?
I think an Envoy gateway running in us-east-1/us-east-1a should prefer backends in the same zone to prevent cross-zone traffic. I think this behaviour could be made the default, as cross-zone communication is obviously costly. WDYT?
thanks, here's something more to think about
service.kubernetes.io/topology-mode
This issue has been automatically marked as stale because it has not had activity in the last 30 days.
there's a new field in the Service spec (trafficDistribution.preferClose) https://kubernetes.io/docs/concepts/services-networking/service/#traffic-distribution that we could consider using to automate priority amongst endpoints within a Service
Could be an option once this new field is stable and the corresponding K8s version is widely adopted.
Before that, IMO it's better to do load balancing across endpoints in the cluster via Envoy's capability.
Currently EG has implemented locality weighted load balancing ^1: one BackendRef is translated to one LocalityLbEndpoints.
locality := &endpointv3.LocalityLbEndpoints{
	Locality: &corev3.Locality{
		Region: fmt.Sprintf("%s/backend/%d", clusterName, i),
	},
	LbEndpoints: endpoints,
	Priority:    0,
}
// Set locality weight
var weight uint32
if ds.Weight != nil {
	weight = *ds.Weight
} else {
	weight = 1
}
Actually, endpoints inside a LocalityLbEndpoints may be running in different zones, so cross-zone cost can't be saved this way.
Through Envoy's capability, priority levels ^2 or zone aware routing ^3 can achieve the goal of saving cross-zone cost.
priority levels
1. Read zone information from the topology.kubernetes.io/zone label.
2. Set Envoy's --service-zone option, which indicates the zone the Envoy Pod is running in.
3. EG rearranges EDS resources for each Envoy: if the Envoy and the backend endpoint are in the same zone, the priority is 0, else 1.
zone aware routing
This approach is mutually exclusive with locality weighted load balancing, since in the case of locality aware LB, we rely on the management server to provide the locality weighting, rather than the Envoy-side heuristics used in zone aware routing.
1. Read zone information from the topology.kubernetes.io/zone label.
2. Set Envoy's --service-zone option, the value indicating the zone the Envoy Pod is running in.
3. Set Envoy's cluster_manager.local_cluster_name, which indicates the fleet the Envoy Pod belongs to; it will be irKey in the implementation.
4. Add a cluster named cluster_manager.local_cluster_name to CDS resources.
5. Treat the Envoy Pods in cluster_manager.local_cluster_name as endpoints and add them to EDS resources.
Since steps 1 and 2 are required by both, priority levels can work with the already implemented locality weighted load balancing, but zone aware routing can't. Priority levels appear easier to implement, but they require EDS resources to be arranged per individual Envoy in the xds/cache module. Either EG does this itself, or we create a new xDS Hook API, like PostEndpointModify(ClusterLoadAssignment, Node), which allows an extension server to do it.
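To make the priority levels idea concrete, here is a minimal sketch of the rearranging step. It is not EG code: Endpoint, PriorityGroup and GroupByZonePriority are hypothetical names, and the structs are simplified stand-ins for the Envoy EDS types. It assumes the Envoy Pod's zone and each endpoint's zone are already known, and it falls back to a single priority level when no endpoint is local, on the assumption that Envoy expects priorities to start at 0 without gaps.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified backend endpoint together with the
// zone read from its topology.kubernetes.io/zone label.
type Endpoint struct {
	Address string
	Zone    string
}

// PriorityGroup is a hypothetical stand-in for Envoy's LocalityLbEndpoints,
// keeping only the fields relevant to this sketch.
type PriorityGroup struct {
	Zone      string
	Priority  uint32
	Endpoints []Endpoint
}

// GroupByZonePriority groups endpoints per zone (in first-seen order).
// The group in envoyZone gets priority 0 and every other group priority 1.
// If no endpoint is in envoyZone, all groups stay at priority 0 so that
// priority levels still start at 0.
func GroupByZonePriority(envoyZone string, eps []Endpoint) []PriorityGroup {
	index := map[string]int{}
	var groups []PriorityGroup
	hasLocal := false
	for _, ep := range eps {
		i, ok := index[ep.Zone]
		if !ok {
			i = len(groups)
			index[ep.Zone] = i
			groups = append(groups, PriorityGroup{Zone: ep.Zone})
		}
		groups[i].Endpoints = append(groups[i].Endpoints, ep)
		if envoyZone != "" && ep.Zone == envoyZone {
			hasLocal = true
		}
	}
	if !hasLocal {
		return groups // every group keeps the zero-value priority 0
	}
	for i := range groups {
		if groups[i].Zone != envoyZone {
			groups[i].Priority = 1
		}
	}
	return groups
}

func main() {
	eps := []Endpoint{
		{Address: "10.0.1.5", Zone: "us-east-1a"},
		{Address: "10.0.2.7", Zone: "us-east-1b"},
		{Address: "10.0.1.9", Zone: "us-east-1a"},
	}
	for _, g := range GroupByZonePriority("us-east-1a", eps) {
		fmt.Printf("zone=%s priority=%d endpoints=%d\n", g.Zone, g.Priority, len(g.Endpoints))
	}
}
```

Because the result depends on envoyZone, the same Service yields a different EDS payload per zone, which is why the per-Envoy arrangement in the cache comes up.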
thanks for outlining the steps @aoledk ! we currently have https://github.com/envoyproxy/gateway/issues/3055 open to get explicit priority per backendRef and program that into the xds cluster resource.
In the future, we can use this issue to make sure we track the auto priority work; the field in k8s, preferClose, could be the knob for users to say they want to opt in to this feature
Hi @aoledk, regarding:
priority levels [...] EG rearranges EDS resources for each Envoy, if Envoy and Backend endpoint are in same zone, priority as 0, else 1.
Is this option viable? Can our XDS server produce different EDS for different envoy pods that are part of the same Envoy deployment?
I think it's possible. xDS server can read the locality info of envoy node.
The cache will be keyed based on a pre-defined hash function whose keys are based on the Node information.
// Identifies a specific Envoy instance. Remote server may have per Envoy configuration.
message Node {
  // An opaque node identifier for the Envoy node. This must be set.
  string id = 1;
  // The cluster that the Envoy node belongs to. This must be set.
  string cluster = 2;
  google.protobuf.Struct metadata = 3;
  Locality locality = 4;
  // This is motivated by informing a management server during canary which
  // version of Envoy is being tested in a heterogeneous fleet.
  string build_version = 5;
}
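As a rough illustration of keying the cache on Node information, the sketch below derives a cache key from the node's cluster and zone. Node, Locality and ZoneAwareKey are hypothetical, trimmed-down names for this example, not the go-control-plane or EG types; the point is only that two Envoy Pods of the same deployment in different zones would map to different snapshots.

```go
package main

import "fmt"

// Locality and Node are hypothetical, trimmed-down mirrors of the Envoy
// Locality and Node messages; only the fields used here are kept.
type Locality struct {
	Region string
	Zone   string
}

type Node struct {
	ID       string
	Cluster  string
	Locality *Locality
}

// ZoneAwareKey returns the cache key for a node: "<cluster>/<zone>" when the
// zone is known, and just "<cluster>" otherwise, so that nodes without
// locality info keep sharing a single snapshot.
func ZoneAwareKey(n *Node) string {
	if n == nil {
		return ""
	}
	if n.Locality == nil || n.Locality.Zone == "" {
		return n.Cluster
	}
	return n.Cluster + "/" + n.Locality.Zone
}

func main() {
	nodes := []*Node{
		{ID: "envoy-0", Cluster: "default/eg", Locality: &Locality{Zone: "us-east-1a"}},
		{ID: "envoy-1", Cluster: "default/eg", Locality: &Locality{Zone: "us-east-1b"}},
		{ID: "envoy-2", Cluster: "default/eg"},
	}
	for _, n := range nodes {
		fmt.Printf("%s -> %s\n", n.ID, ZoneAwareKey(n))
	}
}
```

With a key like this, the number of cached snapshots grows with the number of distinct zones per gateway rather than with the number of Envoy Pods.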
Thanks for pointing that out @modatwork. My other concerns wrt. this approach are:
In general:
Is there a reason to prefer the Priority-based approach? I'm not sure that it's significantly simpler than enabling zone-aware routing.
is @modatwork the same person as @aoledk :) ?
Possible impact on memory consumption if we have to maintain a copy of the cache for each locality. Not sure if that's already the situation today. @arkodg - do you know?
@guydc we are demuxing on gateway/IR, with locality
it would add another dimension lookup and would increase memory by num localities total (xds per gateway gateway resources)
@arkodg I work together with @modatwork
hey @aoledk , adding this issue to the v1.2 milestone, is this something you can help with ?
- Let's configure zone aware routing in envoy by default https://www.envoyproxy.io/docs/envoy/latest/faq/configuration/zone_aware_routing
- If a Service has TrafficDistribution set to PreferClose https://kubernetes.io/docs/concepts/services-networking/service/#traffic-distribution, let's rearrange the EDS endpoint (so Service opts in)
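The opt-in check for the second point could look roughly like the sketch below. ServiceSpec here is a hypothetical minimal stand-in for the Kubernetes Service spec (only the trafficDistribution field, modelled as an optional string), and OptsIntoZonePriority is an illustrative name, not an existing EG function.

```go
package main

import "fmt"

// preferClose mirrors the "PreferClose" value of the Kubernetes Service
// spec.trafficDistribution field.
const preferClose = "PreferClose"

// ServiceSpec is a hypothetical, minimal stand-in for the Kubernetes
// Service spec; the real type carries many more fields.
type ServiceSpec struct {
	TrafficDistribution *string
}

// OptsIntoZonePriority reports whether the Service asked for PreferClose,
// i.e. whether its endpoints should be rearranged by zone in EDS.
func OptsIntoZonePriority(spec ServiceSpec) bool {
	return spec.TrafficDistribution != nil && *spec.TrafficDistribution == preferClose
}

func main() {
	v := preferClose
	fmt.Println(OptsIntoZonePriority(ServiceSpec{TrafficDistribution: &v}))
	fmt.Println(OptsIntoZonePriority(ServiceSpec{}))
}
```

Services that leave the field unset would keep today's behaviour, so the rearranged EDS output only applies where the user asked for it.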
@arkodg I can help.
awesome thanks @aoledk !
hey @aoledk still planning on working on this one for v1.2 ?
Hi @arkodg, I'm currently working on bringing in EG v1.1. Next month I will continue on this feature, but I'm not sure whether it can be merged into v1.2 (due by October 30, 2024); maybe v1.3.
thanks for the update @aoledk, let us know if you hit any issues while running EG v1.1. Moving this issue into the backlog.
@arkodg LGTM.
Description: Implement locality based routing support by default in EG. Now that we can have individual endpoints as backends to EG, can we support region/zone/subzone based routing based on EndpointSlice information, node labels, etc.?