envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Enable fallback_policy when no healthy host in subset #34798

Open colek319 opened 1 week ago

colek319 commented 1 week ago

Feature

Add a bool such as fallback_if_no_healthy_host to lb_subset_config. When set, host selection would use the configured fallback_policy (or lb_subset_metadata_fallback_policy) whenever no healthy host is available in the selected subset. In our case, it would use the ANY_ENDPOINT fallback. This isn't very flexible, but it preserves the existing behavior for current users and enables our use case.
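To make the requested behavior concrete, here is a rough Python sketch of the selection logic. It is a toy model, not Envoy's implementation: the `Host` type and the `choose_host` function are hypothetical, `fallback_if_no_healthy_host` is only the flag proposed here (it does not exist in Envoy today), and picking the first healthy host stands in for whatever load balancing policy the cluster actually uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical stand-in for an upstream endpoint."""
    address: str
    subset: str   # e.g. "canary" or "stable"
    healthy: bool


def choose_host(
    hosts: List[Host],
    subset: str,
    fallback_if_no_healthy_host: bool = False,  # proposed flag, not a real Envoy field
) -> Optional[Host]:
    """Pick a host from `subset`, optionally falling back to any healthy host."""
    # Normal path: only healthy hosts inside the selected subset are eligible.
    healthy_in_subset = [h for h in hosts if h.subset == subset and h.healthy]
    if healthy_in_subset:
        return healthy_in_subset[0]

    # Proposed path: behave like an ANY_ENDPOINT fallback and consider
    # every healthy host in the cluster, regardless of subset.
    if fallback_if_no_healthy_host:
        healthy_anywhere = [h for h in hosts if h.healthy]
        if healthy_anywhere:
            return healthy_anywhere[0]

    # Simplified model of today's behavior: no host from another subset is used.
    return None


hosts = [
    Host("10.0.0.1", "canary", healthy=False),
    Host("10.0.0.2", "stable", healthy=True),
]
```

With the flag off, `choose_host(hosts, "canary")` returns `None` because the only canary host is unhealthy. With `fallback_if_no_healthy_host=True`, the same call returns the stable host at `10.0.0.2`.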

Motivation

By design, when a selected LB subset is unhealthy, host selection does not fall back to another subset.

This makes sense when subsets are meant to be completely isolated, but in our use case we prefer falling back to another subset. We use weighted clusters to route 5% of our load to canary, and when no canary host is healthy, we want host selection to consider the other subsets.

Are there any ideas in progress for supporting this type of behavior? The reasoning in this reply makes sense, but it would be great if there were more flexibility in how subsets behave.

colek319 commented 1 week ago

I think https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/cluster.proto#envoy-v3-api-enum-config-cluster-v3-cluster-lbsubsetconfig-lbsubsetmetadatafallbackpolicy might apply when host selection fails, so I will try this first.