Open snowp opened 3 years ago
That sounds useful. It may be better to make it an enum instead of a string, in case we want to do something like add an access-log flag for some values.
This would be really useful. Today "no healthy upstream" can mean a variety of things:
1) All the upstreams were failing health checks (if panic mode is disabled) 2) There were no upstreams at all (e.g. service discovery issue) 3) subset selector did not match any upstreams
There might be more I'm not thinking of, but if we can distinguish those cases it would save a lot of debugging time!
The main rationale for a string was better supporting custom clusters/load balancers that might want to enrich the information, but perhaps it would be better with an enum capturing the main failure reasons and potentially exposing mutable filter state/metadata to the lb down the line if we actually have a use case for extensions wanting to record more free form failure reasons.
Currently we assume that if a load balancer implementation returns
nullptr
that this meansno healthy upstream
. For load balancers like the subset lb, it could also mean that no host existed that satisfied the requirement (i.e. no subset found). It would be helpful to provide some mechanism to surface the failure reason, which could be included in some way in the local reply.This would in turn allow creating local reply mappings based on it, making it possible to have Envoy respond with errors such as
no subset found for x-subset-header: <value>
or similar.The simplest way of doing this would probably be to update the
chooseHost
signature to also return an optional error string, allowing the lb to include additional information when relevant.