envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Outlier Detection for non-error status codes #18789

Open anikamukherji opened 2 years ago

anikamukherji commented 2 years ago

Title: Support outlier detection of other status codes (particularly 4xx).

Description: Outliers can be hosts returning an abnormal rate of any status code, not just 5xx. Although 4xx errors are generally considered client errors, a host that starts returning a large number of 4xx responses may have a problem of its own (possibly related to authz, authn, etc.) and should be considered an outlier. At Pinterest, we are interested in being able to identify 4xx outliers in addition to 5xx outliers (although I can imagine a general solution covering all status codes 300 and above).
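To make the request concrete, here is a rough sketch of what "a host returning an abnormal rate of 4xx" could mean. It is not Envoy's implementation: the function names, the input shape (a list of observed status codes per host), and the mean-plus-standard-deviation threshold are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def status_rate(codes, lo=400, hi=499):
    """Fraction of responses whose status code falls in [lo, hi]."""
    if not codes:
        return 0.0
    return sum(lo <= c <= hi for c in codes) / len(codes)


def find_outliers(responses_by_host, lo=400, hi=499, stdev_factor=1.5):
    """Return hosts whose rate of [lo, hi] responses is unusually high.

    A host is flagged when its rate exceeds the mean rate across all
    hosts by more than stdev_factor population standard deviations.
    The threshold rule and default factor are hypothetical.
    """
    rates = {
        host: status_rate(codes, lo, hi)
        for host, codes in responses_by_host.items()
    }
    if len(rates) < 2:
        return []  # nothing to compare a single host against
    avg = mean(rates.values())
    sd = pstdev(rates.values())
    if sd == 0:
        return []  # every host behaves the same, so no outliers
    threshold = avg + stdev_factor * sd
    return sorted(host for host, rate in rates.items() if rate > threshold)


# Four hosts return 4xx for 10% of requests, one returns 4xx for 80%.
healthy = [200] * 9 + [404]
suspect = [200] * 2 + [401] * 8
hosts = {"host-a": healthy, "host-b": healthy, "host-c": healthy,
         "host-d": healthy, "host-e": suspect}
print(find_outliers(hosts))
```

Passing `lo=300, hi=599` would apply the same comparison to every status code from 300 upward, which is the more general case mentioned above.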

snowp commented 2 years ago

Seems reasonable to me. I don't think this would be that hard to do.

cpakulski commented 2 years ago

Sounds good. I will implement this. I think that the API should be extended to define status codes considered as errors, so one can specify exact codes which will cause a node to be considered an outlier.

gauravojha commented 2 years ago

@cpakulski I wanted to check whether there are any plans to support the above. This would be an amazingly helpful feature.

"I think that the API should be extended to define status codes considered as errors, so one can specify exact codes which will cause a node to be considered an outlier."

This will be really helpful for cases where, say, we want to eject on all 5xx except 502 for some reason, or something along those lines 🙏
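The "all 5xx except 502" case could be expressed as a set of code ranges plus an exclusion list. The sketch below is purely illustrative: `ErrorCodePolicy`, its fields, and `is_error` are hypothetical names and do not correspond to any existing or proposed Envoy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCodePolicy:
    """Hypothetical description of which status codes count as errors."""

    # Inclusive (low, high) ranges of status codes treated as errors.
    ranges: tuple = ((500, 599),)
    # Individual codes that never count as errors, even inside a range.
    excluded: frozenset = frozenset()

    def is_error(self, status):
        """Return True if this status should count toward ejection."""
        if status in self.excluded:
            return False
        return any(lo <= status <= hi for lo, hi in self.ranges)


# Eject on all 5xx except 502.
policy = ErrorCodePolicy(ranges=((500, 599),), excluded=frozenset({502}))
print(policy.is_error(503), policy.is_error(502), policy.is_error(404))

# Treat both 4xx and 5xx as errors, still skipping 502.
wide = ErrorCodePolicy(ranges=((400, 499), (500, 599)),
                       excluded=frozenset({502}))
print(wide.is_error(403), wide.is_error(502))
```

Checking exclusions before ranges keeps the "except" semantics unambiguous: an excluded code is never an error, regardless of which ranges are configured.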

cpakulski commented 2 years ago

@gauravojha I still plan to work on this. Your example of excluding 502 is a very good point. Please keep an eye on this issue; I should land a PR within a few weeks.

nzt4567 commented 7 months ago

@cpakulski Any progress on this, please? 🙂

cpakulski commented 7 months ago

I wrote a proposal and coded a working prototype some time ago. It was then put on hold, but I plan to open a formal PR within the next month.