Closed konstantin-baidin-y42 closed 1 month ago
cc @esmet @tyxia @ggreenway
Seems likely that these error codes are getting misinterpreted as a denied response, or hitting some alternate error path from the normal one.
Hello, I tested it manually with logging level tracing and found that responses with error status codes are processed in the GrpcClientImpl::onSuccess
callback instead of the GrpcClientImpl::onFailure
. So I added a check of the status code and similar actions as in the onFailure callback.
I made a PR with a draft of the fix. Can you please take a look? https://github.com/envoyproxy/envoy/pull/34951
Since my fix was reverted in https://github.com/envoyproxy/envoy/pull/35637, can you please re-open this issue again? @ggreenway Or should I create a duplicate one?
Also, do I understand right that this time I should create the same PR, just with the runtime guard to be false by default as I did initially?
You'd need to make it with a config setting to control the behavior, which defaults to the old behavior. Runtime flags are meant to be temporary.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
Title:
failure_mode_allow=true
is not fully supported for the grpc authz server.Description: There is inconsistency in how the
failure_mode_allow=true
option is applied to external authorization servers depending on the protocol: http and grpc.In case of http authz server, any 5xx statuses lead to accepting the incoming request (fail-open).
In case of grpc authz server, statuses like 14 (Unavailable) lead to rejecting the incoming request. If there is any grpc proxy between envoy and authz server, then this proxy can return 14 (Unavailable) when authz server is unavailable. It leads to the fail-close behaviour even when
failure_mode_allow
is enabled. As a result, we have random rejections while withfailure_mode_allow
we want the opposite - don't reject requests because of random failures.Expected behaviour is to use failue_mode_allow when receiveng grpc statuses like 2 (Unknown), 13 (Internal), 14 (Unavailable).
The following PR https://github.com/envoyproxy/envoy/pull/4199 was intended to handle these cases when
failure_mode_allow=true
and authz server is unhealthy, but it was solved for HTTP authz server only.Repro steps:
failure_mode_allow=true