envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
25.05k stars 4.82k forks source link

ext_proc/ext_authz / async client: Better observability #32250

Open rshriram opened 9 months ago

rshriram commented 9 months ago

When connection to the external server fails, all we see in ext_proc or ext_authz is grpc unavailable or something simple like this. It is not sufficient to give the end user any hint on what actually went wrong. Our users often fail to setup TLS on the grpc server side. When the extension calls fail, the grpc unavailable status is insufficient to tell them where to go look. Its often a guess work.

Similarly, if a connection could not be established due to firewall rules, all we see is unavailable.

It would be good to have some error string straight from boring SSL that informs the user that this was a SSL error (we do not have to process that string. Just expose it as it is). In the same vein, for firewall issues, if the user sees "connection refused" [which is what the kernel gives us], it is a sufficient hint for them to go look at their server to debug the issue.

Given that both ext_proc and ext_authz use the grpc async client, it would be good to expose this info so that it can be extracted and logged into the filter_metadata in ext_proc/ext_authz and thus dumped into access logs in Envoy.

cc @tyxia @htuch @jbohanon @wbpcode etc.

alyssawilk commented 9 months ago

cc @esmet @tyxia @ggreenway

bvandewalle commented 9 months ago

As we are struggling with exactly the same thing (ext_authz failing without clear indication of why), this would be a great enhancement. +1

ggreenway commented 9 months ago

@rshriram makes sense to me. I'd say put it behind a config knob, and by default don't send that type of information in case people are worried about leaking internal details.

wbpcode commented 9 months ago

@rshriram makes sense to me. I'd say put it behind a config knob, and by default don't send that type of information in case people are worried about leaking internal details.

Or could we just provide a more detailed response code details or response flags (now custom response flags was landed and we can use it to record multiple events in the processing of request). Then it will not be leaked to client by default. And if we want expose the error to the client, we can do it by configuring a simple response_headers_to_add (%RESPONSE_CODE_DETAILS%)?

ggreenway commented 9 months ago

@wbpcode I like that idea!

rshriram commented 9 months ago

does async client have response flags set? along with response details?

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.