Open bmcalary-atlassian opened 1 year ago
@alyssawilk
Can I get a bit more context on how and why you want to do this?
I don't think it's doable today and generally it's bad form for a single stream on a multiplexed connection to be able to force connection drain, but if you've got a valid use case I have ideas on how such a feature could be wired up!
@alyssawilk thanks so much for getting back to me.
Yup, agree it's bad form.
Let me give you some context :) I apologise for the wall of text, but I want to give you the best possible answer.
We're using Envoy as an edge proxy facing the internet.
Envoy proxies client requests to various clusters (applications). These applications have a multi-tenanted model, in the sense that the one application/cluster serves many different organizations (hundreds of thousands).
Organizations can provide a list of their own corporate office, VPN and Data Center source IPs which should be able to access their "tenancy". Attempts to access their tenancy from outside those IP addresses today result in the backend application returning a HTTP 403. All pretty simple.
To make it absolutely clear: the per-Organization/Tenant IP allow-listing logic is not performed on Envoy. It's performed by our application layer by inspecting a sanitized X-Forwarded-For header, and a HTTP 403 with a nice dynamic HTML or JSON response is simply proxied by Envoy back to the client. Envoy's involvement is only as a simple HTTP reverse proxy.
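To make the application-side check concrete, here is a minimal Python sketch of the kind of allow-list decision described above. It is an illustration, not our actual code: the tenant name, the networks, and the helper names are hypothetical, and it assumes the right-most X-Forwarded-For entry is the one appended by the trusted edge proxy.

```python
import ipaddress

# Hypothetical per-tenant allow-list; the real data model is not shown in this thread.
TENANT_ALLOWLIST = {
    "acme": [
        ipaddress.ip_network("203.0.113.0/24"),   # e.g. corporate office range
        ipaddress.ip_network("198.51.100.7/32"),  # e.g. VPN egress address
    ],
}


def client_ip_from_xff(xff):
    """Return the right-most X-Forwarded-For entry as an IP address.

    Assumption: the edge proxy appends the address it saw, so the last
    entry is the only one the application can trust.
    """
    return ipaddress.ip_address(xff.split(",")[-1].strip())


def status_for(tenant, xff):
    """Return 200 if the client IP is allow-listed for the tenant, else 403."""
    ip = client_ip_from_xff(xff)
    allowed = any(ip in net for net in TENANT_ALLOWLIST.get(tenant, []))
    return 200 if allowed else 403
```

The decision is made per request from the source address of the connection the request arrived on, which is why a reused connection keeps producing the same answer.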
However, a problem emerges when clients move from their office to home and leave a browser window open with our application in the active tab, or wake from sleep and reconnect to their VPN a few moments later, or simply forget to activate their VPN before opening our application. The browser creates a TCP connection to our Envoy based edge from the client's home internet IP on their local router - a non-allowlisted IP - and our backend application (rightly) returns a HTTP 403. So far so good.
The problem is that browsers (Firefox, Safari, Chrome) and Operating Systems (Windows, macOS) hold on to this connection as a HTTP Keep-Alive connection even after clients (re)connect to their VPN. A HTTP 403 is not a connection-ending response code. So when clients DO reconnect to their VPN and refresh the page, the browser, OS and their home router forward the request over the original TCP connection, from the same disallowed source IP address.
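The keep-alive behaviour is easy to reproduce outside a browser. The following is a small self-contained Python sketch using only the standard library; the handler and function names are made up for the demo, and the local server stands in for the backend, not for Envoy. It sends two requests over one HTTP/1.1 client connection and records which TCP peer each request arrived from.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_peers = []  # (ip, port) of the TCP connection each request arrived on


class Always403(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive is the default in HTTP/1.1

    def do_GET(self):
        seen_peers.append(self.client_address)
        body = b"forbidden\n"
        self.send_response(403)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    """Send two GETs on one client connection; return statuses and peers."""
    seen_peers.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), Always403)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    statuses = []
    for _ in range(2):
        conn.request("GET", "/")
        resp = conn.getresponse()
        resp.read()  # drain the body so the socket can be reused
        statuses.append(resp.status)
    conn.close()
    server.shutdown()
    server.server_close()
    return statuses, list(seen_peers)
```

Both requests get a 403 and both arrive from the same source address and port, i.e. the 403 did nothing to end the connection.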
In our testing and observations, browsers perform idle connection pooling across tabs and refreshes, so it is not sufficient for clients/users to simply hit refresh, CMD/CTRL+R or F5, or open a new tab; instead they must be instructed to perform a CMD/Ctrl+Shift+R or close and re-open all browser windows. This is an even harder UX problem with mobile clients.
Also in our testing, most browsers will form a new TCP connection after receiving a HTTP 421 - however many client libraries and front-end libraries are not prepared to correctly interpret such a response code - particularly mobile ones.
Finally, in our testing and research of major browsers, it appears they DO reset all their connections when traditional VPNs touch the default route 0.0.0.0/0 (after adding a more specific route to the VPN concentrator's public IP via the original default gateway), but this is decreasing in popularity. Increasingly, VPN solutions like Zscaler and many others leave the default route untouched, and simply add more specific routes for given applications. Browsers currently do not detect this new approach as a trigger to reset their connection pool.
Because getting Google and Mozilla to change their connection pooling logic to detect new VPN routes would have an exceptionally long lead time (if they picked it up at all), we're hoping to find a way via Lua, Extproc, or native Envoy configuration (e.g. virtual host/route) to tell Envoy to close a client connection immediately after delivering a response, triggered either by a proxied 403 or maybe by a X-Envoy-Close-Downstream-After: true response header. (A X-Envoy-Close-Downstream-Immediate: true header could also be valuable for shedding clients - such as abusive ones - without processing any response.)
Hope the above is clear.
oh fascinating. So basically if you have an H2 connection to an Envoy before you do your VPN logic, you end up in an endless 403 loop because the browser keeps the legacy connections alive so never establishes them from the new address? Sounds like a totally legit use case to me :-)
so there's already logic in the HCM to handle draining connections, which can be triggered by hitting max configured requests per connection, failing health checks etc. you'd basically want to wire up something like the logic here: https://github.com/envoyproxy/envoy/blob/main/source/common/http/conn_manager_impl.cc#L369 where a stream could configure a drain, and indicate the drain close timeout (which sends goaways for H2 and sets connection: close for H1), and you should be good to go
You've understood exactly!
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
I need the same feature. My use case is the following: I have several edge nodes, each running an instance of Envoy as a front proxy. Users should always connect to the nearest edge node. Users have a domain that, when resolved via DNS, points to the nearest edge node. When the nearest edge node changes, I want the client to terminate the TCP connection and do the DNS lookup again to obtain the new IP it should connect to.
So I need a (per route) way to terminate the downstream TCP connection. I tried manually adding a Connection: close header to the response, but the header is not added, either with response_headers_to_add or with a custom Lua script. Both methods work for other headers, so it looks like Envoy blocks changes to the Connection header. In any case this approach would only work for HTTP/1.1. A specific configuration option for closing the downstream connection would be useful.
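For reference, this is the HTTP/1.1 behaviour I am trying to get, shown as a minimal Python sketch with only the standard library. The names are made up and the local server is a stand-in for the upstream, not for Envoy; it only shows what a client does when the Connection: close header actually reaches it.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CloseAfter403(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"forbidden\n"
        self.send_response(403)
        self.send_header("Content-Length", str(len(body)))
        # Tells an HTTP/1.1 client not to reuse this connection; http.server
        # also closes its side after the response when it sends this header.
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def client_will_close():
    """Return (status, will_close) as seen by a standard-library client."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CloseAfter403)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    conn.request("GET", "/")
    resp = conn.getresponse()
    will_close = resp.will_close  # True when the client will drop the socket
    resp.read()
    conn.close()
    server.shutdown()
    server.server_close()
    return resp.status, will_close
```

With the header present, the client marks the connection as not reusable and opens a new one for the next request. HTTP/2 has no equivalent header, which is why a GOAWAY-based drain is needed there.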
Keep open.
We're looking for a way to force connections to close after fully sending a particular reply, e.g. a 403 or 421, forcing downstream clients to re-connect - even if they have a misbehaving client. Note: I am NOT referring to draining the listener. The listener will remain active indefinitely.
Based on my understanding the h1.1 Connection: Close response header is no longer respected by h2 clients and I can't find a way to tell Envoy to send a GOAWAY programmatically. Previous proxy software I've worked with had the option to send a response and then optionally discard the TCP connection - basically send a FIN just after writing out the response.
Is there some way to do this in Envoy which I am missing, e.g. in Lua?
Again, we're looking for some way to tell Envoy to FIN the downstream client connection after delivering a particular response (by response code or by checking for a response header from our backend).