envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
25.01k stars 4.81k forks source link

Mirrored request with host rewrite has wrong Host header #9326

Open Brunomachadob opened 4 years ago

Brunomachadob commented 4 years ago

Description: I'm currently trying to use request_mirror_policy in a route that is also using auto_host_rewrite. The problem are the host/:authority headers in the mirrored request is giving me weird results...

Repro steps: I created a simplified version of my setup in this gist including a full log of execution

docker-compose up --build
curl localhost:5000/any/host_rewrite

Mock server logs 2 requests

# The actual request
mockserver_1    |   {
mockserver_1    |     "method" : "GET",
mockserver_1    |     "path" : "/any/host_rewrite",
mockserver_1    |     "headers" : [ {
mockserver_1    |       "name" : "host",
mockserver_1    |       "values" : [ "api" ] # successfully rewritten
mockserver_1    |     }, {
....
# The mirror request
mockserver_1    |   {
mockserver_1    |     "method" : "GET",
mockserver_1    |     "path" : "/any/host_rewrite",
mockserver_1    |     "headers" : [ {
mockserver_1    |       "name" : "host",
mockserver_1    |       "values" : [ "localhost-shadow:5000" ] # Why localhost?
mockserver_1    |     }, {
....

Calling again...

curl localhost:5000/any/host_rewrite

Mock server logs 2 requests

# The actual request
mockserver_1    |   {
mockserver_1    |     "method" : "GET",
mockserver_1    |     "path" : "/any/host_rewrite",
mockserver_1    |     "headers" : [ {
mockserver_1    |       "name" : "host",
mockserver_1    |       "values" : [ "api" ] # successfully rewritten
mockserver_1    |     }, {
# The mirrored request
mockserver_1    |   {
mockserver_1    |     "method" : "GET",
mockserver_1    |     "path" : "/any/host_rewrite",
mockserver_1    |     "headers" : [ {
mockserver_1    |       "name" : "host",
mockserver_1    |       "values" : [ "api-shadow" ] # successfully rewritten

From now on it's always successfully rewritten, until I restart the containers and the same thing happen. Maybe this is some caching problem in the host resolution?

Tested versions envoyproxy/envoy:v1.10.0 envoyproxy/envoy-dev:daeb9850dbcd78da21a2e8ff9854564830cefb19 # 1.13.0-dev

Additional question As you can see in my envoy.yaml I want to mirror the request to another cluster with another domain (I'm migrating one feature from a monolith to a microservice).

So in my case the real request is going to api, and the mirror request is going to another cluster called api-mirror. In this case, shouldn't the request mirror get the host=api-mirror-shadow?

The documentation is not really clear (at least to me) which cluster is going to be used when rewriting, the mirror cluster or the original one?

Thanks for the help!

dio commented 4 years ago

cc. @derekargueta who recently touched codes related to this.

derekargueta commented 4 years ago

👀 my pr hasn't landed yet but I can help investigate

derekargueta commented 4 years ago

So there's 2 main observations I have so far.

shouldn't the request mirror get the host=api-mirror-shadow?

Not quite, because the way that shadowing works is that it takes an exact clone of the original request, and just adds -shadow as a suffix to the host, and doesn't do its own host resolution for the api-mirror cluster (code here) regardless of auto_host_rewrite. We can definitely update the docs to clarify that. So what's happening here is the auto-rewrite kicks in to rewrite the host to api on the main request, then the ShadowWriter takes that host string as-is and just adds -shadow. The hostname of the mirroring cluster is not considered or looked up.

Maybe this is some caching problem in the host resolution?

My hypothesis so far: From observing -l trace level logs and looking at /stats on the admin server, it seems that this only happens if you hit an Envoy worker thread doesn't already have a connection to upstream, and has to establish a new connection (each worker has its own connection pool). After hitting an established connection you don't have this problem.

A few important facts: 1) The mirrored request is queued immediately once the entire request is parsed. (link) 2) host resolution is an asynchronous event, requiring a callback to be triggered (link). 3) Envoy needs to resolve the host for every new connection is establishes upstream.

So I think what we are seeing here is: 1) Envoy receives a new request on a worker thread that doesn't have an upstream connection 2) Envoy initiates an upstream connection, and before the connection is established (which is prerequisite to getting the hostname in a callback), it also queues the mirrored request in the cluster manager with a copy of whatever it has at that time - in this case localhost:5000 since the connection callback hasn't been triggered. (attempts to call doRequestComplete() after parsing headers/body/trailers, which in turn is what executes the mirrored request). 4) Connection callback is triggered, which sets the host rewrite (code link) so the main request has the right host, but there's no bridge to the mirrored request - it's already been queued elsewhere in the system, so when it gets sent it only has the original localhost:5000 to work with.

Then on subsequent calls to the same upstream connection the host is immediately resolved because the connection is already open, so the host rewrite is set before the mirrored request is queued in the cluster manager


Still gathering supporting evidence for the above, but if true means that either 1) we document this as expected behavior or 2) fix it so that the mirrored request also waits for onPoolReady (and not just "once the request is fully read) so that they have the expected host when combined with auto_host_rewrite on the first request of a new connection.

Brunomachadob commented 4 years ago

Thank you for the quick and detailed answer!

Regarding the cluster used in the host I'll try to wrap my head around it and maybe open a SO/issue with some more questions if needed.

Regarding the mirror host resolution Thanks for clarifying it, I think it's not going to be a huge problem for me, at least for now. Let me know if there is something else I can do to help on this

Thanks!

mattklein123 commented 4 years ago

Yeah @derekargueta your assessment is correct. We rewrite the host after we get an upstream host. Note though that we are not waiting for DNS resolution, just connection creation. DNS resolution is async. I think we could likely fix this by doing shadowing after pool complete like you said.

scheler commented 3 years ago

@derekargueta @mattklein123 - making the shadow request wait till the primary cluster's upstream host connection is ready might introduce unnecessary dependency and potentially unwanted latency on the shadow request. In https://github.com/envoyproxy/envoy/issues/9094 it was suggested to add disable_shadow_host_suffix_append under request_mirror_policies entries. How about we add auto_host_rewrite along with that? After all, it will be helpful to use the mirror cluster's hostname for the Host header in the shadow request instead of the primary cluster's hostname. What do you think?

If my understanding is correct, could you point me to the code that does "Connection callback is triggered, which sets the host rewrite". I think the code link in @derekargueta message above is now obsolete. I can look into making these changes. Thanks.

mattklein123 commented 3 years ago

If my understanding is correct, could you point me to the code that does "Connection callback is triggered, which sets the host rewrite". I think the code link in @derekargueta message above is now obsolete. I can look into making these changes. Thanks.

Sure the above makes sense to me. Here is the shadow writer code: https://github.com/envoyproxy/envoy/blob/main/source/common/router/shadow_writer_impl.h

Schtil commented 1 year ago

Are there any updates? Ignoring auto_host_rewrite parameter by mirror module seems to be a critical problem for cases when envoy sends duplicate request to somewhere with TLS, or for example to k8s. When host field in header becomes, in fact, invalid (add -shadow). Adding auto_host_rewrite parameter for mirror sounds like a real solution to the problem, if the host field is overwritten correctly.

vietanhduong commented 6 months ago

Pump! Are these any updates on this issue?

derekargueta commented 5 months ago

adding auto_host_rewrite to mirroring sounds like a reasonable approach, if sounds good to @envoyproxy/api-shepherds I'll take a stab at this next week.

https://github.com/envoyproxy/envoy/blob/eda7d32bd54d95008b82f2cc7327d63dbd887df0/api/envoy/config/route/v3/route_components.proto#L777