Open Brunomachadob opened 4 years ago
cc. @derekargueta who recently touched codes related to this.
👀 my pr hasn't landed yet but I can help investigate
So there's 2 main observations I have so far.
shouldn't the request mirror get the host=api-mirror-shadow?
Not quite, because the way that shadowing works is that it takes an exact clone of the original request, and just adds -shadow
as a suffix to the host, and doesn't do its own host resolution for the api-mirror
cluster (code here) regardless of auto_host_rewrite
. We can definitely update the docs to clarify that. So what's happening here is the auto-rewrite kicks in to rewrite the host to api
on the main request, then the ShadowWriter
takes that host string as-is and just adds -shadow
. The hostname of the mirroring cluster is not considered or looked up.
Maybe this is some caching problem in the host resolution?
My hypothesis so far:
From observing -l trace
level logs and looking at /stats
on the admin server, it seems that this only happens if you hit an Envoy worker thread doesn't already have a connection to upstream, and has to establish a new connection (each worker has its own connection pool). After hitting an established connection you don't have this problem.
A few important facts: 1) The mirrored request is queued immediately once the entire request is parsed. (link) 2) host resolution is an asynchronous event, requiring a callback to be triggered (link). 3) Envoy needs to resolve the host for every new connection is establishes upstream.
So I think what we are seeing here is:
1) Envoy receives a new request on a worker thread that doesn't have an upstream connection
2) Envoy initiates an upstream connection, and before the connection is established (which is prerequisite to getting the hostname in a callback), it also queues the mirrored request in the cluster manager with a copy of whatever it has at that time - in this case localhost:5000
since the connection callback hasn't been triggered. (attempts to call doRequestComplete()
after parsing headers/body/trailers, which in turn is what executes the mirrored request).
4) Connection callback is triggered, which sets the host rewrite (code link) so the main request has the right host, but there's no bridge to the mirrored request - it's already been queued elsewhere in the system, so when it gets sent it only has the original localhost:5000
to work with.
Then on subsequent calls to the same upstream connection the host is immediately resolved because the connection is already open, so the host rewrite is set before the mirrored request is queued in the cluster manager
Still gathering supporting evidence for the above, but if true means that either 1) we document this as expected behavior or 2) fix it so that the mirrored request also waits for onPoolReady
(and not just "once the request is fully read) so that they have the expected host when combined with auto_host_rewrite
on the first request of a new connection.
Thank you for the quick and detailed answer!
Regarding the cluster used in the host I'll try to wrap my head around it and maybe open a SO/issue with some more questions if needed.
Regarding the mirror host resolution Thanks for clarifying it, I think it's not going to be a huge problem for me, at least for now. Let me know if there is something else I can do to help on this
Thanks!
Yeah @derekargueta your assessment is correct. We rewrite the host after we get an upstream host. Note though that we are not waiting for DNS resolution, just connection creation. DNS resolution is async. I think we could likely fix this by doing shadowing after pool complete like you said.
@derekargueta @mattklein123 - making the shadow request wait till the primary cluster's upstream host connection is ready might introduce unnecessary dependency and potentially unwanted latency on the shadow request. In https://github.com/envoyproxy/envoy/issues/9094 it was suggested to add disable_shadow_host_suffix_append
under request_mirror_policies
entries. How about we add auto_host_rewrite
along with that? After all, it will be helpful to use the mirror cluster's hostname for the Host header in the shadow request instead of the primary cluster's hostname. What do you think?
If my understanding is correct, could you point me to the code that does "Connection callback is triggered, which sets the host rewrite". I think the code link in @derekargueta message above is now obsolete. I can look into making these changes. Thanks.
If my understanding is correct, could you point me to the code that does "Connection callback is triggered, which sets the host rewrite". I think the code link in @derekargueta message above is now obsolete. I can look into making these changes. Thanks.
Sure the above makes sense to me. Here is the shadow writer code: https://github.com/envoyproxy/envoy/blob/main/source/common/router/shadow_writer_impl.h
Are there any updates?
Ignoring auto_host_rewrite
parameter by mirror
module seems to be a critical problem for cases when envoy sends duplicate request to somewhere with TLS, or for example to k8s. When host
field in header becomes, in fact, invalid (add -shadow
).
Adding auto_host_rewrite
parameter for mirror
sounds like a real solution to the problem, if the host
field is overwritten correctly.
Pump! Are these any updates on this issue?
adding auto_host_rewrite
to mirroring sounds like a reasonable approach, if sounds good to @envoyproxy/api-shepherds I'll take a stab at this next week.
Description: I'm currently trying to use
request_mirror_policy
in a route that is also usingauto_host_rewrite
. The problem are thehost/:authority
headers in the mirrored request is giving me weird results...Repro steps: I created a simplified version of my setup in this gist including a full log of execution
Mock server logs 2 requests
Calling again...
Mock server logs 2 requests
From now on it's always successfully rewritten, until I restart the containers and the same thing happen. Maybe this is some caching problem in the host resolution?
Tested versions
envoyproxy/envoy:v1.10.0
envoyproxy/envoy-dev:daeb9850dbcd78da21a2e8ff9854564830cefb19 # 1.13.0-dev
Additional question As you can see in my
envoy.yaml
I want to mirror the request to another cluster with another domain (I'm migrating one feature from a monolith to a microservice).So in my case the real request is going to
api
, and the mirror request is going to another cluster calledapi-mirror
. In this case, shouldn't the request mirror get thehost=api-mirror-shadow
?The documentation is not really clear (at least to me) which cluster is going to be used when rewriting, the mirror cluster or the original one?
Thanks for the help!