Closed hbagdi closed 5 months ago
There can be an argument that route's properties should take precedence.
The reason the expected behavior here makes more sense is because upstream and service are more "closer" to the concept of service than a route. There can be multiple routes in Kong forwarding traffic to the same upstream service and it is the Service owner who gets to decide how it wants to receive traffic from Kong.
This is one of those things where route entity being used not just for matching but also for manipulating requests creates a problem.
Hey, @locao , you added host_header in #4959. Do you have an opinion on this one?
This is one of those things where route entity being used not just for matching but also for manipulating requests creates a problem.
Well, isn't it just preventing the manipulation?
Isn't it more flexible to have it like it is? preserve_host=false
by default, and if upstream.host_header
overrides that, then the preserve_host=true
does nothing. With current behavior you can create route
without preserve host and it then uses upstreams host_header. But if you really want to keep original request's host header, you can preserve that.
Service owner can decide that all the routes created to my service needs to have preserve_host=false?
, no?
Service owner can decide that all the routes created to my service needs to have preserve_host=false?, no?
They certainly could. An upstream is "closer" and is more concrete when you think about an "Upstream service" in Kong and hence I feel the current behavior is incorrect.
In my point of view, if user defined the Host
header that must be used for that route, that should be respected. An upstream having a host defined is a normal/common configuration, a route overwriting a host header is a specific behavior, so I would guess that when users set the host header for that route, they really want that route to have distinct behavior.
On the other hand, I also agree with:
This is one of those things where route entity being used not just for matching but also for manipulating requests creates a problem.
I don't like ambiguity, I feel like the best approach to this would be removing the preserve_host
option at all.
Yes, host_header
got added some 4 months ago, while preserve_host
has been there from a day one, and yes they sure feel a bit conflicting.
I understand "the upstream is closer", but people have proposed us to add all the service fields to route entity so that they could adjust some service params by route. And with this it only makes sense to move this closer to client. But we have refused to do so. The preserve host is indeed weird exception.
in general, in the line of "route", "service", "upstream", it goes from specific to generic. SO from that perspective it would make sense that a route-property would have a higher precedence than an upstream-property. But that is in general.
In this case, I think it would help to create a matrix of the options that define the host-header (outgoing), based on the input parameters;
route.preserve_host
service.host
upstream.host_header
target
So imo this is what should happen (I added the healthcheck host, since in that case there will be no incoming Host-header):
incoming Host-header | route.preserve_host |
service.host |
upstream.host_header |
target |
upstream Host-header | (health-check host) |
---|---|---|---|---|---|---|
foo.com | true | bar.com | n.a. | n.a. | foo.com | n.a. |
foo.com | true | [upstream] | myhost.com | internal.com | foo.com | myhost.com |
foo.com | true | [upstream] | nil |
internal.com | foo.com | internal.com |
foo.com | false | bar.com | n.a. | n.a. | bar.com | n.a. |
foo.com | false | [upstream] | myhost.com | internal.com | myhost.com | myhost.com |
foo.com | false | [upstream] | nil |
internal.com | internal.com | internal.com |
The whole argument around specific to generic is flawed as far as I can see.
We evolved from the api
entity in our model to routes
and services
but didn't segregate the upstream and downstreams concerns enough. Route matching an request manipulations are different concerns.
While preserve_host
and strip_path
exist and they probably need to exist for a long time for backwards compatibility, we should not be afraid of inverting the behavior. API gateway should transparently take care of handling the conversion of requests from a foo.external.com
to an application that's running internally in the network at bar.internal.com
.
On another point, I might be missing something here but having a different upstream host-header and a different health-check host header is extremely confusing without any usability benefit at all. It would introduce more confusion from an observability standpoint.
There can and will be weird corner cases here, but at that point, we have to tell the user that you are doing something wrong and should update your application (or at that point, use a plugin if you really want to) to use a specific host header.
API gateway should transparently take care of handling the conversion of requests from a foo.external.com to an application that's running internally in the network at bar.internal.com.
But it does, doesn't it? set preserve_host
to false, and it does exactly that?
On another point, I might be missing something here but having a different upstream host-header and a different health-check host header is extremely confusing without any usability benefit at all. It would introduce more confusion from an observability standpoint.
No way to get around this. Health-probes are not based on requests, so the host-header info is simply not available when the timer runs. If you match on "path" for example, you don't even know all possible incoming hostnames, so impossible to make this work.
Route matching an request manipulations are different concerns.
This is a fair comment. Though not sure whether the required changes would be justified. What specific cases do you have that you currently cannot deal with?
No way to get around this. Health-probes are not based on requests, so the host-header info is simply not available when the timer runs. If you match on "path" for example, you don't even know all possible incoming hostnames, so impossible to make this work.
This is exactly why I'm recommending that the upstream should respond on a single virtual host to all requests, no matter it is a proxy request or a health-check request. If we don't do that differentiation, then the upstream is available at a specific virtual host and kong can transparently handle other host names. The app doesn't even need to be aware of other host names.
Now, for some weird reason, you do want that behavior, then unset the host_header in your upstream and use preserve_host behavior.
I don't think it is weird for single upstream to answer: customer-a.app.com AND customer-b.app.com
Aka *.app.com. But yes, the whole strip_path
and preserve_host
are things I don't like that much, but they have been with the product for very long, while upstream.host_header
is a new one.
in general, in the line of "route", "service", "upstream", it goes from specific to generic. SO from that perspective it would make sense that a route-property would have a higher precedence than an upstream-property. But that is in general.
In this case, I think it would help to create a matrix of the options that define the host-header (outgoing), based on the input parameters;
- incoming Host-header
route.preserve_host
service.host
upstream.host_header
target
So imo this is what should happen (I added the healthcheck host, since in that case there will be no incoming Host-header):
incoming Host-header
route.preserve_host
service.host
upstream.host_header
target
upstream Host-header (health-check host) foo.com true bar.com n.a. n.a. foo.com n.a. foo.com true [upstream] myhost.com internal.com foo.com myhost.com foo.com true [upstream]nil
internal.com foo.com internal.com foo.com false bar.com n.a. n.a. bar.com n.a. foo.com false [upstream] myhost.com internal.com myhost.com myhost.com foo.com false [upstream]nil
internal.com internal.com internal.com
I am a little confused about your matrix which first case's service.host
is bar.com
and the second one's is [upstream]
. I guess the meaning of [upstream]
is that service.host property is equal to upstream.name property , but what does you mean in the first case ?
I thought service.host is always equal to upstream.name , is not that ?
Otherwise ,what the value of upstream.name should be when service.host
is bar.com ,and how does the service matchs with its upstream? Looking forward to your reply,thanks.
@hnotyet the [upstream]
could be any internal upstream-entity-name. But since it is an internal name, it should never be used as a Host header going upstream. That's why I put it in there as [upstream]
to indicate the name itself is irrelevant in this discussion
Great points raised here all around, and if the behavior was the opposite the discussion could easily go the other way around ("bug: preserve_host is not respected when setting host_header").
The points of agreement are that nobody seems to be a big fan of preserve_host
or mixing route matching and request manipulation in the same entity, but here's where we are.
As of Kong 2.x we must maintain the current behavior.
For Kong 3.x, if both upstream.host_header
and route.preserve_host
continue to exist, I also think we should keep the current behavior because it is the more flexible one. Rationale (uppercase means Kong entity): given that healthchecks are implemented in Kong an Upstream-level concern (i.e. they don't mean "is this Service alive" but rather "is this Upstream alive") and one Upstream can serve different Services matched by different hostnames, it does make sense that different Services may want to use different hostnames and the Upstream may end up using a different one for its healthchecks. (If the counterargument for this is "people should not be doing this", then this hits the more fundamental question of how opinionated should Kong be, and generally so far the answer has been "not much".)
I think @Tieske's table would be a useful addition to the docs (perhaps changing [upstream]
to something like internal-upstream
to avoid confusion as seen in @hnotyet 's question) and the Admin API docs for upstream.host_header
could be changed from
The hostname to be used as Host header when proxying requests through Kong.
to
The hostname to be used as Host header when performing active healthchecks and proxying requests through Kong, unless overridden during proxying using
route.preserve_host
.
This opens another question, though: what is the current behavior for passive healthchecks? Where does it get the host information from, and, does it matter? From a quick look at the sources, I couldn't grasp it.
I am a user,i think:
separate responsibilities for each object,please think about it.
@hishamhm @bungle @hbagdi @Tieske @locao @hnotyet
This opens another question, though: what is the current behavior for passive healthchecks? Where does it get the host information from, and, does it matter? From a quick look at the sources, I couldn't grasp it.
This question is irrelevant, there is no 'host' with passive healthchecks, because there is no probe-request. Stated otherwise, it will inspect the request and hence will always have the Host from that request, that is by definition 1-on-1. This problem only exists for active healthchecks, because they run without having an actual request, hence the inputs in the first 3 columns of the table are unavailable when constructing the probe request.
ALSO: the table is what would make sense to me, I do not think it matches the current behaviour though. If we consider it a bug we can change it, if not we should not change it. (router path behaviour versioning anyone?)
Was this issue ever resolved? I have konghq.com/preserve_host: "true" in my ingress, but when I look at the logs for the kong ingress I do not see the host being preserved and instead I see the Pod IP. And I am getting timeout errors because of it:
upstream timed out (110: Operation timed out) while connecting to upstream, client: 172.16.220.64, server: kong, request: "GET / HTTP/2.0", upstream: "https://172.16.127.167:8443/" , host: "nalk8sou.railcarmgt.com:31725" 172.16.220.64 - - [05/Mar/2021:16:33:05 +0000] "GET / HTTP/2.0" 504 51 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"
From my point of view is not really a bug, it is a consequence of setting "preserve_host" (you can preserve host, or configure a dedicated one. Both is simply not possible logically)
https_sni (which is currently not possible to be configured for upstream) Is from my point of view an additional missing thing. If set, then it needs to be used as SNI value for the TLS handshake. If not set, then the current logic with host_header is sufficient.
https_sni simply tells during the handshake which server certificate should be used for the TLS handshake if the destination host has more then 1 certificate configured on the same https endpoint (And it is also used to do the matching of the ingress to be used for communication ).
http_header in theorie could be used. But it is not really a clean possibility and could break in future and is a little bit misleading. ... for example if someone tries to mitigate the mentioned bug, which is not really a bug.
If you have a tls handshake then the header "host" is fully ignored on server side of the handshake. Only the value of the sni is used in encrypted scenario for identify the destination / routing / ingress object. In unencrypted scenario the value of the "host" header is used to identify destination / routing / ingress object (if there is a ingress controller)
If we would introduce a https_sni, then the bug of "preserve_host" could be solved in a way, that then host_header setting will be not used, and https_sni needs to be taken. So the header "host" could still contain the original requested hostname --> what exactly means "preserve_host"
I think, this bug https://github.com/Kong/kong/issues/5580 is only existing due to the absence of the possibility to specify https_sni in a dedicated way for the upstream
@hbagdi @bungle HI. Do we have any updates?
This issue is marked as stale because it has been open for 14 days with no activity.
Dear contributor,
We are automatically closing this issue because it has not seen any activity for three weeks. We're sorry that your issue could not be resolved. If any new information comes up that could help resolving it, please feel free to reopen it.
Your contribution is greatly appreciated!
Please have a look our pledge to the community for more information.
Sincerely, Your Kong Gateway team
Summary
When
preserve_host
in the Route entity is set totrue
,host_header
in Upstream entity is not respected.Steps To Reproduce
Configure Kong with:
And then fire:
As seen in the response from upstream, the upstream sees the host as
bar.com
and notfoo.com
.Additional Details & Logs
$ kong version
) 2.0 (won't work in previous versions either)$ kong start --vv
)<KONG_PREFIX>/logs/error.log
)