Closed: codepainters closed this issue 6 years ago
Hah, I didn't know about that dnsmasq feature, that sounds useful. It sends a specially crafted TXT query and sees if it comes back? That could probably be implemented as a new `loop` plugin?
@chrisohaver, @rajansandeep
Exactly, here's an excerpt from dnsmasq
manual:
--dns-loop-detect Enable code to detect DNS forwarding loops; ie the situation where a query sent to one of the upstream server eventually returns as a new query to the dnsmasq instance. The process works by generating TXT queries of the form `<hex>.test` and sending them to each upstream server. The hex is a UID which encodes the instance of dnsmasq sending the query and the upstream server to which it was sent. If the query returns to the server which sent it, then the upstream server through which it was sent is disabled and this event is logged. Each time the set of upstream servers changes, the test is re-run on all of them, including ones which were previously disabled.
I assumed such a functionality would go into proxy
plugin itself, as it needs to know upstream servers proxy
is configured with, anyway? Perhaps I'm not familiar enough with CoreDNS architecture yet (frankly, a few days ago I didn't even know it existed :) ).
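For illustration, the probe scheme the dnsmasq manual describes above might be sketched roughly like this in Go. Everything here is an assumption for the sketch: `probeName`, the `instanceID` input, and the use of a SHA-256 hash are mine, not dnsmasq's actual UID encoding.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// probeName builds a dnsmasq-style loop-test query name of the form
// "<hex>.test". The hex UID encodes both the local instance and the
// upstream the probe is sent to, so a query that comes back identifies
// which upstream is looping.
func probeName(instanceID, upstream string) string {
	sum := sha256.Sum256([]byte(instanceID + "|" + upstream))
	// 8 bytes of the hash is plenty of UID for this sketch.
	return hex.EncodeToString(sum[:8]) + ".test"
}

func main() {
	// Each (instance, upstream) pair gets its own probe name.
	fmt.Println(probeName("dnsmasq-1", "10.0.0.2:53"))
	fmt.Println(probeName("dnsmasq-1", "10.0.0.3:53"))
}
```

If a TXT query for one of these names arrives back at the instance that generated it, the corresponding upstream can be disabled, as the manual describes.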
What we have discussed in the past is adding a special EDNS0 option to be able to identify queries that come back to us eventually.
The EDNS0 solution solves a more general case. The dnsmasq method, if I understand it, would not detect zone specific loops (e.g. there is a forwarding loop for one zone, but not another).
The EDNS0 solution would be less complex to implement than the dnsmasq method since it's something done inline with every forwarded request - not a separate process that has to monitor for changes to upstreams. But since it is in line, it does add some work for every forwarded packet.
The EDNS0 solution solves a more general case.
but edns0 is hop-by-hop, so it can/will be dropped by other nameservers; i.e. you can only detect direct coredns -> coredns loops.
The dnsmasq method, if I understand it, would not detect zone specific loops (e.g. there is a forwarding loop for one zone, but not another).
true.
The EDNS0 solution would be less complex to implement than the dnsmasq method since it's something done inline with every forwarded request - not a separate process that has to monitor for changes to upstreams. But since it is in line, it does add some work for every forwarded packet.
skydns has code that does this, but only if it was forwarding to a subdomain.
Huh. Didn't realize EDNS0 options wouldn't be forwarded on. Seems like things like ECS would want them forwarded on.
I may misunderstand the context... but ...
RFC 6891 - 6.2.6 Support in Middleboxes
...
Middleboxes that simply forward requests to a recursive resolver MUST
NOT modify and MUST NOT delete the OPT record contents in either
direction.
I may misunderstand the context... but ...
RFC 6891 - 6.2.6 Support in Middleboxes ... Middleboxes that simply forward requests to a recursive resolver MUST NOT modify and MUST NOT delete the OPT record contents in either direction.
Ah, thanks. I stand corrected :-) Do note the "... to a recursive resolver", as in: the resolver is free to drop those records.
Yeah, that definitely limits the effectiveness of this method (if the recursive resolver drops it).
Seems like things like ECS would want them forwarded on.
Looks like that is written into the ECS RFC (https://tools.ietf.org/html/rfc7871#section-7.5), and not part of the EDNS(0) RFC.
After reading more context, I see that the EDNS(0) RFC is pretty clear that it is only hop-by-hop. And the "middlebox" section is only referring to firewalls and other "non-dns" network gear, which may parse and act on it (e.g. block it), but if it does forward, it must forward it intact.
In a distributed setup you also want to know if any of your instances see the same query. Only way to solve that is to have some external entity where the loop query is stored and checked against.
Only way to solve that is to have some external entity where the loop query is stored and checked against.
Without an external entity, for a small number of replicas the original instance would receive the test query eventually by chance, after several loops... assuming the replica doesn't die before receiving it. For a very large number of replicas, it could be quite a few loops before it gets lucky.
If all replicas send the same UID, then it would work better - but that requires sharing the UID across replicas.
Only way to solve that is to have some external entity where the loop query is stored and checked against.
Without an external entity, for a small number of replicas the original instance would receive the test query eventually by chance, after several loops... assuming the replica doesn't die before receiving it. For a very large number of replicas, it could be quite a few loops before it gets lucky.
ha, lol, yes you're right.
It's easy enough to share a UUID (or other identifier) across replicas, it can just be in the Corefile - in K8s they all have the same Corefile.
I guess that doesn't make sense. Lost the context. We're talking about a UUID for the specific query. Ignore that last comment...
Actually, I take that back. All we need to know is that "we" have seen this query before - if the EDNS0 option is there with the identifier, then some instance has seen it, we bump the loop count and pass it on (fail if loop count too high).
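That bump-and-check logic might look something like the following sketch, which models the decoded EDNS0 options as a plain map. The option code and loop limit are made-up values (the code is picked from what I believe is the local/experimental EDNS0 range), not anything CoreDNS actually uses.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	optionCode = 0xFDE9 // hypothetical code from the local/experimental range
	maxLoops   = 3      // arbitrary cap for the sketch
)

var errLoop = errors.New("forwarding loop detected")

// bumpLoop inspects the (already decoded) EDNS0 options of an incoming
// query. If our option is present, some instance has already seen this
// query: bump the count, and fail once it exceeds maxLoops. Otherwise
// add the option with count 1 before forwarding.
func bumpLoop(opts map[uint16]uint8) (map[uint16]uint8, error) {
	count, seen := opts[optionCode]
	if seen && count >= maxLoops {
		return opts, errLoop
	}
	opts[optionCode] = count + 1
	return opts, nil
}

func main() {
	opts := map[uint16]uint8{}
	var err error
	for i := 0; i < 5; i++ {
		opts, err = bumpLoop(opts)
		fmt.Println(opts[optionCode], err)
	}
}
```

As noted later in the thread, this only works as far as the option survives each hop; a resolver that strips the OPT record breaks the chain.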
We're talking about a UUID for the specific query.
I suppose it needs to be a unique ID for the proxy being tested for loopback. So it could be a hash of the targeted proxy IP and a shared UID.
if the EDNS0 option is there
But edns0 is not guaranteed to be forwarded back to us.
Actually, I take that back. All we need to know is that "we" have seen this query before - if the EDNS0 option is there with the identifier, then some instance has seen it, we bump the loop count and pass it on (fail if loop count too high).
a shared "secret" can be used. I wonder what action we should take though, log this? I mean, such a query is easily guessed/read from the wire/etc. If we log.Fatal we've implemented a query of death.
We log it and return SERVFAIL. Are you worried about spamming the logs?
We log it and return SERVFAIL. Are you worried about spamming the logs?
sgtm - shouldn't care about the logging now I think.
If we log.Fatal we've implemented a query of death.
We log it and return SERVFAIL.
The loop probes happen asynchronously to queries. So an upstream with a detected loop should be treated as if it was unhealthy.
We're doing loop probes? I thought we were just sticking the EDNS0 option on every query. Did I lose track of the plan somewhere along the line?
As Miek pointed out, edns0 is hop-by-hop, so it can/will be dropped by other nameservers; i.e. you can only detect direct coredns -> coredns loops.
ok, but how do we decide what queries to send out to test for loops? It would seem to me that it is completely dependent on the zone we are querying, whether or not there will be a loop?
ok, but how do we decide what queries to send out to test for loops? It would seem to me that it is completely dependent on the zone we are querying, whether or not there will be a loop?
I dunno, what does dnsmasq do?
what does dnsmasq do?
From the option description...
Enable code to detect DNS forwarding loops; ie the situation where a query sent to one of the upstream server eventually returns as a new query to the dnsmasq instance. The process works by generating TXT queries of the form `<hex>.test` and sending them to each upstream server. The hex is a UID which encodes the instance of dnsmasq sending the query and the upstream server to which it was sent. If the query returns to the server which sent it, then the upstream server through which it was sent is disabled and this event is logged. Each time the set of upstream servers changes, the test is re-run on all of them, including ones which were previously disabled.
The dnsmasq method, if I understand it, would not detect zone specific loops (e.g. there is a forwarding loop for one zone, but not another).
It can fail to detect loops when stuff gets fancy upstream... IOW: It appears that it is only intended to detect loops in a chain of upstreams that themselves forward everything they are not authoritative for upstream to a single server (eventually back to us).
Not water tight by any means.
Perhaps I'm misunderstanding their description of the feature, or taking it too literally.
So maybe then we need belt and suspenders. Each method detects different ways of looping, and each method has its weaknesses.
@codepainters what version of k8s did you do this on? If you used 1.9, then `kubeadm` would not convert any `kube-dns` configmap settings - such as the upstream nameservers. Did you have that set in kube-dns?
@johnbelamaric if I remember correctly, that was k8s v1.10.0. In fact, I did a clean, fresh installation in my Vagrant-based test env back then, so no configmap conversion was involved. It just failed due to CoreDNS being configured with `/etc/resolv.conf` as a source of upstreams by default, plus `resolv.conf` specifying `nameserver 127.0.0.53` on a default Ubuntu installation.
To solve this specific problem, we could default proxy/forward to not forward to any IPs coredns is listening on when reading the /etc/resolv.conf. I think this is a sane default. It could be overridden with an option. If it's hard to know what coredns is listening on when reading the /etc/resolv.conf, we could exclude all of 127.0.0.0/8 (actually ... we should exclude all of 127.0.0.0/8 in addition to any IPs we bind to, due to the way linux routes all 127.0.0.0/8 IPs to lo).
dnsmasq may be doing this.
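A minimal sketch of that default, assuming the resolv.conf parsing is done by hand (the function name and inputs are illustrative, not plugin code):

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// nonLoopbackNameservers parses resolv.conf-style content and keeps
// only nameservers outside the loopback range (127.0.0.0/8 and ::1),
// sketching the proposed "sane default" for proxy/forward.
func nonLoopbackNameservers(conf string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || fields[0] != "nameserver" {
			continue
		}
		if ip := net.ParseIP(fields[1]); ip != nil && !ip.IsLoopback() {
			out = append(out, fields[1])
		}
	}
	return out
}

func main() {
	conf := "nameserver 127.0.0.53\nnameserver 8.8.8.8\n"
	fmt.Println(nonLoopbackNameservers(conf)) // only the non-loopback server survives
}
```

Using `net.IP.IsLoopback` covers all of 127.0.0.0/8 (and IPv6 `::1`) without having to enumerate what coredns is bound to.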
Since `/etc/resolv.conf` only contained `127.0.0.53`, where did kube-dns/dnsmasq get its upstream nameservers from? It must have been via a kube-dns configmap, or via dnsmasq command flags either directing it to another resolv.conf file, or direct forwarding with e.g. `--server=1.2.3.4` and `--no-resolv`.
@codepainters, was upstream resolution working with kube-dns? I don't see how it could have without customization.
FYI: This same problem also exists for kube-dns: https://github.com/kubernetes/kubeadm/issues/273
@chrisohaver apologies for any confusion, indeed I've seen this problem with `kube-dns`, too (long ago, solved via explicit config map tweaks).
IMHO, among the options discussed, using `EDNS0` to detect direct-to-self loops seems useful to me, and it doesn't sound overly complicated (as far as I understand the scope). Simply banning `127.0.0.0/8` would also work in my particular case.
Thanks for clearing that up. The edns0 option would add overhead on every incoming packet. Banning `127.0.0.0/8` by default, and allowing the user to override, would not.
Especially in Kubernetes, 127.0.0.0/8 points to a different interface in the host than in the context of the cluster DNS pod (i.e. the host's loopback interface and the cluster DNS pod's loopback interface are separate interfaces). So in the Kubernetes context, inheriting a local address from the host's config is something you would never want to do.
I have a tendency to think Kubernetes-centrically though. Changing the default forward/proxy behavior could be a breaking change for someone not using k8s who is using CoreDNS to forward to another local DNS server listening on a loopback address. Alternately, we could leave the default behavior unchanged and add an option to forward/proxy that excludes loopbacks from being imported from /etc/resolv.conf files. We'd need to update the kubeadm upgrade kube-dns configmap translation, and the default coredns manifests, if we do it that way.
cc @timothysc @luxas @detiber
Thanks for clearing that up. The edns0 option would add overhead on every incoming packet. Banning `127.0.0.0/8` by default, and allowing the user to override, would not.
I'm not 100% a fan of making some ranges special even if it is localhost...
Hi, do we know how to execute or resolve this at this moment? Or do we need k8s installers to be aware of this and set different params for different kubelets? It's super annoying from an installer's perspective to set different kubelet args on different instances based on what OS they're running in order for the DNS to work.
Hope we can find a solution for this ASAP given that CoreDNS is planned to go GA in v1.11.
Thanks!!
Hi, do we know how to execute or resolve this at this moment? Or do we need k8s installers to be aware of this and set different params for different kubelets? It's super annoying from an installer's perspective to set different kubelet args on different instances based on what OS they're running in order for the DNS to work.
Hope we can find a solution for this ASAP given that CoreDNS is planned to go GA in v1.11.
What would be the result if coredns detects a loop? It's already too late in the process (probably?) of the install and DNS resolution won't work anyway?
Or is there more to this in kubedns/dnsmasq that we've missed?
This happens with kubedns/dnsmasq too: https://github.com/coredns/coredns/issues/1647#issuecomment-386008886
I'll have time next week to implement a fix in `proxy` and `forward` for this specific issue: where `/etc/resolv.conf` includes a server in 127.0.0.0/8, causing a loop. I plan on adding an option to exclude CIDRs from the imported file. E.g. the following would forward everything to the upstream resolvers in `/etc/resolv.conf` that do not have IPs that fall into 127.0.0.0/8.

```
proxy . /etc/resolv.conf {
    exclude 127.0.0.0/8
}
```
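A sketch of what that hypothetical exclude option could do to the imported nameserver list. `excludeCIDRs` and its signature are assumptions for illustration, not the actual plugin code:

```go
package main

import (
	"fmt"
	"net"
)

// excludeCIDRs drops any resolver IPs that fall inside the excluded
// networks, mirroring what an `exclude 127.0.0.0/8` option applied to
// the nameservers imported from /etc/resolv.conf would do.
func excludeCIDRs(resolvers, cidrs []string) ([]string, error) {
	var nets []*net.IPNet
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return nil, err
		}
		nets = append(nets, n)
	}
	var kept []string
	for _, r := range resolvers {
		ip := net.ParseIP(r)
		excluded := false
		for _, n := range nets {
			if ip != nil && n.Contains(ip) {
				excluded = true
				break
			}
		}
		if !excluded {
			kept = append(kept, r)
		}
	}
	return kept, nil
}

func main() {
	kept, _ := excludeCIDRs([]string{"127.0.0.53", "10.0.0.10"}, []string{"127.0.0.0/8"})
	fmt.Println(kept) // the loopback resolver is dropped
}
```

Taking CIDRs rather than a fixed loopback check keeps the default behavior configurable, which matters for the non-k8s users mentioned earlier in the thread.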
`exclude` may be too close to `except`, a valid `proxy` option that means something else. I'm open to other name suggestions.

I think it needs to be clear it's referring to excluding resolvers. Something like `ignore_resolvers`?
There is a PR open to fix this properly in the kubeadm install: kubernetes/kubernetes#63632. That fix makes mitigating the loop in CoreDNS less critical.
I think it needs to be clear it's referring to excluding resolvers. Something like ignore_resolvers?
I agree, but do we have/allow underscores in the directives? ... or mixed case? The limitation has always been kind of a PITA for coming up with readable directives.
There is a PR open to fix this properly in the kubeadm install: kubernetes/kubernetes#63632. That fix makes mitigating the loop in CoreDNS less critical.
Clarification... the aforementioned PR fixes this issue in the kubelet config in kubeadm installs. It tells kubelet to use /run/systemd/resolve/resolv.conf instead of /etc/resolv.conf. I'm not sure if that means that all Default DNS policy pods would get the correct resolv.conf (and place it in /etc/resolv.conf in those pods). I think yes, but not sure. If so, then that PR will fix the issue for CoreDNS and kube-dns in kubeadm installs.
I'm not sure if that means that all Default DNS policy pods would get the correct resolv.conf (and place it in /etc/resolv.conf in those pods). I think yes, but not sure.
According to kubernetes/kubernetes#45828, yes.
We really would prefer to not go with https://github.com/kubernetes/kubernetes/issues/45828 though. That's just the last resort in case nothing else works. If we can figure out some generic, other way, it'd be great!
Well, coredns can avoid looping on itself, but it can’t forward to the right upstreams because the pod is set up with the wrong /etc/resolv.conf...
In other words, we can make coredns not crash, and warn the user that the upstreams won't work, but it can't fix the problem itself.
Ok, thanks for the explanation @chrisohaver!
When switching from Kube-DNS to CoreDNS on Kubernetes (bare metal), I've observed CoreDNS crashing on any lookup for a non-cluster name.
The root cause turned out to be `/etc/resolv.conf` on Ubuntu pointing to 127.0.0.53 by default (`systemd-resolved`), and the `kubeadm` setup uses this file by default for the `proxy` plugin, leading to CoreDNS (running inside a container) looping to itself recursively, exhausting allowed RAM and crashing.

While there are many ways to fix the root cause (tweaking the `Corefile`, or the `kubelet --resolv-conf` param), I'm wondering if it could be possible for CoreDNS to handle such a misconfiguration gracefully.

One option would be a "preflight check" in a manner similar to `dnsmasq --dns-loop-detect` (basically sending a `TXT` test query), or perhaps simply refusing to proxy to an IP/port pair that is bound to by CoreDNS (with 0.0.0.0 being a special case)?
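The "refuse to proxy to ourselves" idea from the last paragraph could be sketched like this. `proxiesToSelf` and its inputs are illustrative, not CoreDNS code; the 0.0.0.0 special case is handled by treating a wildcard listener as covering at least any loopback upstream on the same port.

```go
package main

import (
	"fmt"
	"net"
)

// proxiesToSelf reports whether a candidate upstream (host:port) would
// point back at one of the addresses this server listens on. A wildcard
// listener (0.0.0.0 or ::) covers every local IP, so at minimum any
// loopback upstream on the same port would loop back to us.
func proxiesToSelf(upstream string, listenAddrs []string) bool {
	uHost, uPort, err := net.SplitHostPort(upstream)
	if err != nil {
		return false
	}
	for _, l := range listenAddrs {
		lHost, lPort, err := net.SplitHostPort(l)
		if err != nil || lPort != uPort {
			continue // different port (or unparsable) cannot be us
		}
		if lIP := net.ParseIP(lHost); lIP != nil && lIP.IsUnspecified() {
			if uIP := net.ParseIP(uHost); uIP != nil && uIP.IsLoopback() {
				return true // wildcard listener + loopback upstream: a loop
			}
			continue
		}
		if lHost == uHost {
			return true // exact listen address match
		}
	}
	return false
}

func main() {
	listens := []string{"0.0.0.0:53"}
	fmt.Println(proxiesToSelf("127.0.0.53:53", listens)) // true: loops back to ourselves
	fmt.Println(proxiesToSelf("8.8.8.8:53", listens))    // false: genuinely remote
}
```

A check like this at config-load time would have rejected the `nameserver 127.0.0.53` upstream described in this issue instead of looping until OOM.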