
plugin/proxy: loop detection/avoidance? #1647

Closed: codepainters closed this issue 6 years ago

codepainters commented 6 years ago

When switching from Kube-DNS to CoreDNS on Kubernetes (bare metal), I've observed CoreDNS crashing on any lookup for a non-cluster name.

The root cause turned out to be /etc/resolv.conf on Ubuntu pointing to 127.0.0.53 by default (systemd-resolved). The kubeadm setup uses this file by default for the proxy plugin, leading to CoreDNS (running inside a container) looping back to itself recursively, exhausting its allowed RAM and crashing.

While there are many ways to fix the root cause (tweaking the Corefile, or the kubelet --resolv-conf param), I'm wondering whether it would be possible for CoreDNS to handle such a misconfiguration gracefully.

One option would be a "preflight check" in a manner similar to dnsmasq --dns-loop-detect (basically sending a TXT test query), or perhaps simply refusing to proxy to an IP/port pair that CoreDNS itself is bound to (with 0.0.0.0 being a special case)?

miekg commented 6 years ago

Hah, I didn't know about the dnsmasq feature, that sounds useful. It sends a specially crafted TXT query and sees if that comes back? That could probably be implemented as a new plugin, loop?

fturib commented 6 years ago

@chrisohaver, @rajansandeep

codepainters commented 6 years ago

Exactly, here's an excerpt from dnsmasq manual:

--dns-loop-detect Enable code to detect DNS forwarding loops; ie the situation where a query sent to one of the upstream server eventually returns as a new query to the dnsmasq instance. The process works by generating TXT queries of the form <hex>.test and sending them to each upstream server. The hex is a UID which encodes the instance of dnsmasq sending the query and the upstream server to which it was sent. If the query returns to the server which sent it, then the upstream server through which it was sent is disabled and this event is logged. Each time the set of upstream servers changes, the test is re-run on all of them, including ones which were previously disabled.

I assumed such functionality would go into the proxy plugin itself, as it needs to know the upstream servers proxy is configured with anyway? Perhaps I'm not familiar enough with the CoreDNS architecture yet (frankly, a few days ago I didn't even know it existed :) ).
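For illustration, here is a minimal Go sketch of that probe mechanism, using the github.com/miekg/dns library. The package, function names, and bookkeeping are assumptions made for this sketch; it is not dnsmasq's implementation nor existing CoreDNS code.

    // loopprobe is an illustrative sketch of a dnsmasq-style loop probe.
    package loopprobe

    import (
        "crypto/rand"
        "encoding/hex"
        "sync"

        "github.com/miekg/dns"
    )

    var (
        mu     sync.Mutex
        probes = map[string]string{} // probe qname -> upstream it was sent to
    )

    // SendProbe sends a uniquely named TXT query (e.g. "1a2b3c4d5e6f7081.test.")
    // to one upstream and remembers the name so it can be recognized later.
    func SendProbe(upstream string) error {
        buf := make([]byte, 8)
        if _, err := rand.Read(buf); err != nil {
            return err
        }
        qname := hex.EncodeToString(buf) + ".test."

        mu.Lock()
        probes[qname] = upstream
        mu.Unlock()

        m := new(dns.Msg)
        m.SetQuestion(qname, dns.TypeTXT)
        _, _, err := new(dns.Client).Exchange(m, upstream)
        return err
    }

    // SawLoop reports whether an incoming query matches an outstanding probe,
    // i.e. the probe travelled through the upstream chain and came back to us.
    // The returned upstream is the one that should be disabled and logged.
    func SawLoop(qname string) (upstream string, looped bool) {
        mu.Lock()
        defer mu.Unlock()
        upstream, looped = probes[qname]
        return
    }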

johnbelamaric commented 6 years ago

What we have discussed in the past is adding a special EDNS0 option to be able to identify queries that come back to us eventually.

chrisohaver commented 6 years ago

The EDNS0 solution solves a more general case. The dnsmasq method, if I understand it, would not detect zone specific loops (e.g. there is a forwarding loop for one zone, but not another).

The EDNS0 solution would be less complex to implement than the dnsmasq method since it's something done inline with every forwarded request - not a separate process that has to monitor for changes to upstreams. But since it is inline, it does add some work for every forwarded packet.

miekg commented 6 years ago

The EDNS0 solution solves a more general case.

but edns0 is hop-by-hop, so it can/will be dropped by other nameservers; i.e. you can only detect direct coredns -> coredns loops.

The dnsmasq method, if I understand it, would not detect zone specific loops (e.g. there is a forwarding loop for one zone, but not another).

true.

The EDNS0 solution would be less complex to implement than the dnsmasq method since it's something done inline with every forwarded request - not a separate process that has to monitor for changes to upstreams. But since it is inline, it does add some work for every forwarded packet.

skydns has code that does this, but only if it was forwarding to a subdomain.

johnbelamaric commented 6 years ago

Huh. Didn't realize EDNS0 options wouldn't be forwarded on. Seems like things like ECS would want them forwarded on.

chrisohaver commented 6 years ago

I may misunderstand the context... but ...

RFC 6891 - 6.2.6 Support in Middleboxes
    ...
   Middleboxes that simply forward requests to a recursive resolver MUST
   NOT modify and MUST NOT delete the OPT record contents in either
   direction.

miekg commented 6 years ago

I may misunderstand the context... but ...

RFC 6891 - 6.2.6 Support in Middleboxes
   ...
  Middleboxes that simply forward requests to a recursive resolver MUST
  NOT modify and MUST NOT delete the OPT record contents in either
  direction.

Ah, thanks. I stand corrected :-) Do note the "... to a recursive resolver", as in: the resolver is free to drop those records.

johnbelamaric commented 6 years ago

Yeah, that definitely limits the effectiveness of this method (if the recursive resolver drops it).

chrisohaver commented 6 years ago

Seems like things like ECS would want them forwarded on.

Looks like that is written into the ECS RFC (https://tools.ietf.org/html/rfc7871#section-7.5), and not part of the EDNS(0) RFC.

After reading more context, I see that the EDNS(0) RFC is pretty clear that it is only hop-by-hop. And the "middlebox" section is only referring to firewalls and other "non-dns" network gear, which may parse and act on it (e.g. block it), but if it does forward, it must forward it intact.

miekg commented 6 years ago

In a distributed setup you also want to know if any of your instances see the same query. Only way to solve that is to have some external entity where the loop query is stored and checked against.

chrisohaver commented 6 years ago

Only way to solve that is to have some external entity where the loop query is stored and checked against.

Without an external entity, for a small number of replicas the original instance would receive the test query eventually by chance, after several loops... assuming the replica doesn't die before receiving it. For a very large number of replicas, it could be quite a few loops before it gets lucky.

If all replicas send the same UID, then it would work better - but that requires sharing the UID across replicas.

miekg commented 6 years ago

Only way to solve that is to have some external entity where the loop query is stored and checked against.

Without an external entity, for a small number of replicas the original instance would receive the test query eventually by chance, after several loops... assuming the replica doesn't die before receiving it. For a very large number of replicas, it could be quite a few loops before it gets lucky.

ha, lol, yes you're right.

johnbelamaric commented 6 years ago

It's easy enough to share a UUID (or other identifier) across replicas, it can just be in the Corefile - in K8s they all have the same Corefile.

johnbelamaric commented 6 years ago

I guess that doesn't make sense. Lost the context. We're talking about a UUID for the specific query. Ignore that last comment...

johnbelamaric commented 6 years ago

Actually, I take that back. All we need to know is that "we" have seen this query before - if the EDNS0 option is there with the identifier, then some instance has seen it, we bump the loop count and pass it on (fail if loop count too high).
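To make the EDNS0 idea concrete, here is a rough Go sketch using github.com/miekg/dns. The option code (picked from the EDNS0 local/experimental range), the hop limit, and the function name are invented for illustration; they are not what any shipped plugin uses.

    package loopmark

    import (
        "encoding/binary"
        "errors"

        "github.com/miekg/dns"
    )

    // optLoop is an arbitrary code from the EDNS0 local/experimental range
    // (65001-65534); purely illustrative.
    const optLoop = 65001

    // maxHops is the loop-count threshold at which we give up.
    const maxHops = 3

    // MarkQuery attaches a hop counter in an EDNS0 local option before the
    // query is forwarded, or bumps the counter if some instance already added
    // one. It returns an error once the counter exceeds maxHops.
    func MarkQuery(m *dns.Msg) error {
        opt := m.IsEdns0()
        if opt == nil {
            m.SetEdns0(4096, false)
            opt = m.IsEdns0()
        }
        for _, o := range opt.Option {
            l, ok := o.(*dns.EDNS0_LOCAL)
            if !ok || l.Code != optLoop {
                continue
            }
            hops := binary.BigEndian.Uint16(l.Data) + 1
            if hops > maxHops {
                return errors.New("forwarding loop detected")
            }
            binary.BigEndian.PutUint16(l.Data, hops)
            return nil
        }
        // First hop: add the option with a counter of zero.
        opt.Option = append(opt.Option, &dns.EDNS0_LOCAL{Code: optLoop, Data: make([]byte, 2)})
        return nil
    }

As noted elsewhere in this thread, a recursive resolver in the path is free to drop this option, so the check mainly catches direct coredns -> coredns hops.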

chrisohaver commented 6 years ago

We're talking about a UUID for the specific query.

I suppose it needs to be a unique id for the proxy being tested for loop back. So it could be a hash of the targeted proxy ip, and a shared uid.
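A hedged sketch of that identifier (the names here are invented):

    package loopid

    import (
        "crypto/sha256"
        "encoding/hex"
    )

    // ProbeID derives a stable identifier for "this deployment probing this
    // upstream": a UID shared by all replicas (e.g. set in the Corefile) hashed
    // together with the targeted upstream's address, so any replica recognizes
    // a probe sent by any other replica toward the same upstream.
    func ProbeID(sharedUID, upstreamAddr string) string {
        sum := sha256.Sum256([]byte(sharedUID + "|" + upstreamAddr))
        return hex.EncodeToString(sum[:8]) // short hex label, usable as "<hex>.test."
    }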

chrisohaver commented 6 years ago

if the EDNS0 option is there

But edns0 is not guaranteed to be forwarded back to us.

miekg commented 6 years ago

Actually, I take that back. All we need to know is that "we" have seen this query before - if the EDNS0 option is there with the identifier, then some instance has seen it, we bump the loop count and pass it on (fail if loop count too high).

a shared "secret" can be used. I wonder what action we should take though, log this? I mean, such a query is easily guessed/read from the wire/etc. If we log.Fatal we've implemented a query of death.

johnbelamaric commented 6 years ago

We log it and return SERVFAIL. Are you worried about spamming the logs?
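As a sketch of what "log it and return SERVFAIL" could look like in a middleware-style handler (LoopGuard and isLoop are hypothetical names, and the detection itself is left as a placeholder):

    package loopguard

    import (
        "context"
        "log"

        "github.com/coredns/coredns/plugin"
        "github.com/miekg/dns"
    )

    // LoopGuard is a hypothetical plugin that short-circuits detected loops.
    type LoopGuard struct {
        Next plugin.Handler
    }

    func (lg LoopGuard) ServeDNS(ctx context.Context, w dns.ResponseWriter, r *dns.Msg) (int, error) {
        // isLoop stands in for whatever marker check is agreed on
        // (EDNS0 option, probe qname, shared secret, ...).
        if isLoop(r) {
            log.Printf("[WARNING] forwarding loop detected for %q", r.Question[0].Name)
            m := new(dns.Msg)
            m.SetRcode(r, dns.RcodeServerFailure)
            w.WriteMsg(m)
            return dns.RcodeServerFailure, nil
        }
        return plugin.NextOrFailure(lg.Name(), lg.Next, ctx, w, r)
    }

    func (lg LoopGuard) Name() string { return "loopguard" }

    // isLoop is a placeholder for the actual detection logic.
    func isLoop(r *dns.Msg) bool { return false }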


miekg commented 6 years ago

We log it and return SERVFAIL. Are you worried about spamming the logs?

sgtm - we shouldn't worry about the logging for now, I think.

chrisohaver commented 6 years ago

If we log.Fatal we've implemented a query of death.

We log it and return SERVFAIL.

The loop probes happen asynchronously to queries. So an upstream with a detected loop should be treated as if it was unhealthy.

johnbelamaric commented 6 years ago

We're doing loop probes? I thought we were just sticking the EDNS0 option on every query. Did I lose track of the plan somewhere along the line?

chrisohaver commented 6 years ago

As Miek pointed out, edns0 is hop-by-hop, so it can/will be dropped by other nameservers; i.e. you can only detect direct coredns -> coredns loops.

johnbelamaric commented 6 years ago

ok, but how do we decide what queries to send out to test for loops? It would seem to me that whether or not there will be a loop depends entirely on the zone we are querying?

miekg commented 6 years ago

ok, but how do we decide what queries to send out to test for loops? It would seem to me that whether or not there will be a loop depends entirely on the zone we are querying?

I dunno, what does dnsmasq do?

chrisohaver commented 6 years ago

what does dnsmasq do?

From the option description...

Enable code to detect DNS forwarding loops; ie the situation where a query sent to one of the upstream server eventually returns as a new query to the dnsmasq instance. The process works by generating TXT queries of the form <hex>.test and sending them to each upstream server. The hex is a UID which encodes the instance of dnsmasq sending the query and the upstream server to which it was sent. If the query returns to the server which sent it, then the upstream server through which it was sent is disabled and this event is logged. Each time the set of upstream servers changes, the test is re-run on all of them, including ones which were previously disabled.

chrisohaver commented 6 years ago

The dnsmasq method, if I understand it, would not detect zone specific loops (e.g. there is a forwarding loop for one zone, but not another).

It can fail to detect loops when things get fancier upstream... IOW: it appears to be intended only to detect loops in a chain of upstreams that each forward everything they are not authoritative for to a single upstream server (eventually back to us).

Not watertight by any means.

Perhaps I'm misunderstanding their description of the feature, or taking it too literally.

johnbelamaric commented 6 years ago

So maybe then we need belt and suspenders. Each method detects different ways of looping, and each method has its weaknesses.

johnbelamaric commented 6 years ago

@codepainters what version of k8s did you do this on? If you used 1.9, then kubeadm would not convert any kube-dns configmap settings - such as the upstream nameservers. Did you have that set in kube-dns?

codepainters commented 6 years ago

@johnbelamaric if I remember correctly, that was k8s v1.10.0. In fact, I did a clean, fresh installation in my Vagrant-based test env back then, so no configmap conversion was involved. It just failed due to CoreDNS being configured with /etc/resolv.conf as the source of upstreams by default, plus resolv.conf specifying nameserver 127.0.0.53 on a default Ubuntu installation.

chrisohaver commented 6 years ago

To solve this specific problem, we could default proxy/forward to not forward to any IPs coredns is listening on when reading the /etc/resolv.conf. I think this is a sane default. It could be overridden with an option. If it's hard to know what coredns is listening on when reading the /etc/resolv.conf, we could exclude all of 127.0.0.0/8 (actually ... we should exclude all of 127.0.0.0/8 in addition to any IPs we bind to, due to the way linux routes all 127.0.0.0/8 IPs to lo).

dnsmasq may be doing this.
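For the resolv.conf case specifically, the filtering could be as simple as the following sketch (the function name is invented; github.com/miekg/dns is used here for resolv.conf parsing, which is not necessarily how proxy/forward parse it):

    package upstreams

    import (
        "net"

        "github.com/miekg/dns"
    )

    // FromResolvConf reads the nameservers from a resolv.conf file and drops
    // any loopback address (all of 127.0.0.0/8, plus ::1), since inside the
    // CoreDNS pod those typically route straight back to CoreDNS itself.
    func FromResolvConf(path string) ([]string, error) {
        conf, err := dns.ClientConfigFromFile(path)
        if err != nil {
            return nil, err
        }
        var servers []string
        for _, s := range conf.Servers {
            if ip := net.ParseIP(s); ip != nil && ip.IsLoopback() {
                continue // e.g. 127.0.0.53 from systemd-resolved
            }
            servers = append(servers, net.JoinHostPort(s, conf.Port))
        }
        return servers, nil
    }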

chrisohaver commented 6 years ago

Since /etc/resolv.conf only contained 127.0.0.53, then where did kube-dns/dnsmasq get its upstream nameservers from? It must have been via a kube-dns configmap, or via dnsmasq command flags, either directing it to another resolv.conf file or forwarding directly with e.g. --server=1.2.3.4 and --no-resolv.

@codepainters, was upstream resolution working with kube-dns? I don't see how it could have been without customization.

FYI: This same problem also exists for kube-dns: https://github.com/kubernetes/kubeadm/issues/273

codepainters commented 6 years ago

@chrisohaver apologies for any confusion, indeed I've seen this problem with kube-dns, too (long ago, solved via explicit config map tweaks).

IMHO, among the options discussed, using EDNS0 to detect direct-to-self loops seems useful to me, and it doesn't sound overly complicated (as far as I understand the scope). Simply banning 127.0.0.0/8 would also work in my particular case.

chrisohaver commented 6 years ago

Thanks for clearing that up. The edns0 option would add overhead on every incoming packet. Banning 127.0.0.0/8 by default, and allowing the user to override, would not.

Especially in Kubernetes, 127.0.0.0/8 points to a different interface in the host than in the context of the cluster DNS pod (i.e. the host's loopback interface and the cluster DNS pod's loopback interface are separate interfaces). So in the Kubernetes context, inheriting a local address from the host's config is something you would never want to do.

I have a tendency to think Kubernetes-centrically, though. Changing the default forward/proxy behavior could be a breaking change for someone not using k8s who is using CoreDNS to forward to another local DNS server listening on a loopback address. Alternatively, we could leave the default behavior unchanged and add an option to forward/proxy that excludes loopbacks from being imported from /etc/resolv.conf files. We'd need to update the kube-dns configmap translation in kubeadm upgrade, and the default CoreDNS manifests, if we do it that way.

luxas commented 6 years ago

cc @timothysc @luxas @detiber

miekg commented 6 years ago

Thanks for clearing that up. The edns0 option would add overhead on every incoming packet. Banning 127.0.0.0/8 by default, and allowing the user to override, would not.

I'm not 100% a fan of making some ranges special even if it is localhost...

luxas commented 6 years ago

Hi, do we know how to execute or resolve this at this moment? Or do we need k8s installers to be aware of this and set different params for different kubelets? It's super annoying from an installer's perspective to set different kubelet args on different instances based on what OS they're running in order for the DNS to work.

Hope we can find a solution for this ASAP given that CoreDNS is planned to go GA in v1.11.

Thanks!!

miekg commented 6 years ago

Hi, do we know how to execute or resolve this at this moment? Or do we need k8s installers to be aware of this and set different params for different kubelets? It's super annoying from an installer's perspective to set different kubelet args on different instances based on what OS they're running in order for the DNS to work.

Hope we can find a solution for this ASAP given that CoreDNS is planned to go GA in v1.11.

What would be the result if coredns detects a loop? It's (probably?) already too late in the install process, and DNS resolution won't work anyway?

Or is there more to this in kubedns/dnsmasq that we've missed?

johnbelamaric commented 6 years ago

This happens with kubedns/dnsmasq too: https://github.com/coredns/coredns/issues/1647#issuecomment-386008886

chrisohaver commented 6 years ago

I'll have time next week to implement a fix in proxy and forward for this specific issue: where /etc/resolv.conf includes a server in 127.0.0.0/8, causing a loop. I plan on adding an option to exclude CIDRs from the imported file. E.g. the following would forward everything to the upstream resolvers in /etc/resolv.conf whose IPs do not fall into 127.0.0.0/8.

    proxy . /etc/resolv.conf {
        exclude 127.0.0.0/8
    }

exclude may be too close to except, a valid proxy option that means something else. I'm open to other name suggestions.

johnbelamaric commented 6 years ago

I think it needs to be clear it's referring to excluding resolvers. Something like ignore_resolvers?

chrisohaver commented 6 years ago

There is a PR open to fix this properly in the kubeadm install: kubernetes/kubernetes#63632. That fix makes mitigating the loop in CoreDNS less critical.

chrisohaver commented 6 years ago

I think it needs to be clear it's referring to excluding resolvers. Something like ignore_resolvers?

I agree, but do we have/allow underscores in the directives? ... or mixed case? The limitation has always been kind of a PITA for coming up with readable directives.

chrisohaver commented 6 years ago

There is a PR open to fix this properly in the kubeadm install: kubernetes/kubernetes#63632. That fix makes mitigating the loop in CoreDNS less critical.

Clarification... the aforementioned PR fixes this issue in the kubelet config in kubeadm installs. It tells kubelet to use /run/systemd/resolve/resolv.conf instead of /etc/resolv.conf. I'm not sure if that means that all Default DNS policy pods would get the correct resolv.conf (and place it in /etc/resolv.conf in those pods). I think yes, but not sure. If so, then that PR will fix the issue for CoreDNS and kube-dns in kubeadm installs.

chrisohaver commented 6 years ago

I'm not sure if that means that all Default DNS policy pods would get the correct resolv.conf (and place it in /etc/resolv.conf in those pods). I think yes, but not sure.

According to kubernetes/kubernetes#45828, yes.

luxas commented 6 years ago

We really would prefer not to go with https://github.com/kubernetes/kubernetes/issues/45828 though. That's just the last resort in case nothing else works. If we can figure out some other, generic way, it'd be great!

chrisohaver commented 6 years ago

Well, coredns can avoid looping on itself, but it can’t forward to the right upstreams because the pod is set up with the wrong /etc/resolv.conf...

In other words, we can make coredns not crash, and warn the user that the upstreams won't work, but it can't fix the problem itself.

luxas commented 6 years ago

Ok, thanks for the explanation @chrisohaver!