Status: Open. akorp opened this issue 1 year ago.
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
Hi, we have the same problem. This has even caused our DNS server to stop responding, due to too many (unnecessary) DNS requests.
Having the same issue.
@aanandr, @phealy would you be able to assist?
Author: | akorp |
---|---|
Assignees: | - |
Labels: | `bug`, `networking/azcni`, `action-required`, `Needs Attention :wave:` |
Milestone: | - |
Initial take, more to come.
So AKS is inheriting reddog.microsoft.com from the VM networking team. We're still trying to find out why they don't use a blank search suffix. If there are reasons for it that don't matter for pods, we might have CoreDNS drop reddog.microsoft.com queries so they don't get forwarded to your upstream DNS. Trying to figure out if a fallthrough or except config in coredns-custom would help people until we know more.
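Why the suffix reaches the upstream DNS at all follows from the resolver's search-list rule. Here is a minimal Python sketch. It assumes glibc-style `ndots` handling and a typical search list for a pod in the `default` namespace (both assumptions, not taken from the thread), and lists the names a pod tries for one external lookup:

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int) -> List[str]:
    """Return the FQDNs a stub resolver tries, in order, for `name`.

    Simplified glibc-style rule: a name ending in "." is absolute and is
    tried as-is only. A name with fewer than `ndots` dots is tried with
    every search suffix first and as-is last; otherwise it is tried
    as-is first and then with each suffix.
    """
    if name.endswith("."):
        return [name]
    absolute = name + "."
    suffixed = [f"{name}.{suffix}." for suffix in search]
    if name.count(".") < ndots:
        return suffixed + [absolute]
    return [absolute] + suffixed


# Assumed search list for a pod in the "default" namespace on an
# affected node (cluster domain cluster.local).
POD_SEARCH = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "reddog.microsoft.com",
]

if __name__ == "__main__":
    for fqdn in candidate_names("management.azure.com", POD_SEARCH, ndots=5):
        print(fqdn)
```

For `management.azure.com` (2 dots, fewer than 5) this yields five names, the fourth being `management.azure.com.reddog.microsoft.com.`. In the default AKS Corefile the `cluster.local` candidates are answered by the `kubernetes` plugin, so only the reddog candidate and the real name are forwarded upstream. A trailing dot (`management.azure.com.`) makes the name absolute and skips the search list entirely.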
We're seeing millions of failed DNS requests because of this issue. Is there any update?
According to this page we can customize the CoreDNS config used by AKS. We just have to create and apply a custom ConfigMap. And we have all of the built-in CoreDNS plugins at our disposal (which includes the ACL plugin).
So, would a ConfigMap like this be possible? (I haven't tested this yet)
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  reddog.server: |
    reddog.microsoft.com:53 {
      acl {
        drop
      }
    }
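Whether a separate `reddog.microsoft.com:53` block would catch the suffixed queries depends on how CoreDNS picks a server block. Per its documentation, a query is handled by the block whose zone is the most specific match for the query name. The sketch below is a simplified Python model of that selection. It ignores ports, protocols and reverse zones, and it assumes that AKS loads `*.server` keys from `coredns-custom` as extra server blocks next to the default `.:53` block.

```python
from typing import List, Optional


def normalize(name: str) -> str:
    """Lower-case a DNS name and make it fully qualified (trailing dot)."""
    name = name.lower()
    return name if name.endswith(".") else name + "."


def select_zone(qname: str, zones: List[str]) -> Optional[str]:
    """Return the most specific zone containing `qname`, or None.

    A zone contains a name if the name equals the zone or ends with
    "." + zone; the root zone "." contains every name. Among the
    containing zones, the longest one is the most specific.
    """
    q = normalize(qname)
    best: Optional[str] = None
    for zone in zones:
        z = normalize(zone)
        contains = z == "." or q == z or q.endswith("." + z)
        if contains and (best is None or len(z) > len(best)):
            best = z
    return best


# Server-block zones after adding the reddog.server key: the default
# ".:53" block plus the custom "reddog.microsoft.com:53" block.
ZONES = [".", "reddog.microsoft.com"]

if __name__ == "__main__":
    for name in (
        "management.azure.com.reddog.microsoft.com.",
        "management.azure.com.",
        "kube-dns.kube-system.svc.cluster.local.",
    ):
        print(f"{name:45} -> {select_zone(name, ZONES)}")
```

Only names under `reddog.microsoft.com.` land in the custom block. Cluster names and real external names keep going through the default block, so the rest of the Corefile is unaffected.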
We have the same issue. @paulgmiller, what about the VM networking team? Did they react to this issue? Is there a solution published in the meantime? I think a custom CoreDNS configuration is a bit overengineered.
Thanks Sebastian
Has anyone actually tried this in AKS?
`drop` was introduced in CoreDNS 1.10.1:
https://coredns.io/2023/01/20/coredns-1.10.1-release/
As of today, the latest Kubernetes version generally available on AKS is 1.27, which ships CoreDNS v1.9.4:
https://learn.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli
So CoreDNS on AKS doesn't support `drop` at the moment.
Also, which is the better option, `drop` or `block`? From the docs, it looks like `block` will return a `REFUSED` response code, which should allow the DNS resolver to move on to the next search domain. What would `drop` return, and would the DNS resolver move on as usual?
Tested `drop` on a local k8s cluster with CoreDNS 1.10.1, and it results in a timeout. The change in https://github.com/coredns/coredns/pull/5722 is all about adding `drop` to mitigate DoS attacks: it doesn't write any response back, which causes the client to time out.
`block`, on the other hand, returns a `REFUSED` error, which ~seemed like the right one at this point~ does not let the DNS resolver move on to the next domain in the search list or upstream. So in the end, the host is not resolved.
Another option is to use `rewrite` to keep any reddog.microsoft.com DNS queries from going upstream:
    rewrite stop {
        name regex (.*)\.reddog\.microsoft\.com\.$ {1}.
        answer name (.*)\.$ {1}.reddog.microsoft.com.
    }
    .....
    forward . /etc/resolv.conf
    .....
    .....
}
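The next sketch traces what the two `rewrite` rules do to a single lookup. It uses Python's `re` with the same patterns (dots escaped). It models the rule semantics as the CoreDNS `rewrite` documentation describes them and is not CoreDNS itself. In real CoreDNS the answer rule is applied only to replies for queries the name rule rewrote.

```python
import re

# Same patterns as the rewrite rules above, with the dots escaped.
QUERY_RULE = re.compile(r"(.*)\.reddog\.microsoft\.com\.$")
ANSWER_RULE = re.compile(r"(.*)\.$")


def rewrite_query(qname: str) -> str:
    """`name regex`: strip the reddog.microsoft.com suffix before the
    query is forwarded; names that do not match are left untouched."""
    m = QUERY_RULE.match(qname)
    return f"{m.group(1)}." if m else qname


def rewrite_answer(owner: str) -> str:
    """`answer name`: add the suffix back to the owner name of an answer
    record, so the reply matches the name the client asked for."""
    m = ANSWER_RULE.match(owner)
    return f"{m.group(1)}.reddog.microsoft.com." if m else owner


if __name__ == "__main__":
    asked = "management.azure.com.reddog.microsoft.com."
    forwarded = rewrite_query(asked)
    print("client asks:   ", asked)
    print("CoreDNS sends: ", forwarded)
    print("client gets:   ", rewrite_answer(forwarded))
```

With this approach the suffixed query receives the real name's answer, so the resolver stops at that candidate. The trade-off is that every `*.reddog.microsoft.com` name now resolves to whatever its stripped form resolves to.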
Or, use `template` to return `NXDOMAIN` for reddog.microsoft.com:
reddog.microsoft.com:53 {
    ......
    ......
    template ANY ANY {
        rcode NXDOMAIN
    }
    ......
    ......
}
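The three approaches discussed so far differ only in what the client sees for the suffixed name. `drop` sends nothing (timeout), `block` sends `REFUSED`, and the `template` sends `NXDOMAIN`. The minimal Python model below walks the client's search list and shows why only the last option still resolves the real name. It assumes the behavior reported in this thread: of these three outcomes, only `NXDOMAIN` lets the resolver continue to the next candidate. The responder is hypothetical and pretends that only `management.azure.com.` exists. The candidate list is shortened to one `cluster.local` suffix.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Outcome(Enum):
    ANSWER = "NOERROR with records"
    NXDOMAIN = "NXDOMAIN"    # what the template block returns
    REFUSED = "REFUSED"      # what acl `block` returns
    TIMEOUT = "no reply"     # what acl `drop` causes


def walk_search_list(
    candidates: List[str], respond: Callable[[str], Outcome]
) -> Tuple[Optional[str], List[str]]:
    """Simplified stub-resolver loop over the search candidates.

    Assumption (from the reports in this thread): NXDOMAIN moves on to
    the next candidate, an answer ends the lookup successfully, and
    REFUSED or a timeout ends the lookup as a failure.
    Returns (resolved name or None, names that were queried).
    """
    queried: List[str] = []
    for name in candidates:
        queried.append(name)
        outcome = respond(name)
        if outcome is Outcome.ANSWER:
            return name, queried
        if outcome is not Outcome.NXDOMAIN:
            return None, queried  # REFUSED or TIMEOUT: give up
    return None, queried


def make_responder(reddog_outcome: Outcome) -> Callable[[str], Outcome]:
    """Hypothetical DNS view: only management.azure.com. exists, and the
    reddog.microsoft.com candidate gets `reddog_outcome`."""
    def respond(name: str) -> Outcome:
        if name.endswith(".reddog.microsoft.com."):
            return reddog_outcome
        if name == "management.azure.com.":
            return Outcome.ANSWER
        return Outcome.NXDOMAIN
    return respond


CANDIDATES = [
    "management.azure.com.cluster.local.",
    "management.azure.com.reddog.microsoft.com.",
    "management.azure.com.",
]

if __name__ == "__main__":
    for action in (Outcome.TIMEOUT, Outcome.REFUSED, Outcome.NXDOMAIN):
        resolved, queried = walk_search_list(CANDIDATES, make_responder(action))
        print(f"{action.name:8} resolved={resolved} queries={len(queried)}")
```

Under these assumptions, `TIMEOUT` and `REFUSED` stop after two queries without resolving anything. `NXDOMAIN` reaches `management.azure.com.` on the third query.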
Just curious if there are any updates on this? We're having the same issue and need some recommendations: is it possible to create a dynamic kubelet config for this that cascades cluster-wide? Has anyone tried it?
Following
We added the following config to our clusters: we created a `coredns-custom` ConfigMap and stopped queries for `reddog.microsoft.com` from going upstream. It has been more than half a year now and things have been smooth since then; the load on our upstream DNS servers dropped too.
reddog.server: |
  reddog.microsoft.com:53 {
    errors
    template ANY ANY {
      rcode NXDOMAIN
    }
    prometheus :9153
    cache 30
  }
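You can estimate the load reduction by counting which search-list candidates still leave the cluster once the `reddog.microsoft.com:53` block answers locally. The rough Python sketch below ignores caching. It assumes that the default AKS Corefile answers `cluster.local` names through the `kubernetes` plugin, and it uses the same assumed search list for a pod in the `default` namespace.

```python
from typing import List, Sequence

# Candidates a pod tries for "management.azure.com" with ndots:5
# (assumed search list for a pod in the "default" namespace).
CANDIDATES = [
    "management.azure.com.default.svc.cluster.local.",
    "management.azure.com.svc.cluster.local.",
    "management.azure.com.cluster.local.",
    "management.azure.com.reddog.microsoft.com.",
    "management.azure.com.",
]


def in_zone(name: str, zone: str) -> bool:
    """True if `name` equals `zone` or is a subdomain of it."""
    return name == zone or name.endswith("." + zone)


def upstream_candidates(candidates: Sequence[str],
                        local_zones: Sequence[str]) -> List[str]:
    """Candidates that CoreDNS would forward to the upstream servers,
    given the zones it answers locally (no caching assumed)."""
    return [c for c in candidates
            if not any(in_zone(c, z) for z in local_zones)]


if __name__ == "__main__":
    before = upstream_candidates(CANDIDATES, ["cluster.local."])
    after = upstream_candidates(
        CANDIDATES, ["cluster.local.", "reddog.microsoft.com."])
    qtypes = 2  # A and AAAA are often both requested
    print("upstream queries per lookup before:", len(before) * qtypes)
    print("upstream queries per lookup after: ", len(after) * qtypes)
```

Under these assumptions, each uncached external lookup drops from 4 upstream queries to 2. The real name is still queried in both cases, because the walk continues past the `NXDOMAIN`.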
Can you please share how exactly you implemented this in coredns-custom, and how you tested and validated it? I'm actively working on this and evaluating my options. So far I have implemented it only in the ingress-nginx controller pod, in its dnsConfig settings. But will it cover the whole cluster workload if I do this in the coredns-custom ConfigMap? Appreciate the help!
@msamad, the change to the coredns-custom ConfigMap crashed my ingress-nginx along with the Kuma service mesh in a lab cluster. I had to revert the change, and then everything started working again.
I don't know how your whole cluster is set up, so I can't comment much. This is the docs page to follow on how to customize CoreDNS: https://learn.microsoft.com/en-us/azure/aks/coredns-custom#rewrite-dns
Describe the bug
All hosts in Azure clusters are getting `search reddog.microsoft.com` in their `resolv.conf`. Because of this, all pods are also getting `reddog.microsoft.com` in their `resolv.conf` with a default `coredns` setup, for example:
Also, on the host with CoreDNS we have `search reddog.microsoft.com`.
According to Microsoft documentation, `reddog.microsoft.com` is a non-functional placeholder that does not have any DNS records (https://learn.microsoft.com/en-us/azure/virtual-network/virtual-networks-name-resolution-for-vms-and-role-instances#vms-and-role-instances). However, having this placeholder in the pods' `search` list forces every non-cluster DNS request for a name with fewer than 5 dots (`ndots:5`) to be tried with the `reddog.microsoft.com` suffix before the name itself (for example `management.azure.com.reddog.microsoft.com.`, `api.eu0.signalfx.com.reddog.microsoft.com.`). This leads to a lot of unnecessary (non-cached) requests and traffic to our DNS servers. It also makes DNS resolution slower, since an extra, unnecessary request with the `.reddog.microsoft.com.` suffix is made before the proper DNS request.
To Reproduce
Install an AKS cluster with custom DNS servers on the VNet and a default `coredns` setup.
Expected behavior
No `reddog.microsoft.com` requests should be made to external DNS servers.
Environment (please complete the following information):