RedisLabs / redis-enterprise-k8s-docs


DNS resolution issue in cluster pod #256

Closed kartikb-io closed 1 year ago

kartikb-io commented 1 year ago

Hi, I am following the directions here, but with a Calico multi-cluster mesh setup. Without going into too much detail on how the cluster mesh is implemented: I have two clusters whose pods can ping each other across clusters, and the service CIDRs are advertised as well. With changes to the CoreDNS ConfigMap, I can now resolve a service name in one cluster from the other. I have verified this works by spawning a simple redis-cli3 pod in cluster A and curling the FQDN of my service in cluster B, which succeeds with a 401 (as expected).

However, when I do the same thing from the redis-enterprise-node container in cluster A and try to curl/resolve the FQDN of the service in cluster B, it is unable to resolve it, so the crdb command also fails when it tries to resolve the FQDN.

As a preliminary troubleshooting step, I compared /etc/resolv.conf in the redis-cli3 pod and in the redis-enterprise-node container. The two files are identical, yet DNS resolution behaves differently in the redis-enterprise-node container, so I was hoping to understand why this may be the case.
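One way to narrow down a difference like this is to run the same lookup through the C library resolver in both containers. On glibc systems, getaddrinfo consults /etc/nsswitch.conf before it ever reaches the nameservers in /etc/resolv.conf, so two containers with identical resolv.conf files can still resolve names differently. The following is a minimal sketch, assuming Python 3 is available in both containers (not confirmed for redis-enterprise-node); the helper name resolve_via_libc is made up for illustration.

```python
import socket
import sys


def resolve_via_libc(name, lookup=socket.getaddrinfo):
    """Resolve `name` through the system resolver path (NSS on glibc).

    Returns a sorted list of unique addresses, or None if the lookup fails.
    `lookup` is injectable so the function can be exercised without a network.
    """
    try:
        infos = lookup(name, None)
    except socket.gaierror:
        # The resolver chain gave up (e.g. an NSS module returned "not found").
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Usage: python3 resolve_check.py <fqdn-of-service-in-other-cluster>
    for host in sys.argv[1:]:
        print(host, "->", resolve_via_libc(host))
```

If this returns addresses in one container and None in the other for the same name, the difference most likely lies in the resolver chain (for example /etc/nsswitch.conf) rather than in the DNS server configuration.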

I have intentionally not provided further details as I did not want to overload with info on the first question and would be happy to provide additional details as needed based on what you'd require from me. Thanks in advance.

kartikb-io commented 1 year ago

Figured out that it was the mdns4_minimal module in /etc/nsswitch.conf intercepting any queries for names ending in .local, as per its spec, while external queries are passed along to the dns module. Since my goal here was to use internal DNS with the cluster mesh rather than external DNS, I ended up rewriting my CoreDNS config to make all .cluster.local stub to *.demo.me. I suspect there might be issues with TLS and the CN, but I'll tackle that part later. This can be closed.
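The behaviour described above can be sketched as a simplified model of how the hosts line in /etc/nsswitch.conf is walked. This is only an illustration: the sample hosts line (with mdns4_minimal followed by [NOTFOUND=return]) is an assumption about what the redis-enterprise-node container ships, the function names are hypothetical, and the real NSS logic handles more sources and actions than this.

```python
def parse_hosts_line(nsswitch_text):
    """Return the tokens that follow 'hosts:' in nsswitch.conf content."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def reaches_dns(hostname, hosts_tokens):
    """Simplified model: does a lookup for `hostname` get as far as 'dns'?

    Assumes mdns4_minimal claims every name ending in .local and reports
    "not found" for names that mDNS cannot answer, such as cluster services.
    """
    is_local = hostname.rstrip(".").lower().endswith(".local")
    for i, token in enumerate(hosts_tokens):
        if token == "dns":
            return True
        if token == "mdns4_minimal" and is_local:
            action = hosts_tokens[i + 1] if i + 1 < len(hosts_tokens) else ""
            if action.upper() == "[NOTFOUND=RETURN]":
                # The lookup stops here; the dns source is never consulted.
                return False
    return False


# Assumed sample; compare with the actual file inside the container.
SAMPLE = "hosts: files mdns4_minimal [NOTFOUND=return] dns\n"

if __name__ == "__main__":
    tokens = parse_hosts_line(SAMPLE)
    for name in ("my-svc.my-ns.svc.cluster.local", "my-svc.demo.me"):
        print(name, "reaches dns:", reaches_dns(name, tokens))
```

Under this model a name ending in .cluster.local never reaches CoreDNS, while a name under demo.me does, which matches why moving the names to *.demo.me worked.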