Closed: rmdvb closed this issue 1 month ago
Hello, thanks for reporting this issue. Can you please share your user data settings (or a redacted version if necessary)?
Specifically, I'm curious what your dns and network configurations look like.
One thing to note is that for variants using systemd-networkd (*-k8s-1.28-* and *-ecs-2-* and newer), resolv.conf exists at the path /run/systemd/resolve/resolv.conf, which can be accessed on the host using sheltie.
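When comparing what the host actually uses against the user data settings, it can help to reduce resolv.conf to just its name servers and search domains. The following is a minimal Python sketch, assuming the contents of /run/systemd/resolve/resolv.conf have been copied off the host as text. The function name parse_resolv_conf and the sample contents (10.0.0.2, eu-west-1.compute.internal, example.local) are hypothetical placeholders, not values from this issue.

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text."""
    nameservers = []
    search = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        keyword, values = fields[0], fields[1:]
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "search":
            # If several search lines exist, the last one replaces earlier ones.
            search = values
    return nameservers, search


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# example resolv.conf
nameserver 10.0.0.2
search eu-west-1.compute.internal example.local
"""

if __name__ == "__main__":
    servers, domains = parse_resolv_conf(SAMPLE)
    print("name servers:", servers)
    print("search domains:", domains)
```

If the parsed values differ from settings.dns.name-servers and settings.dns.search-list, that would suggest the settings were not applied the way they were intended.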
Hi @koooosh, Thanks for your reply! Currently we are using these settings:
[settings.pki.<company>-root-ca]
data="""<company-root-CA>"""
trusted=true
[settings.dns]
name-servers = ["10.xx.xx.2"]
search-list = ["<region>.compute.internal" , "<company>.local"]
At first this seems to work. However, when the node pulls images it fails with the error: dial tcp: lookup <repository>.<company>.local: Temporary failure in name resolution. We see this on both new and older nodes. Occasionally the node does resolve the name and pulls the container, but this usually takes ~15 minutes per container.
We seem to have figured out the solution to our problem. Sharing it here as it might be useful for other users. The <repository>.<company>.local name is a CNAME to a name in another domain, <machine>.aws.local. systemd-resolved could not complete this second step because it did not know which DNS server to query for that domain.
The fix for us was to update the search-list to:
search-list = ["<region>.compute.internal" , "local"]
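To illustrate why the broader entry helps, here is a simplified Python sketch of suffix matching. It assumes the search-list entries also decide which names are sent to the configured name servers. That is a model of the behaviour described above, not systemd-resolved's actual implementation. The helper is_covered and all names (eu-west-1.compute.internal, example.local, repo.example.local, machine.aws.local) are hypothetical stand-ins for the redacted values.

```python
def is_covered(name, domains):
    """Return True if name equals, or is a subdomain of, any listed domain."""
    name = name.rstrip(".").lower()
    for domain in domains:
        suffix = domain.strip(".").lower()
        if name == suffix or name.endswith("." + suffix):
            return True
    return False


# Hypothetical stand-ins for the old and new search-list values.
OLD_SEARCH = ["eu-west-1.compute.internal", "example.local"]
NEW_SEARCH = ["eu-west-1.compute.internal", "local"]

# The lookup chain: the repository name, then the CNAME target.
CHAIN = ["repo.example.local", "machine.aws.local"]

if __name__ == "__main__":
    for label, domains in (("old", OLD_SEARCH), ("new", NEW_SEARCH)):
        for name in CHAIN:
            print(label, name, is_covered(name, domains))
```

Under this model the old list covers the repository name but not the CNAME target, while the new list covers both, which matches the behaviour we saw.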
Thanks for following up @rmdvb!
Image I'm using: Currently using Bottlerocket v1.21.1
What I expected to happen: To be able to resolve <company>.local domains. These are different domains than the AWS default: <region>.compute.internal
settings:
Before, when using wicked, we were able to resolve <company>.local domains.
What actually happened: I'm not able to resolve <company>.local domains. I can, however, resolve other domains, including the local AWS domain <region>.compute.internal. This causes issues pulling images from our private image repository.
How to reproduce the problem: This happened when upgrading Kubernetes to v1.28. As I understand it, this might be due to the change from wicked to systemd-resolved.