Closed MerzMax closed 2 weeks ago
@MerzMax Thanks for the issue. How did you come to the conclusion that the DNS resolution, but not the actual connection to 1.1.1.1, failed? Did you check the Hubble flow logs? We have observed that connections to 1.1.1.1 sometimes fail; see here for more details: https://cilium.slack.com/archives/C7PE7V806/p1668619257856639. I think this issue is yet another instance of the same connectivity failure.
@brb The issue linked in the Slack message describes a timeout when curl is executed. In my case the hostname one.one.one.one can't be resolved. It has to be a DNS issue, since I am able to connect to 1.1.1.1 but not to one.one.one.one. What does work is connecting to one.one.one.one. (with a trailing dot), which shows the issue described by the Stack Overflow entry linked above.
Here is the output I get when connecting to the client pod and executing curl:
$ kubectl exec -it client2-6f8b754559-k58xx sh -n cilium-test
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # curl 1.1.1.1
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>cloudflare</center>
</body>
</html>
/ # curl one.one.one.one
curl: (6) Could not resolve host: one.one.one.one
/ # curl one.one.one.one.
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>cloudflare</center>
</body>
</html>
/ #
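The difference between the two curl calls above can be illustrated with a rough sketch of how a resolv.conf-style resolver expands a name. It assumes the glibc ordering described in resolv.conf(5) and the Kubernetes default of ndots:5; other resolvers, including musl, differ in details. The function name candidate_names and the host search domain example.internal are hypothetical and used only for illustration.

```python
def candidate_names(name, search, ndots=5):
    """Return the fully qualified names a resolv.conf-style resolver
    would try, in order (glibc-like ordering, simplified)."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: the search list is skipped.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, the name as-is comes last.
    return expanded + [absolute]


# Typical pod search list plus one hypothetical domain inherited from the host.
search = [
    "cilium-test.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "example.internal",  # hypothetical host search domain
]

for name in ("one.one.one.one", "one.one.one.one."):
    print(name, "->", candidate_names(name, search))
```

Because one.one.one.one contains only three dots, every search domain is tried before the name itself, so an unexpected answer for one of the expanded names can interfere with resolution. The trailing dot bypasses the search list entirely.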
Just encountered the same issue. In my case, it was solved by removing the search domain from the host.
kudos @sqlstatement, I lost 2 days of work rewriting my cluster/cilium conf until I reached your answer :)
@sqlstatement Could you elaborate? I'm new to Kubernetes. I've already spent hours trying to debug this issue. @bzero Or do you have any suggestions?
@erikschul Your search domain is probably handled by either:
Once you remove the search domain, the connectivity test should run as expected. Hope this helps :)
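To check whether a pod inherited a search domain from the host, one option is to look at the pod's /etc/resolv.conf and list every search entry that does not belong to the cluster domain. The sketch below parses resolv.conf text for that purpose. The sample content, the nameserver address, the domain example.internal, and the helper names are assumptions for illustration, and cluster.local is assumed to be the cluster domain.

```python
def parse_resolv_conf(text):
    """Extract the search list and the ndots option from resolv.conf text."""
    search, ndots = [], 1  # ndots defaults to 1 when the option is absent
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if not fields:
            continue
        if fields[0] == "search":
            search = fields[1:]  # a later search line replaces earlier ones
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def non_cluster_domains(search, cluster_domain="cluster.local"):
    """Return search domains that were not generated for the cluster itself."""
    return [d for d in search if not d.rstrip(".").endswith(cluster_domain)]


# Hypothetical content of /etc/resolv.conf inside a test pod.
SAMPLE = """\
nameserver 10.96.0.10
search cilium-test.svc.cluster.local svc.cluster.local cluster.local example.internal
options ndots:5
"""

search, ndots = parse_resolv_conf(SAMPLE)
print("ndots:", ndots)
print("inherited search domains:", non_cluster_domains(search))
```

Any domain reported here most likely comes from the node's own resolver configuration, which is where it would need to be removed.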
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This issue has not seen any activity since it was marked stale. Closing.
Bug report
General Information
cilium version
kubectl version
How to reproduce the issue
cilium connectivity test
As you can see from the output, curl can't resolve the host one.one.one.one. That's why 5/31 tests fail.
After some research we now have an idea of what's going on. The base image used for the tests is an Alpine image (see here). For some reason Alpine has problems with DNS resolution in Kubernetes clusters in its musl library. Here you can find a very good explanation of what is happening: