canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

i/o timeout in coredns pod #1427

Closed by knkski

knkski commented 4 years ago

Copying from https://github.com/ubuntu/microk8s/issues/958#issuecomment-661986123, as this looks to be the crux of the issue. For some reason, some queries to 8.8.8.8 and 8.8.4.4 for api.jujucharms.com are failing:

[ERROR] plugin/errors: 2 2577415620770853004.8780143945028532492. HINFO: read udp 10.1.21.36:44631->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:33385 - 46047 "AAAA IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000755655s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. AAAA: read udp 10.1.21.36:57766->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:36178 - 22767 "A IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000677982s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. A: read udp 10.1.21.36:44211->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:60455 - 56232 "AAAA IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000352953s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. AAAA: read udp 10.1.21.36:45297->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:50299 - 52151 "A IN api.jujucharms.com.localdomain. udp 48 false 512" NOERROR - 0 2.000257257s
[ERROR] plugin/errors: 2 api.jujucharms.com.localdomain. A: read udp 10.1.21.36:40710->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:39253 - 3642 "AAAA IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000834815s
[ERROR] plugin/errors: 2 api.jujucharms.com. AAAA: read udp 10.1.21.36:38538->8.8.8.8:53: i/o timeout
[INFO] 10.1.21.38:47372 - 10457 "A IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000768237s
[ERROR] plugin/errors: 2 api.jujucharms.com. A: read udp 10.1.21.36:33745->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:47852 - 7227 "AAAA IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000442768s
[ERROR] plugin/errors: 2 api.jujucharms.com. AAAA: read udp 10.1.21.36:42672->8.8.4.4:53: i/o timeout
[INFO] 10.1.21.38:60290 - 23521 "A IN api.jujucharms.com. udp 36 false 512" NOERROR - 0 2.000334072s
[ERROR] plugin/errors: 2 api.jujucharms.com. A: read udp 10.1.21.36:52312->8.8.8.8:53: i/o timeout
ktsakalozos commented 4 years ago

@davigar15 has reported the same issue.

This is sporadic, right? Some things we could try: use the host's resolver (forward . /etc/resolv.conf) or increase the cache to something longer than 30 seconds [1].

Could anyone try any of the above and report back any results?

[1] https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/
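
For anyone who wants to try either of those, a rough sketch of both changes (assuming the dns addon's ConfigMap and Deployment are both named coredns in kube-system; names can differ between releases):

    # Switch CoreDNS from 8.8.8.8/8.8.4.4 to the host's resolver and raise the
    # cache TTL by editing the Corefile held in the ConfigMap:
    microk8s kubectl -n kube-system edit configmap/coredns

    #   forward . 8.8.8.8 8.8.4.4   ->   forward . /etc/resolv.conf
    #   cache 30                    ->   cache 300

    # CoreDNS normally reloads its config on its own (the reload plugin is enabled),
    # but a rollout restart forces it:
    microk8s kubectl -n kube-system rollout restart deployment coredns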

bipinm commented 4 years ago

I do not think this is a sporadic/transient issue. I have tried for 3 days continuously (microk8s.enable kubeflow), sometimes at different times of day, and got the same error each time. I tried to debug a little after N failures and found the errors in the coredns pod. I also went into another pod and tried ping 8.8.8.8, which worked fine, but pings to other public IPs/api.jujucharms.com failed.

As mentioned in my comment here, I did not face this issue on an AWS EC2 instance with an Ubuntu 20.04 Server image. Until yesterday I was trying on Ubuntu Desktop 20.04 (running in VMware Player); today I tested with Ubuntu 20.04 Server locally (also running in VMware Player) and, not very surprisingly, Kubeflow was deployed successfully. Similar behavior is mentioned here

atamahjoubfar commented 4 years ago

I just tried microk8s enable kubeflow on another machine with Ubuntu 18.04. Same error:

ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-89": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-89": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving
Command '('microk8s-juju.wrapper', 'deploy', 'cs:kubeflow-195', '--channel', 'stable', '--overlay', '/tmp/tmpfdpsap90')' returned non-zero exit status 1
Failed to enable kubeflow
knkski commented 4 years ago

@bipinm: can you expand a little more on this?

Also went into one another pod and tried ping 8.8.8.8, which worked fine but pings to other public IPs/api.jujucharms.com failed.

Which pod(s) did you go into, and which other public IPs did you try?

knkski commented 4 years ago

After debugging with @davigar15, I think this issue is not actually Kubeflow-specific, and is a general networking issue that starts happening after a computer with microk8s is rebooted. @bipinm, @atamahjoubfar, can you verify that this is related to rebooting the host machine for microk8s? @davigar15 says that when he runs into this issue, reinstalling the microk8s snap fixes things for him.
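
For reference, reinstalling the snap here means something like:

    sudo snap remove microk8s
    sudo snap install microk8s --classic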

atamahjoubfar commented 4 years ago

I reinstalled microk8s, and enabled kubeflow without reboot. I still get the same error message.

bipinm commented 4 years ago

Without a restart after sudo snap install microk8s --classic, Kubeflow was deployed successfully. So far I had always been restarting after seeing this message:

bipinm@ubuntu:~$ microk8s.status --wait-ready
Insufficient permissions to access MicroK8s.
You can either try again with sudo or add the user bipinm to the 'microk8s' group:

    sudo usermod -a -G microk8s bipinm
    sudo chown -f -R bipinm ~/.kube

The new group will be available on the user's next login.

This time, instead of a restart, I ran the sudo usermod ... and gnome-session-quit commands, followed by the rest:

microk8s.status --wait-ready
microk8s.enable dns dashboard storage
microk8s.enable kubeflow

In my previous tests on Ubuntu 20.04 Server I followed a similar restart step but did not encounter this problem. It only seems to occur on the 20.04 Desktop version.

@knkski: The pod I used for the ping test was nginx-ingress-microk8s-controller.

Without a restart, I can go into the pod and ping public domain names (google.com, jujucharms.com, etc.). After a restart, I cannot ping any public domain names, but I can ping their IP addresses.

Test after install of microk8s + enable kubeflow but without restart (from within nginx-ingress-microk8s-controller):

bash-5.0$ nslookup google.com
Server:         10.152.183.10
Address:        10.152.183.10:53

Non-authoritative answer:
Name:   google.com
Address: 142.250.67.78

Non-authoritative answer:
Name:   google.com
Address: 2404:6800:4007:807::200e

bash-5.0$ nslookup jujucharms.com
Server:         10.152.183.10
Address:        10.152.183.10:53

Non-authoritative answer:
Name:   jujucharms.com
Address: 91.189.88.181
Name:   jujucharms.com
Address: 91.189.91.45
Name:   jujucharms.com
Address: 91.189.91.44
Name:   jujucharms.com
Address: 91.189.88.180

Non-authoritative answer:
Name:   jujucharms.com
Address: 2001:67c:1562::20
Name:   jujucharms.com
Address: 2001:67c:1360:8001::2c
Name:   jujucharms.com
Address: 2001:67c:1562::1f
Name:   jujucharms.com
Address: 2001:67c:1360:8001::2b

Test after install of microk8s + restart (+ failed enable kubeflow):

bash-5.0$ nslookup google.com
Server:         10.152.183.10
Address:        10.152.183.10:53

;; connection timed out; no servers could be reached

bash-5.0$ nslookup api.jujucharms.com
Server:         10.152.183.10
Address:        10.152.183.10:53

;; connection timed out; no servers could be reached
ktsakalozos commented 4 years ago

@bipinm immediately after a reboot the k8s networking is not correctly set up. During that period I was getting:

bash-5.0$ nslookup jujucharms.com
nslookup: write to '10.152.183.10': Operation not permitted
;; connection timed out; no servers could be reached

Within 2 to 3 minutes the pods were reporting state Unknown and the control plane was rescheduling them, getting them back into the Ready state. After that point name resolution was working again.
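
A simple way to tell whether you are still inside that window is to watch the pods settle after boot and only run the kubeflow step once everything in kube-system is Ready, e.g.:

    microk8s kubectl get pods --all-namespaces --watch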

bipinm commented 4 years ago

@ktsakalozos, I was running nslookup 15+ minutes post reboot. The sequence of steps that always failed for me:

  1. sudo snap install microk8s --classic
  2. sudo usermod -a -G microk8s user
  3. Reboot
  4. After system is up, check status with microk8s.status --wait-ready
  5. Also confirm all pods are running
  6. microk8s.enable dns dashboard storage
  7. Open dashboard UI and confirm everything is fine
  8. microk8s.enable kubeflow
  9. ping test and nslookup from pod nginx-ingress-microk8s-controller-xxxx (this is probably 15+ minutes after reboot)

I will try this once again to confirm; I am not absolutely sure whether I was rebooting after step 6.

LinoBert commented 4 years ago

@ktsakalozos, same here. I'm running microk8s on my Ubuntu 18.04 dev machine. After booting the machine, none of the pods start: they download some files at startup but can't resolve the domain names, so they can't resolve their imports (Deno).

After manually stopping and starting the cluster via microk8s stop / microk8s start, DNS resolution works perfectly fine, so something in the automatic startup process does not seem to work as expected.

atamahjoubfar commented 4 years ago

@ktsakalozos neither rebooting the machine nor microk8s stop/start resolved the issue for me. I have confirmed that juju can deploy apps on the host machine, so it should not be a networking issue on the host:

+ microk8s-juju.wrapper --debug add-model kubeflow microk8s

02:01:45 INFO  juju.cmd supercommand.go:83 running juju [2.7.6 4da406fb326d7a1255f97a7391056641ee86715b gc go1.12.17]
02:01:45 DEBUG juju.cmd supercommand.go:84   args: []string{"/snap/microk8s/1551/bin/juju", "--debug", "add-model", "kubeflow", "microk8s"}
02:01:45 INFO  juju.juju api.go:67 connecting to API addresses: [10.152.183.24:17070]
02:01:45 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://10.152.183.24:17070/api"
02:01:45 INFO  juju.api apiclient.go:624 connection established to "wss://10.152.183.24:17070/api"
02:01:45 INFO  cmd authkeys.go:114 Adding contents of "/var/snap/microk8s/1551/juju/share/juju/ssh/juju_id_rsa.pub" to authorized-keys
02:01:45 INFO  cmd addmodel.go:301 Added 'kubeflow' model on microk8s/localhost with credential 'microk8s' for user 'admin'
02:01:45 DEBUG juju.api monitor.go:35 RPC connection died
02:01:45 INFO  cmd supercommand.go:525 command finished

+ microk8s-juju.wrapper --debug deploy cs:kubeflow-195 --channel stable --overlay /tmp/tmpt7h9ykaa
Kubeflow could not be enabled:
02:01:45 INFO  juju.cmd supercommand.go:83 running juju [2.7.6 4da406fb326d7a1255f97a7391056641ee86715b gc go1.12.17]
02:01:45 DEBUG juju.cmd supercommand.go:84   args: []string{"/snap/microk8s/1551/bin/juju", "--debug", "deploy", "cs:kubeflow-195", "--channel", "stable", "--overlay", "/tmp/tmpt7h9ykaa"}
02:01:45 INFO  juju.juju api.go:67 connecting to API addresses: [10.152.183.24:17070]
02:01:45 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://10.152.183.24:17070/model/644c781a-2e54-4ea7-8f5a-13448c037141/api"
02:01:45 INFO  juju.api apiclient.go:624 connection established to "wss://10.152.183.24:17070/model/644c781a-2e54-4ea7-8f5a-13448c037141/api"
02:01:46 INFO  juju.juju api.go:67 connecting to API addresses: [10.152.183.24:17070]
02:01:46 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://10.152.183.24:17070/api"
02:01:46 INFO  juju.api apiclient.go:624 connection established to "wss://10.152.183.24:17070/api"
02:01:46 DEBUG juju.cmd.juju.application deploy.go:1442 cannot interpret as local charm: file does not exist
02:01:46 DEBUG juju.cmd.juju.application deploy.go:1294 cannot interpret as a redeployment of a local charm from the controller
02:01:46 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/kubeflow-195/meta/any?channel=stable&include=id&include=supported-series&include=published {
02:01:47 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:47 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/bundle/kubeflow-195/archive {
02:01:47 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:47 INFO  cmd deploy.go:1546 Located bundle "cs:bundle/kubeflow-195"
02:01:47 DEBUG juju.cmd.juju.application bundle.go:312 model: &bundlechanges.Model{
    Applications: {
    },
    Machines: {
    },
    Relations:        nil,
    ConstraintsEqual: func(string, string) bool {...},
    Sequence:         {},
    sequence:         {},
    MachineMap:       {},
    logger:           nil,
}
02:01:47 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/ambassador-89
02:01:47 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/meta/any?include=id&include=supported-series&include=published {
02:01:47 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:47 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/argo-controller-173
02:01:47 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/argo-controller-173/meta/any?include=id&include=supported-series&include=published {
02:01:47 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:47 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/argo-ui-89
02:01:47 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/argo-ui-89/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/dex-auth-32
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/dex-auth-32/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/jupyter-controller-187
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/jupyter-controller-187/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/jupyter-web-93
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/jupyter-web-93/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/katib-controller-87
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/katib-controller-87/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~charmed-osm/mariadb-k8s
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~charmed-osm/mariadb-k8s/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/katib-manager-86
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/katib-manager-86/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/katib-ui-82
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/katib-ui-82/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/kubeflow-dashboard-47
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/kubeflow-dashboard-47/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/kubeflow-profiles-53
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/kubeflow-profiles-53/meta/any?include=id&include=supported-series&include=published {
02:01:48 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:48 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metacontroller-79
02:01:48 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metacontroller-79/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metadata-api-42
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metadata-api-42/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~charmed-osm/mariadb-k8s
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~charmed-osm/mariadb-k8s/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metadata-envoy-25
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metadata-envoy-25/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metadata-grpc-25
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metadata-grpc-25/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/metadata-ui-45
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/metadata-ui-45/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/minio-89
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/minio-89/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/modeldb-backend-86
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/modeldb-backend-86/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~charmed-osm/mariadb-k8s
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~charmed-osm/mariadb-k8s/meta/any?include=id&include=supported-series&include=published {
02:01:49 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:49 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/modeldb-store-80
02:01:49 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/modeldb-store-80/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/modeldb-ui-80
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/modeldb-ui-80/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/oidc-gatekeeper-30
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/oidc-gatekeeper-30/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-api-93
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-api-93/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~charmed-osm/mariadb-k8s
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~charmed-osm/mariadb-k8s/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-persistence-178
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-persistence-178/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-scheduledworkflow-174
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-scheduledworkflow-174/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-ui-89
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-ui-89/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-viewer-114
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-viewer-114/meta/any?include=id&include=supported-series&include=published {
02:01:50 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:50 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pipelines-visualization-24
02:01:50 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pipelines-visualization-24/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:51 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/pytorch-operator-174
02:01:51 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/pytorch-operator-174/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:51 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/seldon-core-27
02:01:51 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/seldon-core-27/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:51 INFO  cmd bundle.go:370 Resolving charm: cs:~kubeflow-charmers/tf-job-operator-170
02:01:51 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/tf-job-operator-170/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:51 DEBUG httpbakery client.go:243 client do GET https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/meta/any?include=id&include=supported-series&include=published {
02:01:51 DEBUG httpbakery client.go:245 } -> error <nil>
02:01:59 DEBUG juju.api monitor.go:35 RPC connection died
ERROR cannot deploy bundle: cannot add charm "cs:~kubeflow-charmers/ambassador-89": cannot retrieve charm "cs:~kubeflow-charmers/ambassador-89": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving
02:01:59 DEBUG cmd supercommand.go:519 error stack: 
cannot retrieve charm "cs:~kubeflow-charmers/ambassador-89": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/~kubeflow-charmers/ambassador-89/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 10.152.183.10:53: server misbehaving
/workspace/_build/src/github.com/juju/juju/rpc/client.go:178: 
/workspace/_build/src/github.com/juju/juju/api/apiclient.go:1187: 
/workspace/_build/src/github.com/juju/juju/api/client.go:459: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/store.go:68: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/bundle.go:549: cannot add charm "cs:~kubeflow-charmers/ambassador-89"
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/bundle.go:481: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/bundle.go:165: 
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/deploy.go:960: cannot deploy bundle
/workspace/_build/src/github.com/juju/juju/cmd/juju/application/deploy.go:1548: 

Command '('microk8s-juju.wrapper', '--debug', 'deploy', 'cs:kubeflow-195', '--channel', 'stable', '--overlay', '/tmp/tmpt7h9ykaa')' returned non-zero exit status 1
Failed to enable kubeflow
dkolbly commented 3 years ago

I am having this issue as well; I noticed it after a reboot. Restarting (disable/enable) dns did not fix it for me either. I also tried switching away from 8.8.x.x, but that did not help either.

Stopping all of microk8s (microk8s stop / microk8s start) did get things back to a working state :+1:

root@jupiter:~# microk8s.kubectl logs coredns-86f78bb79c-9k4f4 -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = be0f52d3c13480652e0c73672f2fa263
CoreDNS-1.6.6
linux/amd64, go1.13.5, 6a7a75e
[INFO] 127.0.0.1:46538 - 47659 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 6.002244698s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:35934->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:42170 - 16039 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 6.001102944s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:44386->8.8.4.4:53: i/o timeout
[INFO] 127.0.0.1:33814 - 33621 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000673367s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:52975->8.8.4.4:53: i/o timeout
[INFO] 127.0.0.1:58732 - 22568 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 6.001175924s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:47671->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:40808 - 32260 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000327329s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:44845->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:60227 - 27681 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000475429s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:38552->8.8.4.4:53: i/o timeout
[INFO] 127.0.0.1:53987 - 2865 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000480646s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:54745->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:53931 - 13702 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000535349s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:33555->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:32825 - 40340 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000450614s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:39063->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:37796 - 25979 "HINFO IN 8681453584971852102.1290459051933204281. udp 57 false 512" NOERROR - 0 2.000400431s
[ERROR] plugin/errors: 2 8681453584971852102.1290459051933204281. HINFO: read udp 10.1.71.251:59722->8.8.4.4:53: i/o timeout
[INFO] 10.1.71.250:53768 - 48206 "AAAA IN www.googleapis.com. udp 36 false 512" NOERROR - 0 2.000422322s
[ERROR] plugin/errors: 2 www.googleapis.com. AAAA: read udp 10.1.71.251:40001->8.8.8.8:53: i/o timeout
[INFO] 10.1.71.250:38399 - 7349 "A IN www.googleapis.com. udp 36 false 512" NOERROR - 0 2.000599888s
[ERROR] plugin/errors: 2 www.googleapis.com. A: read udp 10.1.71.251:36263->8.8.8.8:53: i/o timeout
[INFO] 10.1.71.250:43097 - 40033 "AAAA IN www.googleapis.com. udp 36 false 512" NOERROR - 0 2.000564206s
[ERROR] plugin/errors: 2 www.googleapis.com. AAAA: read udp 10.1.71.251:55784->8.8.8.8:53: i/o timeout
[INFO] 10.1.71.250:60260 - 65419 "A IN www.googleapis.com. udp 36 false 512" NOERROR - 0 2.000480316s
[ERROR] plugin/errors: 2 www.googleapis.com. A: read udp 10.1.71.251:57656->8.8.4.4:53: i/o timeout

The symptom for me is that other services running in the cluster get errors like dial tcp: lookup www.googleapis.com on 10.152.183.10:53: server misbehaving

ktsakalozos commented 3 years ago

Hi @dkolbly could you attach the tarball produced by microk8s.inspect?

There is also this page that may help in debugging DNS resolution issues: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
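
Roughly, the first steps from that page look like this (the page recommends a dnsutils image; a busybox:1.28 pod is enough for a quick check, and the coredns label is assumed to be k8s-app=kube-dns):

    # Start a throwaway pod with DNS tools and query the cluster DNS service:
    microk8s kubectl run dnsutils --image=busybox:1.28 --restart=Never -- sleep 3600
    microk8s kubectl exec -ti dnsutils -- cat /etc/resolv.conf
    microk8s kubectl exec -ti dnsutils -- nslookup kubernetes.default

    # Then check the CoreDNS side:
    microk8s kubectl -n kube-system get pods -l k8s-app=kube-dns
    microk8s kubectl -n kube-system logs -l k8s-app=kube-dns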

dkolbly commented 3 years ago

Thanks @ktsakalozos, I was not aware of that debugging page, and I didn't think to grab an inspection report while it was broken, but here is the current state of the system in case it helps: inspection-report-20200918_151701.tar.gz

FWIW, I'm going to need to power cycle the system this weekend to put it on a UPS, so I'll keep an eye out for a recurrence of the problem.

exi commented 3 years ago

Any updates on this? I have run into this issue multiple times as well. Last time, a clean uninstall/reinstall of microk8s fixed it. This time I reverted from an HA cluster to a non-HA cluster and I have the same issue.

In my case coredns cannot even talk to the master on the same machine: E1023 22:52:10.292560 1 reflector.go:125] pkg/mod/k8s.io/client-go@v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.152.183.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.152.183.1:443: i/o timeout

knkski commented 3 years ago

I believe this issue is fixed in #1635, which introduces handling around the calico networking. If anybody wants to try it out, it'll be available via latest/edge as soon as CD is done pushing that out, otherwise that fix will come in 1.20.
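
For anyone who wants to test it ahead of the 1.20 release, switching an existing install over is just a snap channel change (this will restart the cluster):

    sudo snap refresh microk8s --channel=latest/edge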

kingman commented 3 years ago

I'm currently getting the following for the coredns pod with latest/edge; could my issue be related to this one?


  Warning  FailedCreatePodSandBox  2m20s  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7292c565f6c5da128168e118eac02eb869b47a9191ec320c923153e2dcd41ef6": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: i/o timeout
  Warning  FailedCreatePodSandBox  95s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e7a30d0d8b3d09b4a25957ef267ea2dd548b70552f42c043aa63d3f7cf9172ea": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: i/o timeout
  Warning  FailedCreatePodSandBox  54s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "dd29dc68affcdcb012860dd9fd7a620c238e2d850c18dbe1834a6a0970c1e5e6": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: i/o timeout
  Warning  FailedCreatePodSandBox  12s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9deb3e6c4c955202232dbecf389987c479a0e8605db47cdb04b5053b9a9b5a75": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: i/o timeout
brendanmckenzie commented 3 years ago

I'm also experiencing this issue on Ubuntu 20.10 (GNU/Linux 5.8.0-26-generic x86_64)

I run:

sudo snap install microk8s --classic
microk8s enable dns

Wait for it to do its thing, then:

$ mkctl --namespace kube-system logs coredns-7f9c69c78c-pkmjt

And it's full of:

[ERROR] plugin/errors: 2 5968981741059125262.7191380029860206540. HINFO: read udp 10.1.89.66:60257->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:45683 - 51409 "HINFO IN 5968981741059125262.7191380029860206540. udp 57 false 512" NOERROR - 0 2.000492982s
[ERROR] plugin/errors: 2 5968981741059125262.7191380029860206540. HINFO: read udp 10.1.89.66:45638->8.8.8.8:53: i/o timeout
[INFO] 127.0.0.1:47762 - 48500 "HINFO IN 5968981741059125262.7191380029860206540. udp 57 false 512" NOERROR - 0 2.000423213s
[ERROR] plugin/errors: 2 5968981741059125262.7191380029860206540. HINFO: read udp 10.1.89.66:39596->8.8.4.4:53: i/o timeout
[INFO] 127.0.0.1:60306 - 65170 "HINFO IN 5968981741059125262.7191380029860206540. udp 57 false 512" NOERROR - 0 2.000368166s
[ERROR] plugin/errors: 2 5968981741059125262.7191380029860206540. HINFO: read udp 10.1.89.66:33063->8.8.8.8:53: i/o timeout

microk8s inspect doesn't show any issue, and iptables is configured to allow forwarding.

I've tried both v1.21 and v1.22-alpha.1 and the issue is present in both.
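
For reference, the forwarding checks were along these lines (the usual culprit on a fresh install is a FORWARD policy of DROP):

    # Kernel forwarding and the iptables FORWARD chain policy:
    sysctl net.ipv4.ip_forward
    sudo iptables -L FORWARD -n | head

    # If the policy is DROP, the common workaround is:
    sudo iptables -P FORWARD ACCEPT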

ktsakalozos commented 3 years ago

@brendanmckenzie I feel this might be the dns pod not being able to reach 8.8.8.8. Did you try setting a different forward dns as described in [1]?

[1] https://discuss.kubernetes.io/t/add-on-dns/11287
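
If I remember the addon syntax correctly, that boils down to re-enabling dns with explicit upstream resolvers (treat the exact argument format as an assumption and check the linked post for the authoritative syntax):

    microk8s disable dns
    microk8s enable dns:1.1.1.1,8.8.8.8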

brendanmckenzie commented 3 years ago

The issue is present no matter what forwarding DNS server I use.

Additionally - other pods are able to ping 8.8.8.8 (so is the host machine).

$ mkctl run test --image=alpine -ti -- sh
If you don't see a command prompt, try pressing enter.
/ # ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=111 time=1.079 ms
64 bytes from 8.8.8.8: seq=1 ttl=111 time=1.216 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 1.079/1.147/1.216 ms
/ # nslookup dns.google
Server:     10.152.183.10
Address:    10.152.183.10:53

;; connection timed out; no servers could be reached

And the subsequent logs from coredns -

[ERROR] plugin/errors: 2 dns.google. AAAA: read udp 10.1.89.66:45472->8.8.4.4:53: i/o timeout
[ERROR] plugin/errors: 2 dns.google. A: read udp 10.1.89.66:57037->8.8.4.4:53: i/o timeout
brendanmckenzie commented 3 years ago

🤦‍♂️ For some reason, outbound port 53 requests from my server were being blocked. I switched to using my hosting provider's DNS and now coredns is working as expected.
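
A quick way to spot that from the host, before suspecting the cluster at all, is to query the upstream servers directly; if these time out, the problem is outside microk8s:

    dig @8.8.8.8 google.com +time=2 +tries=1
    nslookup google.com 8.8.8.8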

mohammedi-haroune commented 2 years ago

I'm facing the exact same issue. Has anyone come up with a solution to this?

sdarvell commented 2 years ago

I fixed this in my setup by changing the CIDR as detailed in the docs below. It appears there are network and DNS resolution issues when your host's DNS / local network is within the default MicroK8s pod subnet of 10.1.0.0/16.

https://microk8s.io/docs/change-cidr
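
A rough way to check whether you are affected is to look for host routes or resolvers that fall inside the default pod range (resolvectl assumes Ubuntu's systemd-resolved):

    # Anything in 10.1.0.0/16 on the host side is a red flag:
    ip route | grep ' 10\.1\.'
    resolvectl status | grep -A2 'DNS Servers'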

SidMorad commented 2 years ago

I faced this issue in the following environment:

Upgrading to Ubuntu 22.04 solved the issue for me. I hope this is useful for fixing it.

amandahla commented 1 year ago

I faced this issue after rebooting my machine in the following environment:

Fixed by what was already suggested here: microk8s stop/start

akzov commented 1 year ago

Didn't help

logici commented 12 months ago

I also faced this issue in the following environment:

Ubuntu 20.04, MicroK8s channel 1.24/stable

stale[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.