canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

INTERNAL_ERROR when running kubectl commands against cluster deployed with 1.25/candidate snap channel and stable charm. #3401

Closed asbalderson closed 2 years ago

asbalderson commented 2 years ago

Summary

While testing microk8s 1.25/candidate (v1.25.0-rc.1) using the stable charm for a 3-unit cluster, all 3 units came up active/idle, but I was unable to run juju add-k8s against the kube.conf from the cluster. After some inspection I found I was unable to run any commands against the cluster; kubectl --kubeconfig=kube.conf get po, for example, returned:

Unable to connect to the server: stream error: stream ID 1; INTERNAL_ERROR; received from peer

After inspecting the syslog on the units, I saw many messages relating to context deadline exceeded for etcd.

Aug 22 18:40:50 microk8s6-1 microk8s.daemon-k8s-dqlite[10987]: time="2022-08-22T18:40:50Z" level=error msg="error while range on /registry/pods/kube-system/calico-kube-controllers-7bf8546cfb-j2rtr : query (try: 0): context deadline exceeded"
Aug 22 18:40:50 microk8s6-1 microk8s.daemon-kubelite[9647]: {"level":"warn","ts":"2022-08-22T18:40:50.923Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001d84000//var/snap/microk8s/3686/var/kubernetes/backend/kine.sock:12379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Aug 22 18:40:50 microk8s6-1 microk8s.daemon-kubelite[9647]: E0822 18:40:50.924876    9647 status.go:71] apiserver received an error that is not an metav1.Status: context.deadlineExceededError{}: context deadline exceeded
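To gauge how frequent these timeouts are, the journal can be filtered down to the deadline errors. A minimal sketch, using a copy of two of the lines quoted above so the filter is self-contained; on a live unit you would point grep at /var/log/syslog instead:

```shell
# Embed a sample of the journal lines quoted above (illustration only;
# on a unit, grep /var/log/syslog directly).
cat > /tmp/unit-syslog-sample <<'EOF'
Aug 22 18:40:50 microk8s6-1 microk8s.daemon-k8s-dqlite[10987]: time="2022-08-22T18:40:50Z" level=error msg="error while range on /registry/pods/kube-system/calico-kube-controllers-7bf8546cfb-j2rtr : query (try: 0): context deadline exceeded"
Aug 22 18:40:50 microk8s6-1 microk8s.daemon-kubelite[9647]: E0822 18:40:50.924876    9647 status.go:71] apiserver received an error that is not an metav1.Status: context.deadlineExceededError{}: context deadline exceeded
EOF

# Count datastore timeouts; a steadily growing count points at
# k8s-dqlite rather than the API server itself.
grep -c 'context deadline exceeded' /tmp/unit-syslog-sample
```

A high and growing count across all units would suggest the datastore (k8s-dqlite/kine), not kube-apiserver, is where requests are stalling.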

What Should Happen Instead?

I would expect that when deploying 1.25 with the stable charm, I would get a working cluster against which I could run juju add-k8s or query regular information. kubectl get po, for example, would return the running pods.

Reproduction Steps

  1. Deploy a bundle for microk8s; mine is below. Note that 10.246.164.2 is our MAAS server, which handles DNS.
    applications:
      microk8s:
        bindings:
          ? ''
          : oam-space
          cluster: internal-space
        charm: microk8s
        expose: true
        num_units: 3
        options:
          addons: dns ingress storage
          channel: 1.25/candidate
          containerd_env: |
            HTTPS_PROXY=http://squid.internal:3128
            NO_PROXY=10.1.0.0/16,10.152.183.0/24
            ulimit -n 65536 || true
            ulimit -l 16384 || true
          coredns_config: |
            .:53 {
                errors
                health {
                    lameduck 5s
                }
                ready
                log . {
                    class error
                }
                kubernetes cluster.local in-addr.arpa ip6.arpa {
                    pods insecure
                    fallthrough in-addr.arpa ip6.arpa
                }
                prometheus :9153
                forward . 10.246.164.2
                cache 30
                loop
                reload
                loadbalance
            }
        to:
        - '0'
        - '1'
        - '2'
    machines:
      '0':
        constraints: tags=microk8s,silo3 zones=zone1
      '1':
        constraints: tags=microk8s,silo3 zones=zone2
      '2':
        constraints: tags=microk8s,silo3 zones=zone3
    relations: []
    series: focal
  2. Grab the kube.conf from the leader unit (juju exec microk8s/leader microk8s config) and save it to a file (kube.conf).
  3. Run kubectl --kubeconfig=kube.conf get po
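Steps 2 and 3 above can be sketched as shell commands. This is a hedged sketch: the juju exec invocation follows the report's wording, and the exact syntax for targeting the leader unit can differ between Juju versions. The guard keeps the sketch safe to paste on a machine without the juju client.

```shell
# Fetch the kubeconfig from the leader unit, then query the cluster.
# Guarded so nothing runs if the juju client is not installed here.
if command -v juju >/dev/null 2>&1; then
  juju exec microk8s/leader microk8s config > kube.conf
  kubectl --kubeconfig=kube.conf get po
else
  echo "juju client not found; run this from the Juju client machine"
fi
```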

Introspection Report

inspection-report-20220822_191338.tar.gz

asbalderson commented 2 years ago

I should also note that when trying to add-k8s with juju, I get the following output:

$ KUBECONFIG=/home/ubuntu/project/generated/microk8s/kube.conf juju add-k8s microk8s_cloud --controller foundations-maas
ERROR making juju admin credentials in cluster: ensuring cluster role "juju-credential-bf5f2498" in namespace "kube-system": the server was unable to return a response in the time allotted, but may still be processing the request (get clusterroles.rbac.authorization.k8s.io juju-credential-bf5f2498)
asbalderson commented 2 years ago

Attaching logs from the other 2 units (0 and 2): inspection-report-20220822_193734.tar.gz inspection-report-20220822_193451.tar.gz

neoaggelos commented 2 years ago

Apologies for not replying to this issue sooner.

I was unable to reproduce this issue in any of our development environments. Looking at the error messages, along with log lines filled with timeouts and slow disk operations, I think it might just be transient networking issues or resource limits (e.g. open files).
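For anyone who hits this again, the open-file-limit theory is quick to check on each unit. A minimal sketch, assuming a Linux unit and the kubelite daemon name used by the snap (the /proc path and pgrep pattern are assumptions, not from this issue):

```shell
# Shell-level open-file limit; the bundle's containerd_env above tries
# to raise this with `ulimit -n 65536 || true`.
ulimit -n

# Per-process limit for the running kubelite daemon, if present.
pid=$(pgrep -f kubelite | head -n1 || true)
if [ -n "$pid" ]; then
  grep 'Max open files' "/proc/$pid/limits"
else
  echo "kubelite not running on this machine"
fi
```

If the per-process limit is far below the configured 65536, the ulimit lines in containerd_env are not taking effect, which would fit the resource-limit explanation.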

Closing this issue for now, please reopen if this occurs again.