canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

MicroK8s is not running, but some pods are #4594

Open ZhouJian26 opened 2 months ago

ZhouJian26 commented 2 months ago

Summary

I have an HA cluster running MicroK8s 1.29 stable. The cluster crashed for no apparent reason, and I can't figure out how to fix it. On each node, microk8s status reports "microk8s is not running. Use microk8s inspect for a deeper inspection." However, microk8s inspect indicates that everything is working fine.

I also noticed that I'm unable to run microk8s kubectl get nodes, which fails with: Error from server: rpc error: code = Unknown desc = query (try: 0): theid

I tried stopping and restarting MicroK8s on each node, and I also restarted all the servers multiple times.

I see that some services are running while others simply cannot start. The most common error I'm seeing is: error=rpc error: code = Unknown desc = query (try: 0): theid.

Commands like kubectl get namespace work fine.

Reproduction Steps

I'm unable to reproduce this cluster state with a brand new one.

Introspection Report

inspection-report-20240731_131206.tar.gz

Can you suggest a fix?

No idea.

Are you interested in contributing with a fix?

I do not have the skills to do it.

crypto-titan commented 1 month ago

This is entirely valid; I am getting the same thing:

ubuntu@stakepool-de-02:~$ kubectl get nodes -o wide
Error from server: rpc error: code = Unknown desc = query (try: 0): theid

2024-09-09T17:06:46.160808+00:00 stakepool-de-13 microk8s.daemon-kubelite[2692964]: W0909 17:06:46.160123 2692964 logging.go:59] [core] [Channel #1 SubChannel #3] grpc: addrConn.createTransport failed to connect to {Addr: "unix:///var/snap/microk8s/7180/var/kubernetes/backend/kine.sock:12379", ServerName: "kine.sock:12379", }. Err: connection error: desc = "transport: Error while dialing: dial unix /var/snap/microk8s/7180/var/kubernetes/backend/kine.sock:12379: connect: connection refused"
2024-09-09T17:06:46.241642+00:00 stakepool-de-13 microk8s.daemon-kubelite[2692964]: W0909 17:06:46.241539 2692964 logging.go:59] [core] [Channel #2 SubChannel #4] grpc: addrConn.createTransport failed to connect to {Addr: "unix:///var/snap/microk8s/7180/var/kubernetes/backend/kine.sock:12379", ServerName: "kine.sock:12379", }. Err: connection error: desc = "transport: Error while dialing: dial unix /var/snap/microk8s/7180/var/kubernetes/backend/kine.sock:12379: connect: connection refused"
2024-09-09T17:06:46.511467+00:00 stakepool-de-13 microk8s.daemon-kubelite[2692964]: W0909 17:06:46.511330 2692964 logging.go:59] [core] [Channel #5 SubChannel #6] grpc: addrConn.createTransport failed to connect to {Addr: "unix:///var/snap/microk8s/7180/var/kubernetes/backend/kine.sock:12379", ServerName: "kine.sock:12379", }. Err: connection error: desc = "transport: Error while dialing: dial unix /var/snap/microk8s/7180/var/kubernetes/backend/kine.sock:12379: connect: connection refused"
2024-09-09T17:06:47.910484+00:00 stakepool-de-13 agent[32789]: 2024-09-09 17:06:47 UTC | CORE | INFO | (pkg/logs/launchers/file/launcher.go:337 in handleTailingModeChange) | Tailing mode changed for file:/var/log/auth.log. Was: end: Now: beginning
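For anyone comparing logs across nodes, here is a minimal Python sketch that counts the dial failures in the kubelite warnings quoted above, grouped by socket address and reason. The regular expression is an assumption derived only from those sample lines, and `summarize_dial_failures` is a hypothetical helper name, not part of MicroK8s. How the log text is collected (journalctl, syslog, and so on) is left to the reader.

```python
import re
from collections import Counter

# Pattern inferred from the sample warnings in this thread (an assumption,
# not a documented log format): capture the dialed address and the final
# reason inside the quoted "Error while dialing" description.
DIAL_FAILURE = re.compile(
    r'failed to connect to \{Addr: "(?P<addr>[^"]+)".*?'
    r'Error while dialing: .*?: (?P<reason>[^:"]+)"'
)


def summarize_dial_failures(log_text):
    """Count (address, reason) pairs for gRPC dial failures in log_text."""
    counts = Counter()
    # finditer also copes with several log entries fused onto one line.
    for match in DIAL_FAILURE.finditer(log_text):
        counts[(match.group("addr"), match.group("reason").strip())] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: pipe log text in on stdin.
    for (addr, reason), n in summarize_dial_failures(sys.stdin.read()).items():
        print(f"{n:5d}  {reason}  {addr}")
```

If every failure points at the same kine.sock address with "connection refused", that would suggest nothing is listening on the datastore socket on that node. It would not suggest a network problem between nodes.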

It's happening on multiple servers.
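Since the same symptom shows up on several servers, a quick per-node check is whether anything accepts connections on the socket kubelite is dialing. Below is a rough Python sketch, not an official diagnostic. `probe_unix_socket` is a hypothetical helper, and the example path is copied from the log above (the `7180` revision directory will differ between installs). Reading the socket will most likely need root.

```python
import os
import socket


def probe_unix_socket(path, timeout=2.0):
    """Return 'missing', 'refused', 'listening', or 'error: ...' for path."""
    if not os.path.exists(path):
        return "missing"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "listening"
    except ConnectionRefusedError:
        # The file exists but no process is accepting connections on it.
        return "refused"
    except OSError as exc:
        # Permission problems, timeouts, and similar.
        return f"error: {exc}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Path taken verbatim from the kubelite warning in this thread.
    path = "/var/snap/microk8s/7180/var/kubernetes/backend/kine.sock:12379"
    print(path, "->", probe_unix_socket(path))
```

A "refused" or "missing" result would be consistent with the connection-refused warnings in the log. "listening" would mean the datastore socket is up and the problem lies elsewhere.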

crypto-titan commented 1 month ago

Ubuntu 24.04, MicroK8s 1.30.1