DileepAP opened this issue 9 months ago
baa@blr-brd-ha-02:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
blr-brd-ha-04 Ready
Node "blr-brd-ha-04" was shutdown, but node "blr-brd-ha-03" also went to "NotReady" status. At times, it use to take down even 4 other nodes also. The node was made down by around 15:27
###############################################################################################
baa@blr-brd-ha-02:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
blr-brd-ha-01 Ready
baa@blr-brd-ha-02:~$ date
Mon 5 Feb 15:45:33 UTC 2024
baa@blr-brd-ha-02:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
blr-brd-ha-04 NotReady
baa@blr-brd-ha-02:~$ date
Mon 5 Feb 15:48:16 UTC 2024
baa@blr-brd-ha-02:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
blr-brd-ha-04 NotReady
baa@blr-brd-ha-03:~$ kubectl describe node blr-brd-ha-03
Lease:
HolderIdentity: blr-brd-ha-03
AcquireTime:
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Mon, 05 Feb 2024 14:37:50 +0000   Mon, 05 Feb 2024 14:37:50 +0000   CalicoIsUp          Calico is running on this node
  MemoryPressure       Unknown   Mon, 05 Feb 2024 15:39:24 +0000   Mon, 05 Feb 2024 15:29:53 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Mon, 05 Feb 2024 15:39:24 +0000   Mon, 05 Feb 2024 15:29:53 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Mon, 05 Feb 2024 15:39:24 +0000   Mon, 05 Feb 2024 15:29:53 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Mon, 05 Feb 2024 15:39:24 +0000   Mon, 05 Feb 2024 15:29:53 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  10.40.101.185
  Hostname:    blr-brd-ha-03
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1150m (14%)  1500m (18%)
  memory             290Mi (0%)   1114Mi (3%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-1Gi      0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ Nodes are back after 20-plus minutes
Mon 5 Feb 15:54:06 UTC 2024
baa@blr-brd-ha-02:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
blr-brd-ha-04 NotReady
inspection-report-20240205_153156.tar.gz Inspection Report from node 3
Actually, I am facing the same issue. I have a cluster with 3 master nodes and 3 worker nodes; the worker nodes remain in Ready state, but the master went into NotReady state. The nodes have static IPs, so networking is not an issue. I am planning to switch my cluster to k3s now. MicroK8s is destructive in the case of an abrupt power failure.
I am experiencing the same issue. I have a 6-node cluster, with 3 nodes as the master and 3 as workers. I am using Ubuntu 22.04 and microk8s 1.29.4. When I bring down the master node that is the leader for the dqlite cluster, I notice that some of my other nodes show as not ready in the cluster status. This status persists for about 16 to 17 minutes, after which the cluster reports only one node as offline.
The Kubernetes versions I used are 1.28.3 and 1.29.0; I faced the same issue in both versions.
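When the other nodes flip to NotReady after the datastore leader goes down, it can help to check whether their kubelets actually stopped, or whether the control plane simply could not record their heartbeats while dqlite was recovering. A minimal diagnostic sketch (not part of the original report), using only standard Kubernetes objects:

# Show when each kubelet last renewed its node lease (heartbeat).
# Leases that keep renewing while a node shows NotReady suggest the
# kubelet is alive and the control plane side is the one struggling.
kubectl get leases -n kube-node-lease \
  -o custom-columns=NODE:.metadata.name,LAST_RENEW:.spec.renewTime
# Compare against the node conditions reported by the control plane.
kubectl get nodes -o wide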
I have a six-node HA cluster.

microk8s status
microk8s is running
high-availability: yes
  datastore master nodes: 10.40.101.83:19001 10.40.101.185:19001 10.40.101.186:19001
  datastore standby nodes: 10.40.101.85:19001 10.40.101.128:19001 10.40.101.129:19001
When any of the datastore nodes is shut down, the other nodes move to NotReady status. This happens occasionally. The shutdown is a "hard shutdown" from the hypervisor console.
It takes around 20 minutes for the nodes (except the one that was powered off) to recover automatically, and the applications are not accessible during this window.
I expect all the nodes to be in "Ready" state, other than the one that is powered off.
Is there any fix for this issue?
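For reference while debugging this, the flip from Ready to Unknown is driven by the kube-controller-manager's node monitor grace period, and in MicroK8s the component flags live under the snap's args directory. A hedged sketch (paths assume the standard MicroK8s snap layout) to see which timers the cluster is actually running with:

# Inspect the controller-manager flags MicroK8s is running with;
# --node-monitor-grace-period governs how quickly a silent node is
# marked Unknown/NotReady.
cat /var/snap/microk8s/current/args/kube-controller-manager
# Kubelet heartbeat settings live alongside it.
cat /var/snap/microk8s/current/args/kubelet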
xaa@ha-02:~$ date
Fri 2 Feb 12:28:23 UTC 2024
xaa@ha-02:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ha-05 Ready 66m v1.29.0 10.40.101.128 Red Hat Enterprise Linux 8.9 (Ootpa) 4.18.0-513.11.1.el8_9.x86_64 containerd://1.6.15
ha-01 Ready 83m v1.29.0 10.40.101.83 Red Hat Enterprise Linux 8.9 (Ootpa) 4.18.0-513.11.1.el8_9.x86_64 containerd://1.6.15
ha-06 Ready 61m v1.29.0 10.40.101.129 Red Hat Enterprise Linux 8.9 (Ootpa) 4.18.0-513.11.1.el8_9.x86_64 containerd://1.6.15
ha-04 Ready 71m v1.29.0 10.40.101.186 Red Hat Enterprise Linux 8.9 (Ootpa) 4.18.0-513.11.1.el8_9.x86_64 containerd://1.6.15
ha-03 Ready 75m v1.29.0 10.40.101.185 Red Hat Enterprise Linux 8.9 (Ootpa) 4.18.0-513.11.1.el8_9.x86_64 containerd://1.6.15
ha-02 Ready 78m v1.29.0 10.40.101.85 Red Hat Enterprise Linux 8.9 (Ootpa) 4.18.0-513.11.1.el8_9.x86_64 containerd://1.6.15
xaa@ha-02:~$
Every 2.0s: kubectl get nodes ha-03: Fri Feb 2 12:30:04 2024
NAME STATUS ROLES AGE VERSION
ha-03 Ready 76m v1.29.0
ha-02 Ready 79m v1.29.0
ha-06 NotReady 62m v1.29.0
ha-01 NotReady 85m v1.29.0
ha-04 NotReady 73m v1.29.0
ha-05 NotReady 68m v1.29.0