Kuber fails to delete node, unable to ensure replication of pvc.
2024-11-11T10:09:40Z INF Deleting nodes - control nodes [0], compute nodes[1] cluster=e2e-684iw35 module=kuber
e2e-684iw35 node/azr-auto-cmpt-aln243g-02 already cordoned
e2e-684iw35 volume.longhorn.io/pvc-07b9c964-95c6-469e-9569-b97aefa6176f patched
2024-11-11T10:09:41Z INF Waiting 10 seconds for new replicas to be scheduled if possible for node azr-auto-cmpt-aln243g-02 of cluster cluster=e2e-684iw35 module=kuber
2024-11-11T10:14:51Z ERR Error while deleting nodes error="error while making sure storage is replicated before deletion on cluster e2e-684iw35 : error while checking if all longhorn replicas for volume pvc-07b9c964-95c6-469e-9569-b97aefa6176f are running : error while checking the status of volume pvc-07b9c964-95c6-469e-9569-b97aefa6176f replication : context deadline exceeded" cluster=e2e-684iw35 module=kuber
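The timestamps above show about five minutes between the start of the wait (10:09:41) and the `context deadline exceeded` error (10:14:51). The sketch below illustrates that general pattern, polling a condition until a deadline, and is not Claudie's actual implementation. The names `wait_until` and `FakeClock` and the 300 s / 10 s values are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               timeout_s: float,
               interval_s: float,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` every `interval_s` seconds.

    Returns True as soon as `check()` is true, or False once another
    sleep would go past the deadline (the "deadline exceeded" case).
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() + interval_s > deadline:
            return False
        sleep(interval_s)


class FakeClock:
    """Deterministic clock so the sketch can be exercised without real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Hypothetical stand-in for "are all replicas of the volume running?".
# If a replica can never be scheduled, this stays False until the deadline.
def replicas_never_ready() -> bool:
    return False


clock = FakeClock()
ok = wait_until(replicas_never_ready, timeout_s=300, interval_s=10,
                sleep=clock.sleep, clock=clock.time)
```

If the condition can never become true, for example because there is no node left to host the missing replica, the loop always runs to the deadline and the deletion fails regardless of how long the timeout is.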
![2](https://github.com/user-attachments/assets/eb5f37f9-1e5c-49cc-871b-6361c42df4f3)
4. Further decrease the node count by 1.
Now some of the volumes are completely detached and some are degraded because their replicas cannot be scheduled.
![3](https://github.com/user-attachments/assets/291c7597-6f64-451f-a76a-3e6acc629210)
These steps simulate the scale-down that happened in the e2e cluster. If we were to repeat this many times, i.e. add nodes and then delete nodes, at some point we would hit the problem described above. I've managed to reproduce it only once.
The issue is that we deploy a storage class with `numberOfReplicas: "3"`.
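One plausible reading of this, sketched below: if each node can host at most one replica of a given volume (an assumption here, matching Longhorn's behaviour when replica node-level soft anti-affinity is disabled), a volume that wants 3 replicas cannot become fully replicated once fewer than 3 storage-capable nodes remain. The function names and the simplified `healthy` / `degraded` / `faulted` labels are illustrative, not Longhorn's or Claudie's API.

```python
def schedulable_replicas(desired_replicas: int, storage_nodes: int) -> int:
    """Replicas that can run if each node hosts at most one replica of a volume."""
    return min(desired_replicas, max(storage_nodes, 0))


def can_fully_replicate(desired_replicas: int, storage_nodes: int) -> bool:
    """True only if every desired replica has its own node to run on."""
    return schedulable_replicas(desired_replicas, storage_nodes) == desired_replicas


def volume_state(desired_replicas: int, storage_nodes: int) -> str:
    """Simplified state: all replicas running, some running, or none running."""
    running = schedulable_replicas(desired_replicas, storage_nodes)
    if running == 0:
        return "faulted"
    if running < desired_replicas:
        return "degraded"
    return "healthy"


# With 3 desired replicas, scaling from 3 storage nodes down to 2 leaves
# one replica with nowhere to go, so a "wait until all replicas are
# running" check before node deletion can never succeed.
before = volume_state(desired_replicas=3, storage_nodes=3)
after = volume_state(desired_replicas=3, storage_nodes=2)
```

Under these assumptions, a check that waits for all replicas to be running before deleting a node would need to compare against the number of nodes that will remain, not only against the configured replica count.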
Steps To Reproduce
kubernetes: clusters:
kubectl get pods -A --kubeconfig ./test
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-5bd57786d4-hpnqj 1/1 Running 0 41m
cert-manager cert-manager-cainjector-57657d5754-8q6kc 1/1 Running 0 41m
cert-manager cert-manager-webhook-7d9f8748d4-nm89r 1/1 Running 1 (4m18s ago) 41m
claudie ansibler-547d5d4477-dq7rt 1/1 Running 1 (3m37s ago) 7m48s
claudie builder-74c6c4bc6d-4ps6r 1/1 Running 0 7m47s
claudie claudie-operator-779c58f857-6wsz6 1/1 Running 0 17m
claudie create-table-job-k5m9m 0/1 Completed 1 17m
claudie dynamodb-d764d9d9d-zmfnt 1/1 Running 0 17m
claudie kube-eleven-6c449847c-h694k 1/1 Running 2 (4m3s ago) 17m
claudie kuber-757f496d76-2vggg 1/1 Running 1 (3m35s ago) 7m47s
claudie make-bucket-job-vdmwj 0/1 Completed 0 17m
claudie manager-7c86c5dff6-wrwsv 1/1 Running 2 (16m ago) 17m
claudie minio-0 1/1 Running 0 7m43s
claudie minio-1 1/1 Running 0 17m
claudie minio-2 1/1 Running 0 17m
claudie minio-3 1/1 Running 0 7m43s
claudie mongodb-b79df96d5-d4wjd 1/1 Running 0 17m
claudie terraformer-5767b65455-nhchr 1/1 Running 0 17m
kube-system cilium-2b22k 1/1 Running 0 88s
kube-system cilium-5zvjp 1/1 Running 0 109s
kube-system cilium-ldbpq 1/1 Running 0 110s
kube-system cilium-operator-555d4c4d76-thnpv 1/1 Running 0 78m
kube-system coredns-778c49ccf5-62pfm 1/1 Running 0 2m28s
kube-system coredns-778c49ccf5-8xkjn 1/1 Running 0 2m28s
kube-system etcd-gcp-ctrl-nodes-m6kpu0e-01 1/1 Running 0 79m
kube-system hubble-generate-certs-mqsmm 0/1 Completed 0 2m16s
kube-system hubble-relay-d65ffb68f-5xwdr 1/1 Running 1 (4m12s ago) 7m47s
kube-system hubble-ui-86f6cd444-t75hp 2/2 Running 0 7m47s
kube-system kube-apiserver-gcp-ctrl-nodes-m6kpu0e-01 1/1 Running 0 79m
kube-system kube-controller-manager-gcp-ctrl-nodes-m6kpu0e-01 1/1 Running 0 79m
kube-system kube-proxy-5vqg9 1/1 Running 0 78m
kube-system kube-proxy-cxkxj 1/1 Running 0 79m
kube-system kube-proxy-zc2mf 1/1 Running 0 78m
kube-system kube-scheduler-gcp-ctrl-nodes-m6kpu0e-01 1/1 Running 0 79m
kube-system metrics-server-b65cdc569-dt5mr 1/1 Running 0 78m
longhorn-system csi-attacher-6c4495498-svb2k 1/1 Running 1 (102s ago) 75m
longhorn-system csi-attacher-6c4495498-tcg5t 1/1 Running 1 (7m11s ago) 7m47s
longhorn-system csi-attacher-6c4495498-xpwr9 1/1 Running 0 75m
longhorn-system csi-provisioner-7d8cf4f58f-4rjxk 1/1 Running 0 75m
longhorn-system csi-provisioner-7d8cf4f58f-g7gk2 1/1 Running 0 7m47s
longhorn-system csi-provisioner-7d8cf4f58f-wvv74 1/1 Running 0 75m
longhorn-system csi-resizer-77b968dfcd-d75cd 1/1 Running 0 75m
longhorn-system csi-resizer-77b968dfcd-wmp4q 1/1 Running 0 75m
longhorn-system csi-resizer-77b968dfcd-z8pzf 1/1 Running 0 7m47s
longhorn-system csi-snapshotter-77699d78fb-7bp22 1/1 Running 0 75m
longhorn-system csi-snapshotter-77699d78fb-jzrmm 1/1 Running 0 75m
longhorn-system csi-snapshotter-77699d78fb-z9z8f 1/1 Running 0 7m48s
longhorn-system engine-image-ei-04c05bf8-4zvc2 1/1 Running 0 76m
longhorn-system engine-image-ei-04c05bf8-qgq6x 1/1 Running 0 76m
longhorn-system instance-manager-be12e3a749d94492623a35549fdfbf8b 1/1 Running 0 75m
longhorn-system instance-manager-c602a5b2595317e1ceb34084760c4288 1/1 Running 0 75m
longhorn-system longhorn-csi-plugin-57d6n 3/3 Running 0 75m
longhorn-system longhorn-csi-plugin-kw5gj 3/3 Running 2 (3m49s ago) 75m
longhorn-system longhorn-driver-deployer-55b7b5c7b4-wckhj 1/1 Running 2 (76m ago) 76m
longhorn-system longhorn-manager-6dwjw 2/2 Running 0 76m
longhorn-system longhorn-manager-jx4q7 2/2 Running 1 (76m ago) 76m
longhorn-system longhorn-ui-786c6ff-tcql6 1/1 Running 0 76m
longhorn-system longhorn-ui-786c6ff-xg4b5 1/1 Running 0 76m
Scheduling Failure Replica Scheduling Failure Error Message: precheck new replica failed
kubectl get pods -A --kubeconfig ./test
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-5bd57786d4-hpnqj 1/1 Running 0 51m
cert-manager cert-manager-cainjector-57657d5754-jqzxr 1/1 Running 0 5m29s
cert-manager cert-manager-webhook-7d9f8748d4-x5xl2 1/1 Running 0 5m29s
claudie ansibler-547d5d4477-t28xr 0/1 Pending 0 5m29s
claudie builder-74c6c4bc6d-4rsjr 1/1 Running 0 5m29s
claudie claudie-operator-779c58f857-9mmtz 0/1 Pending 0 5m29s
claudie dynamodb-d764d9d9d-njgkv 0/1 Pending 0 5m28s
claudie kube-eleven-6c449847c-7stz6 0/1 Pending 0 5m28s
claudie kuber-757f496d76-rxbqv 0/1 Pending 0 5m27s
claudie manager-7c86c5dff6-wrwsv 0/1 Running 2 (26m ago) 27m
claudie minio-0 0/1 Pending 0 5m13s
claudie minio-1 0/1 Pending 0 5m13s
claudie minio-2 1/1 Running 0 27m
claudie minio-3 1/1 Running 0 17m
claudie mongodb-b79df96d5-p7nnb 0/1 Pending 0 5m30s
claudie terraformer-5767b65455-nhchr 0/1 Running 0 27m
kube-system cilium-jpskg 1/1 Running 0 58s
kube-system cilium-kq9h4 1/1 Running 0 57s
kube-system cilium-operator-555d4c4d76-thnpv 1/1 Running 0 88m
kube-system coredns-76c4f7868f-knqtc 1/1 Running 0 92s
kube-system coredns-76c4f7868f-w7fp9 1/1 Running 0 92s
kube-system etcd-gcp-ctrl-nodes-m6kpu0e-01 1/1 Running 0 90m
kube-system hubble-relay-d65ffb68f-xhpvp 1/1 Running 0 5m29s
kube-system hubble-ui-86f6cd444-96pl7 2/2 Running 0 5m27s
kube-system kube-apiserver-gcp-ctrl-nodes-m6kpu0e-01 1/1 Running 0 90m
kube-system kube-controller-manager-gcp-ctrl-nodes-m6kpu0e-01 1/1 Running 0 89m
kube-system kube-proxy-5vqg9 1/1 Running 0 88m
kube-system kube-proxy-cxkxj 1/1 Running 0 89m
kube-system kube-scheduler-gcp-ctrl-nodes-m6kpu0e-01 1/1 Running 0 90m
kube-system metrics-server-b65cdc569-dt5mr 1/1 Running 0 88m
longhorn-system csi-attacher-6c4495498-xcqdz 1/1 Running 0 5m27s
longhorn-system csi-attacher-6c4495498-xpwr9 1/1 Running 0 85m
longhorn-system csi-attacher-6c4495498-zpcrk 1/1 Running 0 5m26s
longhorn-system csi-provisioner-7d8cf4f58f-d6l55 1/1 Running 0 5m27s
longhorn-system csi-provisioner-7d8cf4f58f-kkz9v 1/1 Running 0 5m27s
longhorn-system csi-provisioner-7d8cf4f58f-wvv74 1/1 Running 0 85m
longhorn-system csi-resizer-77b968dfcd-4trb2 1/1 Running 0 5m30s
longhorn-system csi-resizer-77b968dfcd-t6g2t 1/1 Running 0 5m30s
longhorn-system csi-resizer-77b968dfcd-wmp4q 1/1 Running 0 85m
longhorn-system csi-snapshotter-77699d78fb-4njkk 1/1 Running 0 5m30s
longhorn-system csi-snapshotter-77699d78fb-4zzvm 1/1 Running 0 5m30s
longhorn-system csi-snapshotter-77699d78fb-jzrmm 1/1 Running 0 85m
longhorn-system engine-image-ei-04c05bf8-qgq6x 1/1 Running 0 86m
longhorn-system instance-manager-be12e3a749d94492623a35549fdfbf8b 1/1 Running 0 85m
longhorn-system longhorn-csi-plugin-57d6n 3/3 Running 0 85m
longhorn-system longhorn-driver-deployer-55b7b5c7b4-wckhj 1/1 Running 2 (86m ago) 86m
longhorn-system longhorn-manager-jx4q7 2/2 Running 1 (86m ago) 86m
longhorn-system longhorn-ui-786c6ff-jqkqn 1/1 Running 0 5m30s
longhorn-system longhorn-ui-786c6ff-tcql6 1/1 Running 0 86m
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: {{ .StorageClassName }}
  labels:
    claudie.io/storage-class: {{ .StorageClassName }}
provisioner: driver.longhorn.io
parameters:
  fromBackup: ""
  nodeSelector: {{ .ZoneName }}
  fsType: xfs
  numberOfReplicas: "3"
  staleReplicaTimeout: "28800"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate