Closed: ctromans closed this issue 1 week ago
Sounds like the dqlite store is having a hard time keeping up with all the leases and leader elections of the controller pods. Would you mind sharing an inspection report from the cluster? I am particularly keen to see what the status of the dqlite cluster is. cc @MathieuBordere
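For reference, an inspection report can be generated on the affected node with the built-in inspect command; it bundles the service logs, dqlite state and system information into a tarball whose path is printed at the end:
sudo microk8s inspect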
Hi @neoaggelos, I'm afraid I've given up on that install as I'm up against a deadline and it was completely unresponsive. I have however rebuilt the cluster without HA mode (two workers, one master) in a new setup. I'm still getting the leadership election problems:
E0119 12:22:38.255418 1 leaderelection.go:367] Failed to update lock: Put "https://10.152.183.1:443/api/v1/namespaces/openebs/endpoints/openebs.io-local": context deadline exceeded
I0119 12:22:38.255990 1 leaderelection.go:283] failed to renew lease openebs/openebs.io-local: timed out waiting for the condition
F0119 12:22:38.256029 1 controller.go:888] leaderelection lost
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc0000a4e01, 0xc000976000, 0x45, 0x105)
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:1140 +0xb9
[stack trace]
This causes the pod to cycle in CrashLoopBackOff.
I've attached an inspection report from this setup. Let me know if you'd rather this were opened as a separate issue, since it is a fresh deployment distinct from the one in the OP.
inspection-report-20230119_122350.tar.gz
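For anyone hitting the same symptom: a rough way to check whether the lock is being renewed at all (assuming the Endpoints-based lock that the log above points at) is to look at the leader-election annotation on that Endpoints object, which carries the holder identity and last renew time:
microk8s kubectl -n openebs get endpoints openebs.io-local -o yaml | grep control-plane.alpha.kubernetes.io/leader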
Hi @ctromans, we (if I may speak for the microk8s team) would be very interested in replicating your setup to reproduce your problem. Do you have a script / steps we can follow to replicate your cluster?
Sorry for being slow getting back to you @MathieuBordere, I've been running various tests to try and get to the bottom of this. The errors detailed above have been occurring when microk8s was installed on spinning storage in a software RAID configuration:
/dev/md125:
Container : /dev/md/imsm0, member 0
Raid Level : raid5
Array Size : 209715200 (200.00 GiB 214.75 GB)
Used Dev Size : 104857600 (100.00 GiB 107.37 GB)
Raid Devices : 3
Total Devices : 3
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Layout : left-asymmetric
Chunk Size : 64K
Consistency Policy : resync
UUID : 74987479:61cdb5f6:e3115e63:bbcef6f7
Number Major Minor RaidDevice State
2 8 0 0 active sync /dev/sda
1 8 16 1 active sync /dev/sdb
0 8 32 2 active sync /dev/sdc
The array comprises identical disks:
hdparm -i /dev/sda
/dev/sda:
Model=ST2000DM008-2FR102, FwRev=0001, SerialNo=WFL5EKLS
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=0
BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=3907029168
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
AdvancedPM=yes: unknown setting WriteCache=enabled
Drive conforms to: unknown: ATA/ATAPI-4,5,6,7
However, after moving from this storage to NVMe:
*-nvme
description: NVMe device
product: INTEL SSDPELKX010T8
vendor: Intel Corporation
physical id: 0
bus info: pci@0000:07:00.0
logical name: /dev/nvme0
version: VCV10352
serial: PHLJ121000JZ1P0I
width: 64 bits
clock: 33MHz
capabilities: nvme nvm_express bus_master cap_list rom
configuration: driver=nvme latency=0 nqn=nqn.2014.08.org.nvmexpress:80868086PHLJ121000JZ1P0I INTEL SSDPELKX010T8 state=live
The problem simply disappeared. I can only assume the spinning storage was too slow to complete writes within the timeout, so the lease renewals were frequently losing the race against the deadline.
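If anyone wants to sanity-check their own storage before swapping hardware, a crude synchronous-write test on the filesystem that holds the microk8s data (assumed below to live under /var/snap/microk8s; adjust the path to your install) gives a feel for whether dqlite's fsync-heavy writes can complete within the election timeouts. Spinning RAID will typically report only a few MB/s here, while NVMe is one or two orders of magnitude faster:
# write roughly 4 MB in 4 KiB synchronous chunks and report the effective throughput
sudo dd if=/dev/zero of=/var/snap/microk8s/fsync-test bs=4k count=1000 oflag=dsync
sudo rm /var/snap/microk8s/fsync-test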
For those watching/maintaining, trying to install deployKF can sometimes trigger this behavior, probably because deployKF makes a LOT of Kubernetes API calls during the first install.
Read more in my write-up here: https://github.com/deployKF/deployKF/issues/39#issuecomment-1741646935
Personally, I think microk8s is currently not safe to use because it has the possibility of getting into this state, so I recommend k3s instead.
Hi @thesuperzapper, thank you for providing a way for us to trigger this bug. I tried to reproduce the issue with deployKF by running the following steps multiple times:
multipass launch --mem 7G --cpus 2 --name mk8s-vm charm-dev
multipass shell mk8s-vm
sudo snap refresh microk8s --channel=1.27/stable
sudo microk8s start
sudo microk8s status --wait-ready
microk8s enable dns
microk8s enable hostpath-storage
microk8s enable metallb:10.64.140.43-10.64.140.49
Finally, I installed `deployKF` following this [tutorial](https://www.deploykf.org/guides/local-quickstart/).
Which basically means checking out the project and executing:
bash ./deploykf/argocd-plugin/install_argocd.sh
bash ~/deploykf/scripts/sync_argocd_apps.sh
However, I was not able to reproduce the issue - all pods came up after a while.
Could you share your setup/specs along with any additional steps that you have done to trigger this? I assume we may face this problem more often in a resource-constrained environment, so this would be interesting to know.
Thanks
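One suggestion for whoever tries this next: while the deployKF sync is running, tailing the datastore and apiserver units in a second shell makes it obvious the moment the context-deadline errors start (unit names as they appear in the systemd output later in this thread):
sudo journalctl -fu snap.microk8s.daemon-k8s-dqlite
sudo journalctl -fu snap.microk8s.daemon-kubelite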
@bschimke95 this is my problem too: I was only able to reproduce the issue the FIRST time I used microk8s. As I was saying to the user in https://github.com/deployKF/deployKF/issues/39#issuecomment-1741646935, after a clean install of microk8s I could not get it to happen again.
For reference, this was on my Ubuntu 22.04 server, which is not exactly "resource-constrained":
One thing I may have done differently the first time was not setting a specific version of microk8s to install, e.g. something like sudo snap install microk8s --classic.
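If it helps narrow things down, the exact version, revision and tracked channel that ended up installed can still be read back after the fact:
snap list microk8s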
Same issue: the hostpath-storage and dashboard pods are always restarting. Version: 1.28/stable.
ryze@ubuntu:~$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cicd gitlab-7f9d865fcb-w7qv4 1/1 Running 1 (22h ago) 23h
cicd postgres-statefulset-0 1/1 Running 2 (22h ago) 4d4h
kube-system dashboard-metrics-scraper-5657497c4c-rt27h 1/1 Running 2 (22h ago) 4d4h
cicd sonarqube-5c9d49b6fd-cd4lr 1/1 Running 2 (22h ago) 26h
kube-system kubernetes-dashboard-54b48fbf9-w4x6s 1/1 Running 7 (80m ago) 4d4h
cicd jenkins-65579d4487-qnkss 1/1 Running 0 85m
cicd jenkins-8fb9f5799-m8r2q 1/1 Running 0 15m
kube-system calico-node-m5fcb 1/1 Running 2 (22h ago) 4d5h
kube-system coredns-864597b5fd-mrlqc 1/1 Running 2 (22h ago) 4d5h
kube-system calico-kube-controllers-77bd7c5b-cwflm 0/1 Running 4 (3m15s ago) 4d5h
kube-system metrics-server-848968bdcd-w8vkd 0/1 Running 25 (3m15s ago) 4d4h
kube-system hostpath-provisioner-7df77bc496-js5zf 1/1 Running 45 (3m58s ago) 4d4h
ryze@ubuntu:~$ kubectl logs -f metrics-server-848968bdcd-w8vkd -n kube-system -p
Error from server (NotFound): the server could not find the requested resource ( pods/log metrics-server-848968bdcd-w8vkd)
ryze@ubuntu:~$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cicd gitlab-7f9d865fcb-w7qv4 1/1 Running 1 (22h ago) 24h
cicd postgres-statefulset-0 1/1 Running 2 (22h ago) 4d4h
kube-system dashboard-metrics-scraper-5657497c4c-rt27h 1/1 Running 2 (22h ago) 4d5h
cicd sonarqube-5c9d49b6fd-cd4lr 1/1 Running 2 (22h ago) 26h
kube-system hostpath-provisioner-7df77bc496-js5zf 1/1 Running 46 (7m55s ago) 4d5h
kube-system coredns-864597b5fd-mrlqc 1/1 Running 2 (22h ago) 4d5h
cicd jenkins-65579d4487-qnkss 1/1 Running 0 91m
cicd jenkins-8fb9f5799-m8r2q 1/1 Running 0 21m
kube-system calico-kube-controllers-77bd7c5b-cwflm 1/1 Running 4 (9m20s ago) 4d5h
kube-system calico-node-m5fcb 1/1 Running 2 (22h ago) 4d5h
kube-system metrics-server-848968bdcd-w8vkd 1/1 Running 25 (9m20s ago) 4d5h
kube-system kubernetes-dashboard-54b48fbf9-w4x6s 1/1 Running 8 (61s ago) 4d5h
And the hostpath-provisioner logs:
I1128 08:18:28.758003 1 hostpath-provisioner.go:97] Using busybox image: "busybox:1.28.4"
I1128 08:18:28.901804 1 leaderelection.go:248] attempting to acquire leader lease kube-system/microk8s.io-hostpath...
I1128 08:19:01.716190 1 leaderelection.go:258] successfully acquired lease kube-system/microk8s.io-hostpath
I1128 08:19:01.734941 1 event.go:285] Event(v1.ObjectReference{Kind:"Endpoints", Namespace:"kube-system", Name:"microk8s.io-hostpath", UID:"c78c2716-8ea7-4a14-af83-5e53b7aa866a", APIVersion:"v1", ResourceVersion:"492052", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' hostpath-provisioner-7df77bc496-js5zf_0f88d79c-fec8-4b74-bef5-b5f107295c29 became leader
I1128 08:19:01.771083 1 controller.go:810] Starting provisioner controller microk8s.io/hostpath_hostpath-provisioner-7df77bc496-js5zf_0f88d79c-fec8-4b74-bef5-b5f107295c29!
I1128 08:19:02.974741 1 controller.go:859] Started provisioner controller microk8s.io/hostpath_hostpath-provisioner-7df77bc496-js5zf_0f88d79c-fec8-4b74-bef5-b5f107295c29!
E1128 08:19:48.463858 1 leaderelection.go:367] Failed to update lock: Put "https://10.152.183.1:443/api/v1/namespaces/kube-system/endpoints/microk8s.io-hostpath": context deadline exceeded
I1128 08:19:48.528501 1 leaderelection.go:283] failed to renew lease kube-system/microk8s.io-hostpath: timed out waiting for the condition
F1128 08:19:48.589396 1 controller.go:888] leaderelection lost
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:1140 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x2601ca0, 0x3, 0x0, 0xc000156000, 0x0, {0x1d7e668, 0x0}, 0xc000132240, 0x0)
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:1088 +0x66f
k8s.io/klog/v2.(*loggingT).printf(0x0, 0x0, 0x0, {0x0, 0x0}, {0x17c5011, 0x13}, {0x0, 0x0, 0x0})
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:753 +0x1c5
k8s.io/klog/v2.Fatalf(...)
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:1642
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func2()
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:888 +0x55
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1()
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/leaderelection/leaderelection.go:203 +0x1f
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc000464120, {0x19c8868, 0xc00011e000})
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/leaderelection/leaderelection.go:213 +0x189
k8s.io/client-go/tools/leaderelection.RunOrDie({0x19c8868, 0xc00011e000}, {{0x19e5b48, 0xc0004423c0}, 0x37e11d600, 0x2540be400, 0x77359400, {0xc000413900, 0x1870e58, 0x0}, ...})
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/leaderelection/leaderelection.go:226 +0x94
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run(0xc000415680, {0x19c8868, 0xc00011e000})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:880 +0x408
main.main()
/src/hostpath-provisioner.go:333 +0x305
goroutine 34 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x0)
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:1283 +0x6a
created by k8s.io/klog/v2.init.0
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:420 +0xfb
goroutine 36 [chan receive]:
k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop(0xc00040f240)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/watch/mux.go:247 +0x49
created by k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/watch/mux.go:89 +0x11b
goroutine 37 [runnable]:
fmt.getField({0x16fb2e0, 0xc0001563f0, 0x99}, 0x0)
/usr/local/go/src/fmt/print.go:290 +0x6e
fmt.(*pp).printValue(0xc0001da0d0, {0x16fb2e0, 0xc0001563f0, 0x7f1690f90f18}, 0x76, 0x0)
/usr/local/go/src/fmt/print.go:806 +0x1225
fmt.(*pp).printArg(0xc0001da0d0, {0x16fb2e0, 0xc0001563f0}, 0x76)
/usr/local/go/src/fmt/print.go:712 +0x74c
fmt.(*pp).doPrintf(0xc0001da0d0, {0x17db67e, 0x26}, {0xc00034e640, 0x34e640, 0x11d})
/usr/local/go/src/fmt/print.go:1026 +0x288
fmt.Fprintf({0x199e060, 0xc000156460}, {0x17db67e, 0x26}, {0xc00034e640, 0x4, 0x4})
/usr/local/go/src/fmt/print.go:204 +0x75
k8s.io/klog/v2.(*loggingT).printf(0x0, 0x40a72b, 0x0, {0x0, 0x0}, {0x17db67e, 0x26}, {0xc00034e640, 0x4, 0x4})
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:749 +0x145
k8s.io/klog/v2.Infof({0x17db67e, 0x5a}, {0xc00034e640, 0x4060d8, 0xc0003a5b60})
/go/pkg/mod/k8s.io/klog/v2@v2.40.1/klog.go:1536 +0x5e
k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartLogging.func1(0xc0001ed400)
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/record/event.go:285 +0x156
k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartEventWatcher.func1()
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/record/event.go:311 +0xc5
created by k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartEventWatcher
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/record/event.go:302 +0x91
goroutine 38 [runnable]:
encoding/json.isEmptyValue({0x150fe40, 0xc0001eda78, 0x198})
/usr/local/go/src/encoding/json/encode.go:341 +0x23a
encoding/json.structEncoder.encode({{{0xc000205680, 0x18, 0x150fe40}, 0xc00048acf0}}, 0xc0003c6500, {0x16fb2e0, 0xc0001eda18, 0x11}, {0x0, 0x1})
/usr/local/go/src/encoding/json/encode.go:750 +0x105
encoding/json.structEncoder.encode({{{0xc0002ee000, 0x403e54, 0x40}, 0xc0004635c0}}, 0xc0003c6500, {0x1765c40, 0xc0001ed900, 0x0}, {0x0, 0x1})
/usr/local/go/src/encoding/json/encode.go:761 +0x1f4
encoding/json.ptrEncoder.encode({0x203000}, 0xc0003c6500, {0x1795ee0, 0xc0001ed900, 0x1795ee0}, {0xe0, 0x56})
/usr/local/go/src/encoding/json/encode.go:945 +0x27e
encoding/json.(*encodeState).reflectValue(0xc000147770, {0x1795ee0, 0xc0001ed900, 0x2}, {0xe0, 0x9e})
/usr/local/go/src/encoding/json/encode.go:360 +0x78
encoding/json.(*encodeState).marshal(0x0, {0x1795ee0, 0xc0001ed900}, {0xbd, 0x7e})
/usr/local/go/src/encoding/json/encode.go:332 +0xfa
encoding/json.(*Encoder).Encode(0xc000147860, {0x1795ee0, 0xc0001ed900})
/usr/local/go/src/encoding/json/stream.go:206 +0x7e
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).doEncode(0x17b76f1, {0x19b4198, 0xc0001ed900}, {0x199bd60, 0xc0004f6360})
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/runtime/serializer/json/json.go:233 +0x19a
k8s.io/apimachinery/pkg/runtime/serializer/json.(*Serializer).Encode(0xc000108c30, {0x19b4198, 0xc0001ed900}, {0x199bd60, 0xc0004f6360})
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/runtime/serializer/json/json.go:207 +0xfc
k8s.io/apimachinery/pkg/runtime.WithVersionEncoder.Encode({{0x19b8f68, 0xc0004384e0}, {0x19b5d18, 0xc000108c30}, {0x19b5c78, 0xc0002490a0}}, {0x19b4198, 0xc0001ed900}, {0x199bd60, 0xc0004f6360})
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/runtime/helper.go:240 +0x2ee
k8s.io/apimachinery/pkg/runtime.Encode({0x19b8ef0, 0xc0004f6330}, {0x19b4198, 0xc0001ed900})
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/runtime/codec.go:50 +0x64
k8s.io/client-go/rest.(*Request).Body(0xc0004d7200, {0x1795ee0, 0xc0001ed900})
/go/pkg/mod/k8s.io/client-go@v0.23.1/rest/request.go:450 +0x46b
k8s.io/client-go/kubernetes/typed/core/v1.(*events).CreateWithEventNamespace(0xc000438ec0, 0xc0001ed900)
/go/pkg/mod/k8s.io/client-go@v0.23.1/kubernetes/typed/core/v1/event_expansion.go:58 +0xf9
k8s.io/client-go/kubernetes/typed/core/v1.(*EventSinkImpl).Create(0xc000413780, 0x10080)
/go/pkg/mod/k8s.io/client-go@v0.23.1/kubernetes/typed/core/v1/event_expansion.go:160 +0x22
k8s.io/client-go/tools/record.recordEvent({0x19c4680, 0xc0004137e0}, 0xc0001ed900, {0x0, 0x0, 0x100000000000000}, 0x0, 0xc000426360)
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/record/event.go:248 +0x16a
k8s.io/client-go/tools/record.recordToSink({0x19c4680, 0xc0004137e0}, 0xc0001ed400, 0xc000147fa8, 0x2540be400)
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/record/event.go:216 +0x12f
k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartRecordingToSink.func1(0x0)
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/record/event.go:194 +0x33
k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartEventWatcher.func1()
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/record/event.go:311 +0xc5
created by k8s.io/client-go/tools/record.(*eventBroadcasterImpl).StartEventWatcher
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/record/event.go:302 +0x91
goroutine 39 [select]:
k8s.io/client-go/util/workqueue.(*delayingType).waitingLoop(0xc0003a5da0)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/delaying_queue.go:231 +0x34e
created by k8s.io/client-go/util/workqueue.newDelayingQueue
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/delaying_queue.go:68 +0x247
goroutine 40 [select]:
k8s.io/client-go/util/workqueue.(*delayingType).waitingLoop(0xc0003a5f20)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/delaying_queue.go:231 +0x34e
created by k8s.io/client-go/util/workqueue.newDelayingQueue
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/delaying_queue.go:68 +0x247
goroutine 47 [IO wait]:
internal/poll.runtime_pollWait(0x7f166a2e3458, 0x72)
/usr/local/go/src/runtime/netpoll.go:303 +0x85
internal/poll.(*pollDesc).wait(0xc0003c6c80, 0xc0000ca000, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc0003c6c80, {0xc0000ca000, 0x2863, 0x2863})
/usr/local/go/src/internal/poll/fd_unix.go:167 +0x25a
net.(*netFD).Read(0xc0003c6c80, {0xc0000ca000, 0xc0000ca05f, 0x310})
/usr/local/go/src/net/fd_posix.go:56 +0x29
net.(*conn).Read(0xc00035d248, {0xc0000ca000, 0x416f66, 0xc00010d7f0})
/usr/local/go/src/net/net.go:183 +0x45
crypto/tls.(*atLeastReader).Read(0xc0003498c0, {0xc0000ca000, 0x0, 0x409bcd})
/usr/local/go/src/crypto/tls/conn.go:778 +0x3d
bytes.(*Buffer).ReadFrom(0xc0002ed778, {0x199bea0, 0xc0003498c0})
/usr/local/go/src/bytes/buffer.go:204 +0x98
crypto/tls.(*Conn).readFromUntil(0xc0002ed500, {0x199e160, 0xc00035d248}, 0x2809)
/usr/local/go/src/crypto/tls/conn.go:800 +0xe5
crypto/tls.(*Conn).readRecordOrCCS(0xc0002ed500, 0x0)
/usr/local/go/src/crypto/tls/conn.go:607 +0x112
crypto/tls.(*Conn).readRecord(...)
/usr/local/go/src/crypto/tls/conn.go:575
crypto/tls.(*Conn).Read(0xc0002ed500, {0xc000594000, 0x1000, 0x8633e0})
/usr/local/go/src/crypto/tls/conn.go:1278 +0x16f
bufio.(*Reader).Read(0xc0005928a0, {0xc0001be820, 0x9, 0x871382})
/usr/local/go/src/bufio/bufio.go:227 +0x1b4
io.ReadAtLeast({0x199bd00, 0xc0005928a0}, {0xc0001be820, 0x9, 0x9}, 0x9)
/usr/local/go/src/io/io.go:328 +0x9a
io.ReadFull(...)
/usr/local/go/src/io/io.go:347
golang.org/x/net/http2.readFrameHeader({0xc0001be820, 0x9, 0xc001013fb0}, {0x199bd00, 0xc0005928a0})
/go/pkg/mod/golang.org/x/net@v0.0.0-20220114011407-0dd24b26b47d/http2/frame.go:237 +0x6e
golang.org/x/net/http2.(*Framer).ReadFrame(0xc0001be7e0)
/go/pkg/mod/golang.org/x/net@v0.0.0-20220114011407-0dd24b26b47d/http2/frame.go:498 +0x95
golang.org/x/net/http2.(*clientConnReadLoop).run(0xc00010df98)
/go/pkg/mod/golang.org/x/net@v0.0.0-20220114011407-0dd24b26b47d/http2/transport.go:2101 +0x130
golang.org/x/net/http2.(*ClientConn).readLoop(0xc00046e480)
/go/pkg/mod/golang.org/x/net@v0.0.0-20220114011407-0dd24b26b47d/http2/transport.go:1997 +0x6f
created by golang.org/x/net/http2.(*Transport).newClientConn
/go/pkg/mod/golang.org/x/net@v0.0.0-20220114011407-0dd24b26b47d/http2/transport.go:725 +0xac5
goroutine 86 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00040f2d0, 0x1a)
/usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0x0)
/usr/local/go/src/sync/cond.go:56 +0x8c
k8s.io/client-go/util/workqueue.(*Type).Get(0xc0003a5c80)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/queue.go:157 +0x9e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).processNextClaimWorkItem(0xc000415680, {0x19c8830, 0xc000132240})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:910 +0x3e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).runClaimWorker(...)
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:899
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1.2()
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:855 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f166a284e48)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x4060d8, {0x199d9e0, 0xc000463770}, 0x1, 0xc0000a2600)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0003a5b60, 0x3b9aca00, 0x0, 0x0, 0x43dfc5)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x10e39c0, 0xc000045190, 0xc0001bb7b8)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:90 +0x25
created by sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:855 +0xbaf
goroutine 21 [select (no cases)]:
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1({0x19c8830, 0xc000132240})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:861 +0xd4f
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/go/pkg/mod/k8s.io/client-go@v0.23.1/tools/leaderelection/leaderelection.go:211 +0x154
goroutine 78 [runnable]:
fmt.(*pp).doPrintf(0xc00011a9c0, {0x17f4dc5, 0x33}, {0xc000520f10, 0xc000520e38, 0x87a2ac})
/usr/local/go/src/fmt/print.go:983 +0xd37
fmt.Errorf({0x17f4dc5, 0x33}, {0xc000520f10, 0x1, 0x1})
/usr/local/go/src/fmt/errors.go:20 +0x6c
k8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc00034e380)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/watch/streamwatcher.go:120 +0x31b
created by k8s.io/apimachinery/pkg/watch.NewStreamWatcher
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/watch/streamwatcher.go:76 +0x135
goroutine 87 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00040f310, 0x17)
/usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0x150fe40)
/usr/local/go/src/sync/cond.go:56 +0x8c
k8s.io/client-go/util/workqueue.(*Type).Get(0xc0003a5e00)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/queue.go:157 +0x9e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).processNextVolumeWorkItem(0xc000415680, {0x19c8830, 0xc000132240})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:965 +0x3e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).runVolumeWorker(...)
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:904
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1.3()
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:856 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f1690f9b748)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x199d9e0, 0xc00048bb00}, 0x1, 0xc0000a2600)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000057fb0, 0x3b9aca00, 0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x868fca, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:90 +0x25
created by sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:856 +0xa78
goroutine 88 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00040f2d0, 0x18)
/usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0x0)
/usr/local/go/src/sync/cond.go:56 +0x8c
k8s.io/client-go/util/workqueue.(*Type).Get(0xc0003a5c80)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/queue.go:157 +0x9e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).processNextClaimWorkItem(0xc000415680, {0x19c8830, 0xc000132240})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:910 +0x3e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).runClaimWorker(...)
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:899
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1.2()
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:855 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f1690f9b748)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x199d9e0, 0xc00048bad0}, 0x1, 0xc0000a2600)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:90 +0x25
created by sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:855 +0xbaf
goroutine 89 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00040f310, 0x18)
/usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0x150fe40)
/usr/local/go/src/sync/cond.go:56 +0x8c
k8s.io/client-go/util/workqueue.(*Type).Get(0xc0003a5e00)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/queue.go:157 +0x9e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).processNextVolumeWorkItem(0xc000415680, {0x19c8830, 0xc000132240})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:965 +0x3e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).runVolumeWorker(...)
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:904
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1.3()
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:856 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f166a302e30)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x199d9e0, 0xc0002114d0}, 0x1, 0xc0000a2600)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:90 +0x25
created by sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:856 +0xa78
goroutine 90 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00040f2d0, 0x19)
/usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0x0)
/usr/local/go/src/sync/cond.go:56 +0x8c
k8s.io/client-go/util/workqueue.(*Type).Get(0xc0003a5c80)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/queue.go:157 +0x9e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).processNextClaimWorkItem(0xc000415680, {0x19c8830, 0xc000132240})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:910 +0x3e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).runClaimWorker(...)
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:899
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1.2()
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:855 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f166a302e30)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x199d9e0, 0xc000211440}, 0x1, 0xc0000a2600)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:90 +0x25
created by sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:855 +0xbaf
goroutine 91 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00040f310, 0x15)
/usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0x150fe40)
/usr/local/go/src/sync/cond.go:56 +0x8c
k8s.io/client-go/util/workqueue.(*Type).Get(0xc0003a5e00)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/queue.go:157 +0x9e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).processNextVolumeWorkItem(0xc000415680, {0x19c8830, 0xc000132240})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:965 +0x3e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).runVolumeWorker(...)
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:904
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1.3()
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:856 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f166a284e48)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x199d9e0, 0xc0004637a0}, 0x1, 0xc0000a2600)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:90 +0x25
created by sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:856 +0xa78
goroutine 92 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00040f2d0, 0x1b)
/usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0x0)
/usr/local/go/src/sync/cond.go:56 +0x8c
k8s.io/client-go/util/workqueue.(*Type).Get(0xc0003a5c80)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/queue.go:157 +0x9e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).processNextClaimWorkItem(0xc000415680, {0x19c8830, 0xc000132240})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:910 +0x3e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).runClaimWorker(...)
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:899
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1.2()
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:855 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f166a284e48)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x199d9e0, 0xc0004637d0}, 0x1, 0xc0000a2600)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:90 +0x25
created by sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:855 +0xbaf
goroutine 93 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc00040f310, 0x16)
/usr/local/go/src/runtime/sema.go:513 +0x13d
sync.(*Cond).Wait(0x150fe40)
/usr/local/go/src/sync/cond.go:56 +0x8c
k8s.io/client-go/util/workqueue.(*Type).Get(0xc0003a5e00)
/go/pkg/mod/k8s.io/client-go@v0.23.1/util/workqueue/queue.go:157 +0x9e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).processNextVolumeWorkItem(0xc000415680, {0x19c8830, 0xc000132240})
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:965 +0x3e
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).runVolumeWorker(...)
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:904
sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1.3()
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:856 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f1690f9c738)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x199d9e0, 0xc000013a10}, 0x1, 0xc0000a2600)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/apimachinery@v0.23.1/pkg/util/wait/wait.go:90 +0x25
created by sigs.k8s.io/sig-storage-lib-external-provisioner/v7/controller.(*ProvisionController).Run.func1
/go/pkg/mod/sigs.k8s.io/sig-storage-lib-external-provisioner/v7@v7.0.1/controller/controller.go:856 +0xa78
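For what it's worth, sorting pods by restart count makes it easier to spot which workloads keep losing their leases, and the crashed container's previous log can then be requested explicitly (the pod name below is just the hostpath-provisioner pod from the listing above):
kubectl get pods -A --sort-by='.status.containerStatuses[0].restartCount'
kubectl -n kube-system logs hostpath-provisioner-7df77bc496-js5zf --previous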
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is still an issue in microk8s v1.31.1:
E1024 05:38:17.882718 1 leaderelection.go:364] Failed to update lock: Put "https://10.152.183.1:443/apis/coordination.k8s.io/v1/namespaces/gpu-operator-resources/leases/53822513.nvidia.com": context deadline exceeded
I1024 05:38:17.882748 1 leaderelection.go:280] failed to renew lease gpu-operator-resources/53822513.nvidia.com: timed out waiting for the condition
{"level":"error","ts":1729748297.8827953,"logger":"setup","msg":"problem running manager","error":"leader election lost"}
{"level":"info","ts":1729748297.8828146,"msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":1729748297.8828304,"msg":"Stopping and waiting for leader election runnables"}
error: unexpected EOF
root@openbench:/home/anders# microk8s version
MicroK8s v1.31.1 revision 7229
This happens at random intervals of roughly 2 to 20 minutes on a 1-node cluster with medium/high (~50%) CPU/GPU load and disk I/O under 100 MB/s read and write on a 3 GB/s NVMe drive.
This causes microk8s to fail and die each time:
okt 24 07:54:44 openbench microk8s.daemon-kubelite[1456749]: E1024 07:54:44.473476 1456749 writers.go:135] "Unhandled Error" err="apiserver was unable to write a fallback JSON response: http: Handler timeout"
okt 24 07:54:44 openbench microk8s.daemon-kubelite[1456749]: E1024 07:54:44.474535 1456749 timeout.go:140] "Post-timeout activity" timeElapsed="3.182195ms" method="PATCH" path="/api/v1/nodes/openbench/status" result=null
okt 24 07:54:44 openbench microk8s.daemon-k8s-dqlite[956]: time="2024-10-24T07:54:44+02:00" level=error msg="error in txn: query (try: 0): context deadline exceeded"
okt 24 07:54:47 openbench microk8s.daemon-containerd[12605]: time="2024-10-24T07:54:47.330751343+02:00" level=error msg="Failed to handle backOff event &TaskExit{ContainerID:5ad55698d8e95f614585220f1aced1bd904e307032fa4439083da8ca3be45c78,ID:5ad55698d8e95f614585220f1aced1bd904e307032fa4439083da8ca3be45c78,Pid:1457514,ExitStatus:1,ExitedAt:2024-10-24 05:54:13.308608132 +0000 UTC,XXX_unrecognized:[],} for 5ad55698d8e95f614585220f1aced1bd904e307032fa4439083da8ca3be45c78" error="failed to handle container TaskExit event: failed to stop container: context deadline exceeded: unknown"
okt 24 07:54:48 openbench microk8s.daemon-kubelite[1456749]: E1024 07:54:48.230311 1456749 leaderelection.go:322] Failed to release lock: Put "https://127.0.0.1:16443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=15s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
okt 24 07:54:48 openbench microk8s.daemon-kubelite[1456749]: E1024 07:54:48.230333 1456749 writers.go:122] "Unhandled Error" err="apiserver was unable to write a JSON response: http: Handler timeout"
okt 24 07:54:48 openbench microk8s.daemon-kubelite[1456749]: E1024 07:54:48.230342 1456749 server.go:303] "Leaderelection lost"
okt 24 07:54:48 openbench microk8s.daemon-kubelite[1456749]: E1024 07:54:48.230372 1456749 finisher.go:175] "Unhandled Error" err="FinishRequest: post-timeout activity - time-elapsed: 3.78µs, panicked: false, err: context deadline exceeded, panic-reason: <nil>"
okt 24 07:54:48 openbench systemd[1]: snap.microk8s.daemon-kubelite.service: Main process exited, code=exited, status=1/FAILURE
okt 24 07:54:48 openbench systemd[1]: snap.microk8s.daemon-kubelite.service: Failed with result 'exit-code'.
okt 24 07:54:48 openbench systemd[1]: snap.microk8s.daemon-kubelite.service: Consumed 57.086s CPU time.
okt 24 07:54:48 openbench systemd[1]: snap.microk8s.daemon-kubelite.service: Scheduled restart job, restart counter is at 36.
okt 24 07:54:48 openbench systemd[1]: Stopped Service for snap application microk8s.daemon-kubelite.
okt 24 07:54:48 openbench systemd[1]: snap.microk8s.daemon-kubelite.service: Consumed 57.086s CPU time.
okt 24 07:54:48 openbench systemd[1]: Started Service for snap application microk8s.daemon-kubelite.
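To correlate the kubelite restarts with the datastore stalls, grepping both units over the same window is usually enough (unit names taken from the journal lines above):
sudo journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" | grep -iE 'leaderelection|restart counter'
sudo journalctl -u snap.microk8s.daemon-k8s-dqlite --since "1 hour ago" | grep -i 'deadline exceeded'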
Summary
Many pods (particularly operators) are continually restarting after failing leader elections because they are unable to update their lock; for example, csi-nfs-controller and gpu-operator.
microk8s kubectl logs csi-nfs-controller-7bd5678cbc-8n2hl -n kube-system
E0117 11:11:11.212861 1 leaderelection.go:367] Failed to update lock: Put "https://10.152.183.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/nfs-csi-k8s-io": context deadline exceeded
I0117 11:11:11.212898 1 leaderelection.go:283] failed to renew lease kube-system/nfs-csi-k8s-io: timed out waiting for the condition
F0117 11:11:11.212909 1 leader_election.go:182] stopped leading
microk8s kubectl logs gpu-operator-567cf74d9d-vl648 -n gpu-operator-resources
E0117 11:17:15.507355 1 leaderelection.go:367] Failed to update lock: Put "https://10.152.183.1:443/api/v1/namespaces/gpu-operator-resources/configmaps/53822513.nvidia.com": context deadline exceeded
I0117 11:17:15.507400 1 leaderelection.go:283] failed to renew lease gpu-operator-resources/53822513.nvidia.com: timed out waiting for the condition
1.673954235507425e+09 ERROR setup problem running manager {"error": "leader election lost"}
microk8s kubectl get services -A
NAMESPACE     NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes   ClusterIP   10.152.183.1    <none>        443/TCP                  2d22h
kube-system   kube-dns     ClusterIP   10.152.183.10   <none>        53/UDP,53/TCP,9153/TCP   2d22h
k8s-dqlite is also showing continual 100% CPU usage in top on the master node, and its logs contain:
microk8s.daemon-k8s-dqlite[740714]: time="2023-01-17T12:00:28Z" level=error msg="error in txn: query (try: 0): context canceled"
microk8s.daemon-k8s-dqlite[740714]: time="2023-01-17T12:00:28Z" level=error msg="error in txn: query (try: 0): context canceled"
microk8s.daemon-k8s-dqlite[740714]: time="2023-01-17T12:00:28Z" level=error msg="error in txn: query (try: 0): context canceled"
microk8s.daemon-k8s-dqlite[740714]: time="2023-01-17T12:00:28Z" level=error msg="error in txn: query (try: 0): context deadline exceeded"
microk8s.daemon-k8s-dqlite[740714]: time="2023-01-17T11:59:29Z" level=error msg="error while range on /registry/health : query (try: 0): context deadline exceeded"
microk8s.daemon-k8s-dqlite[740714]: time="2023-01-17T11:58:59Z" level=error msg="error while range on /registry/health : query (try: 0): context canceled"
3-node High Availability cluster, MicroK8s 1.26 on Ubuntu 22.04.
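For diagnosis, a quick snapshot of what k8s-dqlite is doing when it pegs a core, together with a listing of its data directory (the path below is the usual snap location and may differ on other installs), can be captured alongside the inspect tarball:
ps aux | grep '[k]8s-dqlite'
sudo ls -l /var/snap/microk8s/current/var/kubernetes/backend/
sudo microk8s inspect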
What Should Happen Instead?
I would expect it to be able to update the lock, and pods dependent on this functionality not to keep crashing and restarting.