canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

API Server hanging on raspberry pi #2280

Closed CharlesGillanders closed 1 year ago

CharlesGillanders commented 3 years ago

I have a 3-node MicroK8s cluster on Raspberry Pis running 1.21/edge. At least once daily, one of the nodes will go into a NotReady status, and when I restart with microk8s stop; microk8s start it will hang just after trying to start the API server.

inspection-report-20210521_093118.tar.gz

inspection-report-20210521_093759.tar.gz

The attached inspection reports are from one such event today which has impacted two of the three nodes.

Any ideas where to look?

ktsakalozos commented 3 years ago

Hi @CharlesGillanders. In the logs I see that the datastore (dqlite) cluster is not healthy. First, there is a period where the cluster does not reply to queries.

May 21 04:26:40 pi-node3 microk8s.daemon-kubelite[559691]: E0521 04:26:40.612941  559691 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"heimdall-5b46974d59-z2hbb.1680fa62c0e393c7", GenerateName:"", Namespace:"heimdall", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"heimdall", Name:"heimdall-5b46974d59-z2hbb", UID:"1accd027-d0f6-4b39-aa14-a1f70b2cfe5e", APIVersion:"v1", ResourceVersion:"2128206", FieldPath:""}, Reason:"NodeNotReady", Message:"Node is not ready", Source:v1.EventSource{Component:"node-controller", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc021eb77f5d69dc7, ext:486531874504, loc:(*time.Location)(0x8aec560)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc021eb77f5d69dc7, ext:486531874504, loc:(*time.Location)(0x8aec560)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'rpc error: code = Unknown desc = exec (try: 500): database is locked' (will not retry!)
May 21 04:26:40 pi-node3 microk8s.daemon-kubelite[559691]: E0521 04:26:40.829712  559691 leaderelection.go:361] Failed to update lock: Put "https://127.0.0.1:16443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=30s": context deadline exceeded

This causes a restart, and the dqlite cluster is not able to come up within 5 minutes, so the restart process is triggered:

May 21 09:20:58 pi-node3 microk8s.daemon-kubelite[702459]: W0521 09:20:58.976958  702459 authentication.go:507] AnonymousAuth is not allowed with the AlwaysAllow authorizer. Resetting AnonymousAuth to false. You should use a different authorizer
May 21 09:25:58 pi-node3 systemd[1]: Stopping Service for snap application microk8s.daemon-kubelite...
May 21 09:26:07 pi-node3 microk8s.daemon-kubelite[702459]: Error: context deadline exceeded

Can you share the output of the following on all three nodes:

cat  /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
ls -l  /var/snap/microk8s/current/var/kubernetes/backend/

FYI @MathieuBordere

CharlesGillanders commented 3 years ago

I shut down all three nodes and checked the cluster status according to this post: https://microk8s.io/docs/ha-recovery

Only one node was in a good state for the dqlite replication sequence numbers, and the two nodes that were completely off both had a zero-length cluster.yaml file.

I was able to follow the instructions to restore the dqlite files but the cluster still didn't come up cleanly.

I took a chance on copying the cluster.yaml file from the one good node to the two others and that worked.

The cluster.yaml file currently looks like this on all three nodes

ktsakalozos commented 3 years ago

Could you attach a microk8s.inspect tarball of the working cluster? I am interested in knowing the workload you have so I can try to reproduce the problem. Also, could you please share the output of ls -l /var/snap/microk8s/current/var/kubernetes/backend/, as that would help us understand the size of the database.

CharlesGillanders commented 3 years ago

inspection-report-20210521_111445.tar.gz

I am seeing something unexpected in the ls -l output (I don't know if it is important or not): two of the nodes have all of the files set to group microk8s, but one node has some of them set to root. When I tried to recover the cluster I did manually set the group for all files to root to match the spec in the FAQ, but it looks like files created after that point, on one node at least, are being created with a different group.

Here's output from the unusual node

-rw------- 1 root microk8s 8378168 May 21 10:40 0000000006514780-0000000006515091
-rw------- 1 root microk8s 8360600 May 21 10:41 0000000006515092-0000000006515615
-rw------- 1 root microk8s 8368520 May 21 10:44 0000000006515616-0000000006516021
-rw------- 1 root microk8s 8381912 May 21 10:46 0000000006516022-0000000006516442
-rw------- 1 root microk8s 8378240 May 21 10:48 0000000006516443-0000000006516869
-rw------- 1 root microk8s 8387288 May 21 10:51 0000000006516870-0000000006517308
-rw------- 1 root microk8s 8372768 May 21 10:51 0000000006517309-0000000006517887
-rw------- 1 root microk8s 8364488 May 21 10:54 0000000006517888-0000000006518294
-rw------- 1 root microk8s 8371472 May 21 10:56 0000000006518295-0000000006518684
-rw------- 1 root microk8s 8374280 May 21 11:00 0000000006518685-0000000006519113
-rw------- 1 root microk8s 8370896 May 21 11:04 0000000006519114-0000000006519438
-rw------- 1 root microk8s 8381552 May 21 11:07 0000000006519439-0000000006519854
-rw------- 1 root microk8s 8379248 May 21 11:09 0000000006519855-0000000006520295
-rw------- 1 root root 8386592 May 21 11:10 0000000006520296-0000000006520838
-rw------- 1 root root 8358152 May 21 11:11 0000000006520839-0000000006521214
-rw------- 1 root root 8353256 May 21 11:13 0000000006521215-0000000006521579
-rw------- 1 root root 8383136 May 21 11:13 0000000006521580-0000000006522017
-rw------- 1 root root 667592 May 21 11:15 0000000006522018-0000000006522055
-rw------- 1 root root 8384256 May 21 11:17 0000000006522056-0000000006522395
-rw------- 1 root root 8364560 May 21 11:18 0000000006522396-0000000006522746
-rw------- 1 root root 8366576 May 21 11:19 0000000006522747-0000000006523125
-rw-rw---- 1 root microk8s 1866 May 3 18:58 cluster.crt
-rw-rw---- 1 root microk8s 3268 May 3 18:58 cluster.key
-rw-rw---- 1 root microk8s 201 May 21 11:19 cluster.yaml
-rw-rw-r-- 1 root microk8s 2 May 21 11:15 failure-domain
-rw-rw---- 1 root microk8s 61 May 3 19:09 info.yaml
srw------- 1 root root 0 May 21 11:15 kine.sock
-rw-rw---- 1 root microk8s 67 May 21 11:15 localnode.yaml
-rw-rw---- 1 root microk8s 32 May 21 11:15 metadata1
-rw-rw---- 1 root microk8s 32 May 21 10:59 metadata2
-rw------- 1 root root 8388608 May 21 11:19 open-4
-rw------- 1 root root 8388608 May 21 11:18 open-5
-rw------- 1 root root 8388608 May 21 11:19 open-6
-rw------- 1 root root 77753400 May 21 11:13 snapshot-256-6522037-580700424
-rw------- 1 root root 128 May 21 11:13 snapshot-256-6522037-580700424.meta
-rw------- 1 root root 78305680 May 21 11:18 snapshot-256-6523061-580985737
-rw------- 1 root root 128 May 21 11:18 snapshot-256-6523061-580985737.meta

and the other

-rw-rw---- 1 root microk8s 8355056 May 21 10:39 0000000006514588-0000000006514977
-rw-rw---- 1 root microk8s 8363192 May 21 10:41 0000000006514978-0000000006515423
-rw-rw---- 1 root microk8s 8364848 May 21 10:43 0000000006515424-0000000006515892
-rw-rw---- 1 root microk8s 8386160 May 21 10:46 0000000006515893-0000000006516258
-rw-rw---- 1 root microk8s 8377880 May 21 10:47 0000000006516259-0000000006516737
-rw-rw---- 1 root microk8s 8376512 May 21 10:50 0000000006516738-0000000006517083
-rw-rw---- 1 root microk8s 8386520 May 21 10:51 0000000006517084-0000000006517682
-rw-rw---- 1 root microk8s 8387168 May 21 10:53 0000000006517683-0000000006518119
-rw-rw---- 1 root microk8s 8384072 May 21 10:55 0000000006518120-0000000006518513
-rw-rw---- 1 root microk8s 8365856 May 21 10:57 0000000006518514-0000000006518996
-rw-rw---- 1 root microk8s 8357432 May 21 11:02 0000000006518997-0000000006519305
-rw-rw---- 1 root microk8s 8384072 May 21 11:06 0000000006519306-0000000006519699
-rw-rw---- 1 root microk8s 8386448 May 21 11:08 0000000006519700-0000000006520126
-rw-rw---- 1 root microk8s 8380616 May 21 11:10 0000000006520127-0000000006520643
-rw-rw---- 1 root microk8s 8380616 May 21 11:11 0000000006520644-0000000006521103
-rw-rw---- 1 root microk8s 8384216 May 21 11:12 0000000006521104-0000000006521442
-rw-rw---- 1 root microk8s 8364488 May 21 11:13 0000000006521443-0000000006521849
-rw-rw---- 1 root microk8s 8383640 May 21 11:16 0000000006521850-0000000006522237
-rw-rw---- 1 root microk8s 8385296 May 21 11:17 0000000006522238-0000000006522591
-rw-rw---- 1 root microk8s 8384648 May 21 11:18 0000000006522592-0000000006522936
-rw-rw---- 1 root microk8s 1866 May 3 18:58 cluster.crt
-rw-rw---- 1 root microk8s 3268 May 3 18:58 cluster.key
-rw-rw---- 1 root microk8s 201 May 21 11:19 cluster.yaml
-rw-rw-r-- 1 root microk8s 2 May 21 10:08 failure-domain
-rw-rw---- 1 root microk8s 61 May 3 19:09 info.yaml
srw-rw---- 1 root microk8s 0 May 21 10:08 kine.sock
-rw-rw---- 1 root microk8s 67 May 21 10:08 localnode.yaml
-rw-rw---- 1 root microk8s 32 May 21 10:59 metadata1
-rw-rw---- 1 root microk8s 32 May 21 10:59 metadata2
-rw-rw---- 1 root microk8s 8388608 May 21 11:19 open-38
-rw-rw---- 1 root microk8s 8388608 May 21 11:17 open-39
-rw-rw---- 1 root microk8s 8388608 May 21 11:18 open-40
-rw-rw---- 1 root microk8s 77671000 May 21 11:13 snapshot-256-6522032-581581168
-rw-rw---- 1 root microk8s 128 May 21 11:13 snapshot-256-6522032-581581168.meta
-rw-rw---- 1 root microk8s 78202680 May 21 11:18 snapshot-256-6523056-581866958
-rw-rw---- 1 root microk8s 128 May 21 11:18 snapshot-256-6523056-581866958.meta

MathieuBordere commented 3 years ago


This output looks OK to me, but that's post-recovery I assume? When it happens again, can you again perform a ls -l /var/snap/microk8s/current/var/kubernetes/backend/ on all nodes while it's in the bad state?

MathieuBordere commented 3 years ago

@CharlesGillanders On a side note, would you like to share one of those snapshot files with me (I maintain the dqlite database that MicroK8s uses)? I want to find out whether they compress well. If your system doesn't contain sensitive info, that is ... :-)

ktsakalozos commented 3 years ago

I feel the cluster is in a dangerous state. If you look at the logs with journalctl -f -u snap.microk8s.daemon-kubelite you will see messages like

May 21 11:11:43 pi-node1 microk8s.daemon-kubelite[3802920]: Trace[736557743]: ---"Object stored in database" 7755ms (11:11:00.704)

indicating long delays in the datastore. When an entry is added to the datastore, all nodes need to communicate over the network and write to their disks. @CharlesGillanders, what are the specs of the disk on each node and of the network connecting the nodes? Is there any workload running that is particularly demanding in terms of network or disk?
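The write path described above follows from how a Raft-style store like dqlite commits entries: a write is acknowledged only once a majority of nodes has appended and fsynced it. A toy model (illustrative numbers only, not MicroK8s code) of why slow disks stall every write:

```python
# Toy model of quorum commit latency in a 3-node Raft/dqlite-style cluster.
# Numbers are illustrative, not measured from this cluster.

def quorum_commit_ms(fsync_ms_per_node, rtt_ms=1.0):
    """A commit is acknowledged once a majority of nodes has fsynced the
    entry, so latency is roughly RTT plus the fsync time of the slowest
    node inside the fastest majority."""
    n = len(fsync_ms_per_node)
    majority = n // 2 + 1
    # The leader only waits for the fastest `majority` nodes to finish.
    slowest_needed = sorted(fsync_ms_per_node)[majority - 1]
    return rtt_ms + slowest_needed

# All three nodes healthy (~2 ms fsync each): commits stay fast.
print(quorum_commit_ms([2, 2, 2]))      # -> 3.0
# One slow USB stick (~500 ms fsync): still fast, quorum is the 2 good nodes.
print(quorum_commit_ms([2, 2, 500]))    # -> 3.0
# Two slow disks: every single commit now waits on a slow disk.
print(quorum_commit_ms([2, 500, 500]))  # -> 501.0
```

This is why the cluster can look healthy until a second node's storage degrades, and then everything (leases, events, scheduler elections) times out at once.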

CharlesGillanders commented 3 years ago

These are 3 Raspberry Pis running from USB (SanDisk 3.2 Gen1 USB sticks). There is no particular load on this cluster (test/home use only). Networking is the Pis' onboard 1 Gb NIC.

Am I reading that right that there is a nearly 8s delay in one datastore call? That seems very excessive. Thanks for the pointer, I will go do some performance testing and see if there's one or more of the PIs that may have a failing disk.

CharlesGillanders commented 3 years ago

I checked all three PIs - all seem to have roughly equivalent disk read performance

Node1: hdparm -Tt /dev/sda
/dev/sda:
 Timing cached reads: 1450 MB in 2.00 seconds = 724.44 MB/sec
 Timing buffered disk reads: 314 MB in 3.01 seconds = 104.29 MB/sec

Node2: hdparm -Tt /dev/sda
/dev/sda:
 Timing cached reads: 1424 MB in 2.00 seconds = 712.62 MB/sec
 Timing buffered disk reads: 260 MB in 3.04 seconds = 85.50 MB/sec

Node3: hdparm -Tt /dev/sda
/dev/sda:
 Timing cached reads: 1558 MB in 2.00 seconds = 780.15 MB/sec
 Timing buffered disk reads: 286 MB in 3.00 seconds = 95.29 MB/sec

I'll need to test write performance, but I do see that on all three nodes apiservice-kicker is using most of the CPU fairly constantly. I suspect that's not normal, but it might be an artifact of the issue I'm seeing rather than a cause?

top - 12:52:42 up 6 days, 18:57, 2 users, load average: 13.49, 12.57, 11.04
Tasks: 222 total, 5 running, 215 sleeping, 0 stopped, 2 zombie
%Cpu(s): 21.2 us, 74.5 sy, 0.0 ni, 2.7 id, 1.1 wa, 0.0 hi, 0.5 si, 0.0 st
MiB Mem : 7808.8 total, 248.1 free, 2532.0 used, 5028.7 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 4708.7 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

3925859 root 20 0 3090028 884588 93588 S 93.8 11.1 16:00.99 kubelite
3803349 root 20 0 9760 704 0 R 92.5 0.0 134:12.32 apiservice-kick
1117080 root 20 0 9760 708 0 R 91.8 0.0 8653:58 apiservice-kick
3362076 root 20 0 9760 732 0 R 90.5 0.0 524:15.17 apiservice-kick

top - 12:53:24 up 6 days, 19:12, 2 users, load average: 17.16, 16.09, 14.51
Tasks: 251 total, 5 running, 245 sleeping, 0 stopped, 1 zombie
%Cpu(s): 4.4 us, 79.5 sy, 0.0 ni, 0.5 id, 15.4 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 7808.8 total, 225.3 free, 2869.3 used, 4714.3 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 4398.2 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

1149649 root 20 0 9892 752 0 R 99.0 0.0 8856:55 apiservice-kick
4149852 root 20 0 9760 708 0 R 98.7 0.0 719:40.53 apiservice-kick
3451350 root 20 0 9892 744 0 R 97.7 0.0 1392:43 apiservice-kick
249774 root 20 0 3045796 804444 93160 S 20.7 10.1 2:37.57 kubelite

top - 12:53:49 up 6 days, 19:26, 2 users, load average: 15.01, 13.48, 11.75
Tasks: 212 total, 6 running, 206 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.0 us, 99.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7808.8 total, 356.5 free, 2367.5 used, 5084.9 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 4873.6 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND

2231802 root 20 0 9760 708 0 R 81.6 0.0 7754:43 apiservice-kick
691776 root 20 0 9760 708 0 R 80.6 0.0 252:51.77 apiservice-kick
2858799 root 20 0 9760 704 0 R 79.6 0.0 2279:30 apiservice-kick
1972701 root 20 0 9760 708 0 R 75.7 0.0 3319:28 apiservice-kick
347496 root 20 0 9892 752 0 R 73.7 0.0 567:24.52 apiservice-kick

CharlesGillanders commented 3 years ago

Yes, I did that; see the previous comment.

Node1: hdparm -Tt /dev/sda
/dev/sda:
 Timing cached reads: 1450 MB in 2.00 seconds = 724.44 MB/sec
 Timing buffered disk reads: 314 MB in 3.01 seconds = 104.29 MB/sec

Node2: hdparm -Tt /dev/sda
/dev/sda:
 Timing cached reads: 1424 MB in 2.00 seconds = 712.62 MB/sec
 Timing buffered disk reads: 260 MB in 3.04 seconds = 85.50 MB/sec

Node3: hdparm -Tt /dev/sda
/dev/sda:
 Timing cached reads: 1558 MB in 2.00 seconds = 780.15 MB/sec
 Timing buffered disk reads: 286 MB in 3.00 seconds = 95.29 MB/sec

CharlesGillanders commented 3 years ago

Write performance appears to be awful on all three nodes.

root@pi-node1:~# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 635.829 s, 1.7 MB/s

root@pi-node2:/# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 763.903 s, 1.4 MB/s

root@pi-node3:/# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 435.923 s, 2.5 MB/s
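The dd runs above measure one large synchronous write; a datastore like dqlite instead issues many small writes, each followed by an fsync, which is usually the harder pattern for USB flash. A small standard-library Python sketch (a hypothetical helper, writes and removes a temp file) for measuring that pattern as a second data point:

```python
import os
import time
import tempfile

def fsync_append_throughput(chunk=4096, count=256):
    """Append `count` chunks of `chunk` bytes, fsyncing after each one.
    Returns (MB/s, milliseconds per fsynced write)."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.monotonic()
        for _ in range(count):
            os.write(fd, b"\0" * chunk)
            os.fsync(fd)  # force each small write to stable storage
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    mb_s = chunk * count / elapsed / 1e6
    return mb_s, elapsed / count * 1000

mb_s, ms = fsync_append_throughput()
print(f"{mb_s:.2f} MB/s, {ms:.2f} ms per fsynced 4 KiB write")
```

Run it from a directory on the disk under test (tempfile defaults to /tmp, so point TMPDIR at the target filesystem if needed); single-digit ms per fsync is healthy, hundreds of ms would explain the "database is locked" and lease-renewal timeouts.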

CharlesGillanders commented 3 years ago

@MathieuBordere I'm happy to send you one of the snapshots - can I do so without uploading it to github?

MathieuBordere commented 3 years ago

@MathieuBordere I'm happy to send you one of the snapshots - can I do so without uploading it to github?

Thanks! You can send me a link at mathieu.bordere@canonical.com

MathieuBordere commented 3 years ago

@CharlesGillanders It appears your issues are caused by the very limited write performance of your storage. Does your Pi have internal flash? It might be worth experimenting with that to see if things improve; I would be interested to know if that solves it. We are planning to address similar issues by compressing the snapshots and lowering the frequency with which they are taken, which should take some pressure off the disk.
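On whether the snapshots compress well: key-value snapshots tend to be full of repeated key prefixes and JSON, so a high ratio is plausible. A quick standard-library check anyone can run against a snapshot file (the commented path is just one of the filenames from the listings above; this is a sketch, not how dqlite will implement it):

```python
import lzma  # zlib or gzip would work too, with lower ratios

def compression_ratio(path, limit=8 * 1024 * 1024):
    """Compress up to `limit` bytes of a file and return original/compressed size."""
    with open(path, "rb") as f:
        data = f.read(limit)
    compressed = lzma.compress(data, preset=1)  # fast preset, Pi-friendly
    return len(data) / len(compressed)

# Example (run as root on a node; path taken from the ls -l output above):
# print(compression_ratio(
#     "/var/snap/microk8s/current/var/kubernetes/backend/snapshot-256-6523061-580985737"))
```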

CharlesGillanders commented 3 years ago

Digging around online I found this post about slow USB disk performance on Raspberry Pi: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=245931. I followed those instructions to add a usb-storage.quirks entry to cmdline.txt, and that has substantially improved write performance on these Raspberry Pi nodes.
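For readers landing here: as I understand that forum thread, the quirk makes the kernel ignore the UAS driver for the stick so it falls back to plain usb-storage. A sketch of the procedure (1234:abcd is a placeholder vendor:product ID; read the real one from lsusb):

```shell
# 1. Find the stick's vendor:product ID
lsusb
# Bus 002 Device 002: ID 1234:abcd ...   <- placeholder; use your own IDs

# 2. Prepend a quirks entry to /boot/cmdline.txt, on the SAME single line
#    as the existing kernel arguments, then reboot:
#    usb-storage.quirks=1234:abcd:u console=serial0,115200 ... (existing args)
```

The trailing :u is the flag that tells usb-storage to ignore UAS for that device.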

pi-node1:~# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.74865 s, 139 MB/s

pi-node2:~$ sudo dd if=/dev/zero of=/tmp/test1.img bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.50007 s, 143 MB/s

pi-node3:~$ sudo dd if=/dev/zero of=/tmp/test1.img bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.64548 s, 140 MB/s

I am going to restart the microk8s cluster and see if it is any more stable now.

MathieuBordere commented 3 years ago

Nice!

bkappler69 commented 3 years ago

I see the same issue with apiservice-kicker running at 100% CPU load. Looking at apiservice-kicker (a bash script), it should not cause load: the script should sleep most of the time, and every 3 seconds check for changes and restart services if needed. I'm new to MicroK8s and not sure how best to debug apiservice-kicker.

After starting my system (currently a single-node Raspberry Pi 4 8GB; I will add two more shortly), the system runs fine for a while, then apiservice-kicker starts consuming too much CPU.

top

top - 05:44:37 up 1 day, 20:40, 2 users, load average: 2.34, 2.79, 2.92
Tasks: 207 total, 2 running, 205 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.7 us, 32.5 sy, 0.0 ni, 59.6 id, 0.1 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 7808.9 total, 2506.1 free, 1309.9 used, 3992.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 6072.9 avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                 

2568444 root 20 0 9128 668 0 R 99.7 0.0 339:47.91 apiservice-kick
136424 root 20 0 2696572 666308 71952 S 16.5 8.3 600:26.70 kube-apiserver
136431 root 20 0 9128 3584 2956 S 0.0 0.0 5:47.40 apiservice-kick
2568440 root 20 0 9128 2120 1472 S 0.0 0.0 0:00.00 apiservice-kick
2568441 root 20 0 9128 2012 1348 S 0.0 0.0 0:00.00 apiservice-kick

ps -eo pid,lstart,cmd | grep api

136424 Fri May 21 11:07:20 2021 /snap/microk8s/2217/kube-apiserver ....
136431 Fri May 21 11:07:20 2021 /bin/bash /snap/microk8s/2217/apiservice-kicker
2568440 Sun May 23 00:04:16 2021 /bin/bash /snap/microk8s/2217/apiservice-kicker
2568441 Sun May 23 00:04:16 2021 /bin/bash /snap/microk8s/2217/apiservice-kicker
2568444 Sun May 23 00:04:16 2021 /bin/bash /snap/microk8s/2217/apiservice-kicker

Any thoughts or pointers on how to debug?

Best Regards Bernhard

CharlesGillanders commented 3 years ago

Any thoughts or pointers on how to debug?

Best Regards Bernhard

I found that logging seems to be part of the problem (I saw some evidence of similar issues and solutions with Rancher K3s). I adjusted journald to use volatile logging so that it doesn't spend as much time writing log files to disk.

https://github.com/k3s-io/k3s/issues/1297

https://unix.stackexchange.com/questions/482920/store-more-logs-in-memory-with-systemd-journald
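Concretely, the journald change described in those links is a couple of lines in /etc/systemd/journald.conf (a config fragment; Storage=volatile keeps the journal in RAM under /run/log/journal, so logs are lost on reboot):

```ini
[Journal]
Storage=volatile
# optional: cap how much RAM the journal may use
RuntimeMaxUse=64M
```

Apply it with sudo systemctl restart systemd-journald.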

Since making this change I am no longer seeing the same constant load from apiserver-kicker.

I spoke too soon: the disk load is vastly reduced, but I am still seeing occasional very high CPU from what seems to be a crashed instance of apiserver-kicker. I'm looking deeper to see if I can identify what apiserver-kicker was doing when it crashed.

MathieuBordere commented 3 years ago

@CharlesGillanders Has your cluster been more stable since performing the usb changes?

CharlesGillanders commented 3 years ago

@CharlesGillanders Has your cluster been more stable since performing the usb changes?

Yes, the cluster is more stable (nodes are no longer disconnecting or dropping out of the cluster), but I am still seeing the apiservice-kicker script eventually hang and start consuming large amounts of CPU. When this happens the script cannot be killed, and the only resolution seems to be rebooting the node.

This morning I took the snap I was running, unsquashed the snap filesystem on one node, edited the kicker script to add set -eux, and reran the snap with snap try. I'm capturing the script's logged output; hopefully that will let me identify exactly what the script was doing when it hangs.

CharlesGillanders commented 3 years ago

So I captured one of the hang conditions. The following is the last set of entries logged by the apiserver-kicker script.

2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + sleep 3
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/ha-cluster ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + getent group microk8s
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + chmod -R ug+rwX /var/snap/microk8s/x1/var/kubernetes/backend
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + chgrp microk8s -R /var/snap/microk8s/x1/var/kubernetes/backend
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/no-cert-reissue ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + grep -E '(--advertise-address|--bind-address)' /var/snap/microk8s/x1/args/kube-apiserver
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181730]: + ip route
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181731]: + grep default
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181732]: + snapctl services microk8s.daemon-kubelite
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181733]: + grep active
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ produce_certs
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ export OPENSSL_CONF=/snap/microk8s/current/etc/ssl/openssl.cnf
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ OPENSSL_CONF=/snap/microk8s/current/etc/ssl/openssl.cnf
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' -f /var/snap/microk8s/x1/certs/serviceaccount.key ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' -f /var/snap/microk8s/x1/certs/ca.key ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' -f /var/snap/microk8s/x1/certs/server.key ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' -f /var/snap/microk8s/x1/certs/front-proxy-ca.key ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' -f /var/snap/microk8s/x1/certs/front-proxy-client.key ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' -f /var/snap/microk8s/x1/certs/ca.crt ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' -f /var/snap/microk8s/x1/certs/front-proxy-ca.crt ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ render_csr_conf
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ get_ips
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181742]: ++++ /snap/microk8s/x1/bin/hostname -I
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ local 'IP_ADDR=192.168.57.32 10.1.227.64 '
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ local CNI_INTERFACE=vxlan.calico
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ [[ -z 192.168.57.32 10.1.227.64 ]]
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ /snap/microk8s/x1/sbin/ifconfig vxlan.calico
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181745]: ++++ /snap/microk8s/x1/sbin/ip -o -4 addr list vxlan.calico
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181746]: ++++ /snap/microk8s/x1/usr/bin/gawk '{print $4}'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181747]: ++++ /snap/microk8s/x1/usr/bin/cut -d/ -f1
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181748]: ++++ head -1
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ CNI_IP=10.1.227.64
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ local ips=
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ for ip in $IP_ADDR
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ '[' 192.168.57.32 '!=' 10.1.227.64 ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ ips+=192.168.57.32
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ for ip in $IP_ADDR
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ '[' 10.1.227.64 '!=' 10.1.227.64 ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ IP_ADDR=192.168.57.32
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181741]: +++ echo 192.168.57.32
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ local IP_ADDRESSES=192.168.57.32
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ cp /var/snap/microk8s/x1/certs/csr.conf.template /var/snap/microk8s/x1/certs/csr.conf.rendered
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' 192.168.57.32 == 127.0.0.1 ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' 192.168.57.32 == none ']'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ local ips= sep=
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ local -i i=3
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181750]: +++ echo 192.168.57.32
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ for IP_ADDR in $(echo "$IP_ADDRESSES")
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ ips+='IP.3 = 192.168.57.32'
2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]:
++ sep='\n' 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ /snap/microk8s/x1/bin/sed -i 's/#MOREIPS/IP.3 = 192.168.57.32/g' /var/snap/microk8s/x1/certs/csr.conf.rendered 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' -f /var/snap/microk8s/x1/certs/csr.conf ']' 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ local force 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ /snap/microk8s/x1/usr/bin/cmp -s /var/snap/microk8s/x1/certs/csr.conf.rendered /var/snap/microk8s/x1/certs/csr.conf 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ force=false 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ false 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' '!' -f /var/snap/microk8s/x1/certs/front-proxy-client.crt ']' 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181753]: +++ /snap/microk8s/x1/usr/bin/openssl x509 -noout -issuer 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ '[' 'issuer=CN = front-proxy-ca' == 'issuer=CN = 127.0.0.1' ']' 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181740]: ++ echo 0 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + csr_modified=0 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + [[ 0 -eq 1 ]] 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + restart_attempt=0 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/args/cni-network/cni.yaml ']' 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/ha-cluster ']' 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/cni-loaded ']' 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/cni-loaded ']' 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + is_nic_available 2021-05-26T15:56:08Z 
microk8s.daemon-apiserver-kicker[1888]: + ip a show dev vxlan.calico 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + check_offloading 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181756]: + ethtool --show-offload vxlan.calico 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181758]: + grep -q ': on'

Here is the previous run of the same script a few seconds earlier

2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/ha-cluster ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[1888]: + getent group microk8s 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[1888]: + chmod -R ug+rwX /var/snap/microk8s/x1/var/kubernetes/backend 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[1888]: + chgrp microk8s -R /var/snap/microk8s/x1/var/kubernetes/backend 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/no-cert-reissue ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[1888]: + grep -E '(--advertise-address|--bind-address)' /var/snap/microk8s/x1/args/kube-apiserver 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181672]: + ip route 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181673]: + grep default 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181674]: + snapctl services microk8s.daemon-kubelite 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181675]: + grep active 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ produce_certs 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ export OPENSSL_CONF=/snap/microk8s/current/etc/ssl/openssl.cnf 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ OPENSSL_CONF=/snap/microk8s/current/etc/ssl/openssl.cnf 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' -f /var/snap/microk8s/x1/certs/serviceaccount.key ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' -f /var/snap/microk8s/x1/certs/ca.key ']' 2021-05-26T15:56:02Z 
microk8s.daemon-apiserver-kicker[181681]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' -f /var/snap/microk8s/x1/certs/server.key ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' -f /var/snap/microk8s/x1/certs/front-proxy-ca.key ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ for key in serviceaccount.key ca.key server.key front-proxy-ca.key front-proxy-client.key 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' -f /var/snap/microk8s/x1/certs/front-proxy-client.key ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' -f /var/snap/microk8s/x1/certs/ca.crt ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' -f /var/snap/microk8s/x1/certs/front-proxy-ca.crt ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ render_csr_conf 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ get_ips 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181683]: ++++ /snap/microk8s/x1/bin/hostname -I 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ local 'IP_ADDR=192.168.57.32 10.1.227.64 ' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ local CNI_INTERFACE=vxlan.calico 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ [[ -z 192.168.57.32 10.1.227.64 ]] 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ /snap/microk8s/x1/sbin/ifconfig vxlan.calico 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181686]: ++++ /snap/microk8s/x1/sbin/ip -o -4 addr list vxlan.calico 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181687]: ++++ /snap/microk8s/x1/usr/bin/gawk '{print $4}' 2021-05-26T15:56:02Z 
microk8s.daemon-apiserver-kicker[181688]: ++++ /snap/microk8s/x1/usr/bin/cut -d/ -f1 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181689]: ++++ head -1 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ CNI_IP=10.1.227.64 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ local ips= 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ for ip in $IP_ADDR 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ '[' 192.168.57.32 '!=' 10.1.227.64 ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ ips+=192.168.57.32 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ for ip in $IP_ADDR 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ '[' 10.1.227.64 '!=' 10.1.227.64 ']' 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ IP_ADDR=192.168.57.32 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181682]: +++ echo 192.168.57.32 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ local IP_ADDRESSES=192.168.57.32 2021-05-26T15:56:02Z microk8s.daemon-apiserver-kicker[181681]: ++ cp /var/snap/microk8s/x1/certs/csr.conf.template /var/snap/microk8s/x1/certs/csr.conf.rendered 2021-05-26T15:56:03Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' 192.168.57.32 == 127.0.0.1 ']' 2021-05-26T15:56:03Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' 192.168.57.32 == none ']' 2021-05-26T15:56:03Z microk8s.daemon-apiserver-kicker[181681]: ++ local ips= sep= 2021-05-26T15:56:03Z microk8s.daemon-apiserver-kicker[181681]: ++ local -i i=3 2021-05-26T15:56:03Z microk8s.daemon-apiserver-kicker[181707]: +++ echo 192.168.57.32 2021-05-26T15:56:03Z microk8s.daemon-apiserver-kicker[181681]: ++ for IP_ADDR in $(echo "$IP_ADDRESSES") 2021-05-26T15:56:03Z microk8s.daemon-apiserver-kicker[181681]: ++ ips+='IP.3 = 192.168.57.32' 2021-05-26T15:56:03Z microk8s.daemon-apiserver-kicker[181681]: ++ sep='\n' 2021-05-26T15:56:03Z 
microk8s.daemon-apiserver-kicker[181681]: ++ /snap/microk8s/x1/bin/sed -i 's/#MOREIPS/IP.3 = 192.168.57.32/g' /var/snap/microk8s/x1/certs/csr.conf.rendered 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' -f /var/snap/microk8s/x1/certs/csr.conf ']' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181681]: ++ local force 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181681]: ++ /snap/microk8s/x1/usr/bin/cmp -s /var/snap/microk8s/x1/certs/csr.conf.rendered /var/snap/microk8s/x1/certs/csr.conf 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181681]: ++ force=false 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181681]: ++ false 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' '!' -f /var/snap/microk8s/x1/certs/front-proxy-client.crt ']' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181710]: +++ /snap/microk8s/x1/usr/bin/openssl x509 -noout -issuer 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181681]: ++ '[' 'issuer=CN = front-proxy-ca' == 'issuer=CN = 127.0.0.1' ']' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181681]: ++ echo 0 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + csr_modified=0 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + [[ 0 -eq 1 ]] 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + restart_attempt=0 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/args/cni-network/cni.yaml ']' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/ha-cluster ']' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/cni-loaded ']' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + '[' -e /var/snap/microk8s/x1/var/lock/cni-loaded ']' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + is_nic_available 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + ip a show dev 
vxlan.calico 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + check_offloading 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181713]: + ethtool --show-offload vxlan.calico 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181714]: + grep -E '^.x-checksumming:' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181715]: + grep -q ': on' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + return 0 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181716]: + snapctl services microk8s.daemon-apiserver 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181717]: + grep active 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + '[' 1 -eq 0 ']' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + true 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + '[' 0 -ge 5 ']'

I think the obvious difference between the two is in the check_offloading section.

2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[1888]: + check_offloading 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181756]: + ethtool --show-offload vxlan.calico 2021-05-26T15:56:08Z microk8s.daemon-apiserver-kicker[181758]: + grep -q ': on'

2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + check_offloading 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181713]: + ethtool --show-offload vxlan.calico 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181714]: + grep -E '^.x-checksumming:' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[181715]: + grep -q ': on' 2021-05-26T15:56:05Z microk8s.daemon-apiserver-kicker[1888]: + return 0

I am confused: how is the grep for x-checksumming not appearing in the hang situation? The script isn't changing between runs. And does this have anything to do with check_offloading not returning a value at all?

(Further Edit)

I checked with ps and I can see the grep for ': on' still running; it's the only sub-statement here that hasn't exited.

pi-node3:~$ ps -ef | grep 18175
root      181757    1888 99 15:56 ?  01:54:22 /bin/bash /snap/microk8s/x1/apiservice-kicker
root      181758    1888  0 15:56 ?  00:00:00 grep -q : on

I was able to kill -9 the 181758 process but the 181757 won't die....

bkappler69 commented 3 years ago

I tried matching your output to my apiservice-kicker script and found you are on a different version than I am.

This entire section (which I assume is in your version) is not included in mine:

    if [ -e "${SNAP_DATA}/var/lock/cni-loaded" ] &&
      is_nic_available &&
      ! check_offloading
    then
      echo "Network interface $nic_name found! Disabling offloading on it..."
      deactivate_offloading
      sleep 2
      if check_offloading; then
        echo "Offloading successfully disabled on interface $nic_name"
      else
        echo "Offloading has not been disabled correctly on interface $nic_name. Please check what happened"
      fi
    fi

Still, we seem to be facing the same issue. Looking at the difference between the "good" and the "bad" iterations of the logs you shared, I notice that the last strace line of each sub-process seems to be missing in the "bad" execution. Therefore I assume the buffers did not fully flush and the strace output is incomplete. This would also explain the "changed grep": it is simply the last output of one of the forks.

Therefore I assume the script hangs in the last part of the loop:

   if [ $installed_registry_help -eq 0 ] &&
       snapctl services microk8s.daemon-apiserver | grep active &> /dev/null &&
       "$SNAP/kubectl" "--kubeconfig=$SNAP_DATA/credentials/client.config" get namespace kube-public &> /dev/null
    then
      if ! "$SNAP/kubectl" "--kubeconfig=$SNAP_DATA/credentials/client.config" get configmap local-registry-hosting -n kube-public &> /dev/null
      then
        use_manifest registry-help apply
      fi

      installed_registry_help=1
    fi

I will wait for apiservice-kicker to hang again and then try the individual commands from this section...

Killing the "hanging" process does not work. I have to reboot and the "shutdown" takes very long!

Update: apiservice-kicker is hanging again, but all of the above commands still execute as expected... Seems a dead-end for debugging?!

bkappler69 commented 3 years ago

some news...

I found some older threads talking about issues with apiserver restarting when metrics server is enabled. Therefore, I've set up a second Raspberry running microk8s, but did not add it to my cluster yet. I observed that this system did not show the hanging apiservice-kicker, and the only difference was that I had not yet enabled any of the core services (DNS, metrics server, ...).

I then enabled metrics server on the new Raspberry and disabled it on the other one. As of now (~12 hours of runtime) the new Raspberry running metrics server has had apiservice-kicker hang twice, while the first node (metrics server stopped) did not show the issue.

There seems a relation between the hang and metrics server being active!

CharlesGillanders commented 3 years ago

Thanks for the tip @bkappler69 - I'm going to try that now and see what results I get over the weekend.

CharlesGillanders commented 3 years ago

For the benefit of @MathieuBordere and @ktsakalozos the advice from @bkappler69 has so far been perfect!

I have had no problems with the api-service-kicker in the last 72 hours, previously I could guarantee getting this issue on at least one node in any 24 hour period.

So it looks like the best practice for getting a stable and reliable MicroK8s on Raspberry Pi 4 has been a combination of:

1) Configuring journald for volatile logging
2) Improving underlying disk storage performance (I ended up totally replacing my existing USB flash drives with USB-attached 2.5" SSDs using a USB-to-SATA converter which properly supported UASP)
3) Disabling metrics server
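For item 1, a sketch of the journald change (assumptions: the standard systemd drop-in path; the file name `volatile.conf` and the 64M cap are my own choices, not required values):

```shell
# Switch journald to in-memory (volatile) storage to stop it hammering the
# SD card. A drop-in avoids editing /etc/systemd/journald.conf directly.
sudo mkdir -p /etc/systemd/journald.conf.d
sudo tee /etc/systemd/journald.conf.d/volatile.conf > /dev/null <<'EOF'
[Journal]
Storage=volatile
RuntimeMaxUse=64M
EOF
sudo systemctl restart systemd-journald
```

Note that with `Storage=volatile` the journal is lost on reboot, which is usually an acceptable trade-off on these nodes.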

bkappler69 commented 3 years ago

new info on the hang!

I upgraded my cluster to 3 nodes and moved to snap install microk8s --classic --channel=1.21/stable. Still, apiserver-kicker hangs every couple of days...

Analyzing the process states it turns out that the hang relates to the "check_offloading" - same as in @CharlesGillanders debugs above.

I created a simple script just containing the "check_offloading" part running in a loop with no sleep... With this script I was able to reproduce the issue - this is completely independent of API Server:

#!/bin/bash

set -eu

nic_name="vxlan.calico"

is_nic_available() {
  ip a show dev $nic_name > /dev/null 2>&1
}

check_offloading() {
  # Return an error if at least one offload is enabled (rx or tx)
  if ethtool --show-offload $nic_name 2>&1 | grep -E '^.x-checksumming:' | grep -q  ': on'; then
    return 1
  else
    return 0
  fi
}

while true; do
    echo -n "."
    if is_nic_available && ! check_offloading
    then
      echo "Network interface $nic_name found! Disabling offloading on it..."
     # deactivate_offloading
      sleep 2
      if check_offloading; then
        echo "Offloading successfully disabled on interface $nic_name"
      else
        echo "Offloading has not been disabled correctly on interface $nic_name. Please check what happened"
      fi
    fi

done

The issue seems to sit here:

if ethtool --show-offload $nic_name 2>&1 | grep -E '^.x-checksumming:' | grep -q ': on'; then

In case of a hang, the ethtool command executes and outputs into a pipe, but the grep statement is either not started or does not receive the data from the pipe.

A tail on the pipe from ethtool (tail /proc/PID/fd/4) returns the output once (meaning it was not read yet)...

My system was under load (~5-6) at the time of the failure....

Best Regards Bernhard

Update: I replaced the "ethtool" command with a plain "echo ...." command. Script still hangs after a while...
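Since the multi-stage pipe itself appears implicated, here is a pipeline-free variant of the check for comparison (a sketch: the function name, interface name, and sample output are illustrative, and this is not the shipped MicroK8s code):

```shell
#!/bin/bash
set -eu

# offload_enabled takes the raw `ethtool --show-offload` text as an argument
# and reports whether rx- or tx-checksumming is on, using only bash built-ins,
# so there is no grep|grep pipeline that can get stuck.
offload_enabled() {
  local line
  while IFS= read -r line; do
    case "$line" in
      rx-checksumming:*on*|tx-checksumming:*on*) return 0 ;;
    esac
  done <<< "$1"
  return 1
}

# Canned samples instead of calling ethtool, so this runs anywhere:
sample_on="rx-checksumming: on
tx-checksumming: off [fixed]"
sample_off="rx-checksumming: off
tx-checksumming: off [fixed]"

offload_enabled "$sample_on"  && echo "sample_on: offloading enabled"
offload_enabled "$sample_off" || echo "sample_off: offloading disabled"
```

If the hang really is in the pipe setup, a variant like this dropped into the kicker loop should at least narrow it down.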

ktsakalozos commented 3 years ago

@bkappler69 thank you for trying to dig into this bug. I am trying to reproduce it using your script with no luck so far.

You say the script hangs after a "while". How long do you have to wait? Maybe I am not giving it enough time.

Do you have MicroK8s with the metrics-server running in the background? I wonder if this has any effect as others reported in this issue.

bkappler69 commented 3 years ago

@ktsakalozos it's hard to predict how long it will take for the script to hang, but for me it happens within 1 to 6 hours fairly reliably. I've reduced the script to an absolute minimum, which still shows the issue:

#!/bin/bash

while true; do
    echo -n "."
    if echo "hw-tc-offload: off [fixed]" 2>&1 | grep 'x-checksumming' ; then
         echo "should never get here...."
    fi
done

Overall the script does not really do anything except an echo (standing in for the ethtool of the original) and a grep of its output. The problem must be either in bash, grep or the kernel directly.

The hanging process cannot be killed; it is in state running, burning 100% CPU in system space and no time in user space.
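The system-space spin can be confirmed by sampling utime/stime from /proc (a sketch: the PID below is the current shell itself, standing in for the hung kicker PID, and the awk field numbers assume the comm field contains no spaces):

```shell
#!/bin/bash
set -eu
# Fields 14 (utime) and 15 (stime) of /proc/PID/stat count user vs kernel
# clock ticks. For a hang like the one described above, repeated samples
# should show stime climbing while utime stays flat.
pid=$$
read -r utime stime < <(awk '{print $14, $15}' "/proc/$pid/stat")
echo "pid=$pid utime=$utime stime=$stime (clock ticks)"
```

Sampling this twice a few seconds apart on the stuck PID makes the user/system split obvious without strace.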

My environment is:

My cluster is not too busy yet (still in the setup phase).

I had my "test" script running on all 3 nodes overnight; in the morning, 2 were stuck. The hang seems to occur faster when there's load on the system (e.g. rebooting one node so that pods have to fail over).

RaghuMeda commented 3 years ago

I installed microk8s on my Raspberry Pi 3. microk8s inspect gets stuck when it tries to inspect the cluster. It's a single-node cluster. microk8s status just hangs for more than 20 minutes! I modified cmdline.txt as suggested above and rebooted, with no improvement. I also set the journald config to volatile. I have not connected any USB disk; it's a plain RPi 3 with an SD card running the Ubuntu 20.04 LTS server image. Kubectl, the API server and the scheduler are taking up all the CPU.

I cant give tarball as microk8s inspect just hangs on as shown below:

ubuntu@raspberrypi3:~$ microk8s status
microk8s is not running. Use microk8s inspect for a deeper inspection.

ubuntu@raspberrypi3:~$ microk8s inspect
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-flanneld is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-apiserver-kicker is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy openSSL information to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster

It just hangs here, and inspect won't complete at all even after waiting 30 minutes.

Can anyone please suggest how to make progress with microk8s without adding any additional hardware?

adamgoryachev commented 3 years ago

I've been suffering a lot of instability trying to build up a cluster of RPi4s running microk8s, and it would appear that this is the underlying cause. I'm seeing slow disk performance, high load (not from CPU but from disk wait time), and indicative logs like this:

Sep 30 13:57:47 pi-06 microk8s.daemon-kubelite[1845]: Trace[1509214241]: ---"Object stored in database" 1627ms (13:57:47.649)
Sep 30 13:57:47 pi-06 microk8s.daemon-kubelite[1845]: Trace[922152618]: ---"Object stored in database" 272ms (13:57:47.671)
Sep 30 13:57:47 pi-06 microk8s.daemon-kubelite[1845]: Trace[1908037004]: ---"Object stored in database" 125ms (13:57:47.692)
Sep 30 13:58:52 pi-06 microk8s.daemon-kubelite[1845]: Trace[1136357025]: ---"Object stored in database" 1032ms (13:58:52.600)
Sep 30 13:58:53 pi-06 microk8s.daemon-kubelite[1845]: Trace[1526076370]: ---"Object stored in database" 1108ms (13:58:53.042)
Sep 30 13:58:53 pi-06 microk8s.daemon-kubelite[1845]: Trace[30915180]: ---"Object stored in database" 1242ms (13:58:53.252)
Sep 30 13:58:54 pi-06 microk8s.daemon-kubelite[1845]: Trace[263880530]: ---"Object stored in database" 520ms (13:58:54.638)
Sep 30 13:58:54 pi-06 microk8s.daemon-kubelite[1845]: Trace[2122212928]: ---"Object stored in database" 665ms (13:58:54.788)
Sep 30 13:58:55 pi-06 microk8s.daemon-kubelite[1845]: Trace[1513088012]: ---"Object stored in database" 1142ms (13:58:55.266)
Sep 30 13:59:32 pi-06 microk8s.daemon-kubelite[1845]: Trace[1857129140]: ---"Object stored in database" 665ms (13:59:32.053)
Sep 30 14:00:01 pi-06 microk8s.daemon-kubelite[1845]: Trace[1189483486]: ---"Object stored in database" 517ms (14:00:01.544)
Sep 30 14:00:01 pi-06 microk8s.daemon-kubelite[1845]: Trace[1679077799]: ---"Object stored in database" 546ms (14:00:01.848)

So firstly, can anyone define what time is considered too slow and what is "normal", or is this line not logged at all below a defined threshold? Secondly, I can see most of the disk usage is generated by kubelite; is there some way to reduce the amount of disk writes, or could they be redirected to a RAM disk or similar? On a scheduled reboot the data could be flushed to disk, and after an unscheduled reboot could I simply re-add the node as if it were new?

Or, is it simply not possible to run microk8s on a system without better performing disk?

PS, currently I'm using the fastest SD cards I could find, SanDisk Extreme Pro 64GB UHS-I marked as U3, A2, XC and V30 which is reported as having performance of: Up to 100MB/s read and up to 90MB/s write

The only option which I've now found with a higher performance level seems to be the SanDisk Extreme Pro 64GB UHS-II which has: Up to 275MB/s read and up to 100MB/s write

Is there any other microSD card with better performance, or are there some other options for improving performance on RPi4 etc? Appreciate any suggestions/ideas.
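For what it's worth, the headline MB/s figures on SD cards are sequential numbers, while dqlite mostly stresses synchronous small-write latency. A rough way to measure that (a sketch; the 4k block size and 100-write count are arbitrary choices):

```shell
#!/bin/bash
set -eu
# oflag=dsync forces a data sync per 4 KiB block written, roughly like a
# database commit, so this approximates the latency dqlite actually sees
# rather than the sequential throughput printed on the card.
tmp=$(mktemp)
start=$(date +%s%N)
dd if=/dev/zero of="$tmp" bs=4k count=100 oflag=dsync status=none
end=$(date +%s%N)
rm -f "$tmp"
echo "avg synchronous 4k write: $(( (end - start) / 100 / 1000 )) us"
```

Two cards with identical sequential ratings can differ by an order of magnitude on this number, which is why A2-rated cards and SSDs behave so differently here.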

ferrarimarco commented 3 years ago

@adamgoryachev I had a very similar experience with an SD performing as yours (Up to 100MB/s read and up to 90MB/s write).

I wasn't even able to start the cluster sometimes.

The next thing I want to try to rule out other factors, is with an external (USB) disk connected to the Raspberry Pi 4.

Also, how much RAM does yours have?

bkappler69 commented 3 years ago

@adamgoryachev , @ferrarimarco

I had issues with my microk8s cluster - see above (3 Raspberry Pi 4 systems with 8 GB RAM and a 1TB SSD attached to each). In https://github.com/ubuntu/microk8s/issues/2280#issuecomment-866912974 I gave an example of a simple script which, when executed, will hang after some time (sometimes after minutes, sometimes after hours, but it always got stuck).

I solved my issue by moving all my systems to Debian 64-bit and running microk8s there - no issues since then. I suspect either the firmware or the kernel version of Ubuntu is the root cause.

Hope this helps.

CharlesGillanders commented 3 years ago

@adamgoryachev @ferrarimarco I now have a working and mostly reliable cluster, which I got to through a combination of things; I think the most important were the points listed in my earlier comment above.

The need for UASP support is really important to call out - that made a huge difference in write IO latency.

JoshuaDodds commented 3 years ago

I find this such an odd issue... in 30 years of using *nix OSes I have never seen a process peg a CPU core at 100% and be absolutely unkillable, even by sending SIGCHLD to the parent pid. In this situation the rogue apiservice-kicker pid which is 'hung' has a parent pid which is not pid 1, so the oddness continues...

I too am having this issue every 2-3 days on at least one node (Raspberry Pi 4s with 8GB RAM and high-grade SD cards). I turned off most logging and set journald to use volatile storage to ease disk pressure from all the aggressive logging; this certainly helped but has not solved the issue entirely.

I did notice that leaving variables unquoted is not really best practice and can result in unexpected behavior (globbing and word splitting), and this is the case in the line which appears to be executed right before the hang:

ethtool --show-offload $nic_name 2>&1 | grep -E '^.x-checksumming:' | grep -q ': on'

If you look closely there is also extra whitespace before that last single-quoted ': on' search string. But these two "issues" are moot, because strace clearly shows bash handling the extra whitespace, and in both hang and non-hang conditions the variable $nic_name is defined as it should be.
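For illustration, the word-splitting hazard is easy to demonstrate (a toy sketch, unrelated to the actual hang; it is harmless in the kicker only because $nic_name never contains whitespace):

```shell
#!/bin/bash
# An unquoted expansion undergoes word splitting; a quoted one does not.
count_args() { echo $#; }

arg="vxlan.calico   extra"
unquoted=$(count_args $arg)    # splits on whitespace -> 2 words
quoted=$(count_args "$arg")    # stays a single word  -> 1 word
echo "unquoted=$unquoted quoted=$quoted"
```

Running it prints `unquoted=2 quoted=1`, which is why quoting expansions is the safe default even when today's value looks benign.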

So color me stumped...

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.