canonical / microk8s

MicroK8s is a small, fast, single-package Kubernetes for datacenters and the edge.
https://microk8s.io
Apache License 2.0

Microk8s: after reboot I have "FAIL: Service snap.microk8s.daemon-kubelet is not running" #496

Closed: MirtoBusico closed this issue 4 years ago

MirtoBusico commented 5 years ago

Hi all,

after a reboot I get a "FAIL: Service snap.microk8s.daemon-kubelet is not running" error.

How can I start snap.microk8s.daemon-kubelet?

Is it safe to just start it, or does this indicate some kind of problem?

Inspect says:

sysop@hoseplavm:~$ microk8s.inspect
[sudo] password di sysop: 
Inspecting services
Service snap.microk8s.daemon-containerd is running
Service snap.microk8s.daemon-apiserver is running
Service snap.microk8s.daemon-proxy is running
FAIL:  Service snap.microk8s.daemon-kubelet is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-kubelet
Service snap.microk8s.daemon-scheduler is running
Service snap.microk8s.daemon-controller-manager is running
Service snap.microk8s.daemon-etcd is running
Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system info
Copy network configuration to the final report tarball
Copy processes list to the final report tarball
Copy snap list to the final report tarball                                                                                                                         
Inspect kubernetes cluster                                                                                                                                         

Building the report tarball
Report tarball is at /var/snap/microk8s/608/inspection-report-20190606_123226.tar.gz
sysop@hoseplavm:~$ 

Snap says:

sysop@hoseplavm:~$ sudo snap list
Name      Version  Rev    Tracking  Publisher   Notes
core      16-2.39  6964   stable    canonical✓  core
lxd       3.13     10756  stable    canonical✓  -
microk8s  v1.14.2  608    stable    canonical✓  classic
sysop@hoseplavm:~$

The OS is Kubuntu 18.04.

Attachments: journalctl.txt, inspection-report-20190606_123226.tar.gz

ktsakalozos commented 5 years ago

@MirtoBusico what are the specs of this machine? Is it a VM or container?

What I see is the kubelet trying to start, but the apiserver is not yet operational, so it gives up. This could be because the API server is taking too long to start. During a system reboot the API server may be fighting over CPU time with other processes, so it takes too long.

Something else that looks strange is that the collected logs seem chopped. Have a look at the timestamps in the journalctl.txt you attached and compare them to the timestamps in journal.log in the inspection tarball under snap.microk8s.daemon-kubelet. The latter has logs only up until 08:55:42. Not sure why; do we have enough disk space?

Can you try a microk8s.stop and microk8s.start cycle? This will restart all MicroK8s services. If this does not help, can you try sudo systemctl restart snap.microk8s.daemon-kubelet?
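
To be explicit, something along these lines (then tail the kubelet logs to see why it stops):

# restart all MicroK8s services
microk8s.stop
microk8s.start
# or restart only the kubelet and follow its logs
sudo systemctl restart snap.microk8s.daemon-kubelet
sudo journalctl -u snap.microk8s.daemon-kubelet -f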

MirtoBusico commented 5 years ago

Well, the host has an i7 processor and 32GB of RAM. The KVM guest has 8 processors and 24GB of RAM. The VM has 2 disks.

df says

sysop@hoseplavm:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             12G     0     12G   0% /dev
tmpfs           2,3G  1,3M    2,3G   1% /run
/dev/vda1        98G  9,5G     84G  11% /
tmpfs            12G     0     12G   0% /dev/shm
tmpfs           5,0M  4,0K    5,0M   1% /run/lock
tmpfs            12G     0     12G   0% /sys/fs/cgroup
/dev/loop0       89M   89M       0 100% /snap/core/6964
/dev/loop1       55M   55M       0 100% /snap/lxd/10756
/dev/loop2      208M  208M       0 100% /snap/microk8s/608
tmpfs           2,3G     0    2,3G   0% /run/user/119
tmpfs           1,0M     0    1,0M   0% /var/snap/lxd/common/ns
tmpfs           2,3G   16K    2,3G   1% /run/user/1000
sysop@hoseplavm:~$

zfs says

    sysop@hoseplavm:~$ sudo zfs list -o space
    NAME                                                                                    AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
    zdata1                                                                                   191G  1,48G        0B     24K             0B      1,48G
    zdata1/containers                                                                        191G   201M        0B     24K             0B       201M
    zdata1/containers/kubernetes                                                             191G   201M        0B    201M             0B         0B
    zdata1/custom                                                                            191G    24K        0B     24K             0B         0B
    zdata1/custom-snapshots                                                                  191G    24K        0B     24K             0B         0B
    zdata1/deleted                                                                           191G   655M        0B     24K             0B       655M
    zdata1/deleted/images                                                                    191G   655M        0B     24K             0B       655M
    zdata1/deleted/images/3c09483ccd69f33a4819532c103f482f219ae4591cc0d860dfb94193e97a2627   191G   655M        0B    655M             0B         0B
    zdata1/images                                                                            191G   655M        0B     24K             0B       655M
    zdata1/images/c234ecee3baaee25db84af8e3565347e948bfceb3bf7c820bb1ce95adcffeaa8           191G   655M        0B    655M             0B         0B
    zdata1/snapshots                                                                         191G    24K        0B     24K             0B         0B
    sysop@hoseplavm:~$  

The journalctl output was captured some time after the tarball; I suppose that accounts for the difference.

The microk8s.stop / microk8s.start cycle ended with the same error. The sudo systemctl restart snap.microk8s.daemon-kubelet command started the kubelet, but it failed again. Here is the report:

inspection-report-20190606_143558.tar.gz

NOTE: originally I tried to install microk8s inside an LXD container, but microk8s never started; maybe I'll open another issue on this.

MirtoBusico commented 5 years ago

Sorry, I closed this by mistake.

ktsakalozos commented 5 years ago

I do not have a solution for now. I see etcd complaining that read operations take too long to complete, e.g.:

giu 06 14:35:44 hoseplavm etcd[25682]: request "header:<ID:7587838787223751041 > txn:<compare:<target:MOD key:\"/registry/events/default/hoseplavm.15a59c285a3d131c\" mod_revision:0 > success:<request_put:<key:\"/registry/events/default/hoseplavm.15a59c285a3d131c\" value:\"k8s\\000\\n\\013\\n\\002v1\\022\\005Event\\022\\367\\001\\n_\\n\\032hoseplavm.15a59c285a3d131c\\022\\000\\032\\007default\\\"\\000*$99ed044a-8857-11e9-ad1f-525400b6cf612\\0008\\000B\\010\\010\\237\\221\\344\\347\\005\\020\\000z\\000\\022$\\n\\004Node\\022\\000\\032\\thoseplavm\\\"\\thoseplavm*\\0002\\000:\\000\\032\\tNodeReady\\\"'Node hoseplavm status is now: NodeReady*\\024\\n\\007kubelet\\022\\thoseplavm2\\010\\010\\237\\221\\344\\347\\005\\020\\000:\\010\\010\\237\\221\\344\\347\\005\\020\\000@\\001J\\006NormalR\\000b\\000r\\000z\\000\\032\\000\\\"\\000\" lease:7587838787223751013 > > > " took too long (183.182213ms) to execute
giu 06 14:35:44 hoseplavm etcd[25682]: read-only range request "key:\"/registry/pods/kube-system/kubernetes-dashboard-6fd7f9c494-fwgk7\" " took too long (344.026827ms) to execute
giu 06 14:35:44 hoseplavm etcd[25682]: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" " took too long (258.434407ms) to execute
giu 06 14:35:44 hoseplavm etcd[25682]: read-only range request "key:\"/registry/csidrivers\" range_end:\"/registry/csidrivert\" count_only:true " took too long (289.23571ms) to execute
giu 06 14:35:46 hoseplavm etcd[25682]: request "header:<ID:7587838787223751067 > txn:<compare:<target:MOD key:\"/registry/events/kube-system/kube-dns-6bfbdd666c-stb78.15a59c28f029cbca\" mod_revision:0 > success:<request_put:<key:\"/registry/events/kube-system/kube-dns-6bfbdd666c-stb78.15a59c28f029cbca\" value:\"k8s\\000\\n\\013\\n\\002v1\\022\\005Event\\022\\230\\003\\ns\\n*kube-dns-6bfbdd666c-stb78.15a59c28f029cbca\\022\\000\\032\\013kube-system\\\"\\000*$9b06f58e-8857-11e9-ad1f-525400b6cf612\\0008\\000B\\010\\010\\241\\221\\344\\347\\005\\020\\000z\\000\\022x\\n\\003Pod\\022\\013kube-system\\032\\031kube-dns-6bfbdd666c-stb78\\\"$2902cc4c-87bc-11e9-80f0-525400b6cf61*\\002v12\\00517273:\\030spec.containers{kubedns}\\032\\006Pulled\\\"cContainer image \\\"gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7\\\" already present on machine*\\024\\n\\007kubelet\\022\\thoseplavm2\\010\\010\\241\\221\\344\\347\\005\\020\\000:\\010\\010\\241\\221\\344\\347\\005\\020\\000@\\001J\\006NormalR\\000b\\000r\\000z\\000\\032\\000\\\"\\000\" lease:7587838787223751013 > > > " took too long (309.552297ms) to execute
giu 06 14:35:49 hoseplavm etcd[25682]: read-only range request "key:\"/registry/ranges/serviceips\" " took too long (341.163051ms) to execute
giu 06 14:35:49 hoseplavm etcd[25682]: read-only range request "key:\"/registry/cronjobs/\" range_end:\"/registry/cronjobs0\" limit:500 " took too long (341.160855ms) to execute
giu 06 14:35:49 hoseplavm microk8s.daemon-etcd[25682]: WARNING: 2019/06/06 14:35:49 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: lookup etcd.socket on 127.0.0.53:53: no such host"; Reconnecting to {etcd.socket:2379 0  <nil>}
giu 06 14:35:50 hoseplavm etcd[25682]: request "header:<ID:7587838787223751214 > txn:<compare:<target:MOD key:\"/registry/events/default/hoseplavm.15a59c29225b988f\" mod_revision:0 > success:<request_put:<key:\"/registry/events/default/hoseplavm.15a59c29225b988f\" value:\"k8s\\000\\n\\013\\n\\002v1\\022\\005Event\\022\\346\\001\\n_\\n\\032hoseplavm.15a59c29225b988f\\022\\000\\032\\007default\\\"\\000*$9d335a7c-8857-11e9-be03-525400b6cf612\\0008\\000B\\010\\010\\245\\221\\344\\347\\005\\020\\000z\\000\\022$\\n\\004Node\\022\\000\\032\\thoseplavm\\\"\\thoseplavm*\\0002\\000:\\000\\032\\010Starting\\\"\\024Starting kube-proxy.*\\027\\n\\nkube-proxy\\022\\thoseplavm2\\010\\010\\242\\221\\344\\347\\005\\020\\000:\\010\\010\\242\\221\\344\\347\\005\\020\\000@\\001J\\006NormalR\\000b\\000r\\000z\\000\\032\\000\\\"\\000\" lease:7587838787223751209 > > > " took too long (148.499105ms) to execute
giu 06 14:35:50 hoseplavm etcd[25682]: read-only range request "key:\"/registry/priorityclasses/system-node-critical\" " took too long (106.843838ms) to execute
giu 06 14:35:50 hoseplavm etcd[25682]: read-only range request "key:\"/registry/namespaces/kube-system\" " took too long (106.90726ms) to execute
giu 06 14:35:50 hoseplavm etcd[25682]: read-only range request "key:\"/registry/services/specs/\" range_end:\"/registry/services/specs0\" " took too long (753.748779ms) to execute
giu 06 14:35:50 hoseplavm etcd[25682]: request "header:<ID:7587838787223751238 > txn:<compare:<target:MOD key:\"/registry/masterleases/192.168.202.10\" mod_revision:0 > success:<request_put:<key:\"/registry/masterleases/192.168.202.10\" value:\"k8s\\000\\n\\017\\n\\002v1\\022\\tEndpoints\\022*\\n\\022\\n\\000\\022\\000\\032\\000\\\"\\000*\\0002\\0008\\001B\\000z\\000\\022\\024\\n\\022\\n\\016192.168.202.10\\032\\000\\032\\000\\\"\\000\" lease:7587838787223751236 > > failure:<request_range:<key:\"/registry/masterleases/192.168.202.10\" > > > " took too long (451.566414ms) to execute
giu 06 14:35:54 hoseplavm etcd[25682]: request "header:<ID:7587838787223751257 > txn:<compare:<target:MOD key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" mod_revision:22483 > success:<request_put:<key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" value:\"k8s\\000\\n\\017\\n\\002v1\\022\\tEndpoints\\022\\316\\002\\n\\313\\002\\n\\027kube-controller-manager\\022\\000\\032\\013kube-system\\\"\\000*$778d6e74-87bb-11e9-80f0-525400b6cf612\\0008\\000B\\010\\010\\254\\205\\340\\347\\005\\020\\000b\\350\\001\\n(control-plane.alpha.kubernetes.io/leader\\022\\273\\001{\\\"holderIdentity\\\":\\\"hoseplavm_f32aa734-8856-11e9-b992-525400b6cf61\\\",\\\"leaseDurationSeconds\\\":15,\\\"acquireTime\\\":\\\"2019-06-06T12:31:25Z\\\",\\\"renewTime\\\":\\\"2019-06-06T12:35:53Z\\\",\\\"leaderTransitions\\\":4}z\\000\\032\\000\\\"\\000\" > > failure:<request_range:<key:\"/registry/services/endpoints/kube-system/kube-controller-manager\" > > > " took too long (137.663008ms) to execute
giu 06 14:35:54 hoseplavm etcd[25682]: read-only range request "key:\"/registry/pods/kube-system/heapster-v1.5.2-6b5d7b57f9-pf9jr\" " took too long (741.166459ms) to execute

Do CPU and disk utilization look healthy?
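
If it helps, a rough way to check (assuming the sysstat package is installed for iostat) would be:

# per-device utilization and wait times, refreshed every second
iostat -x 1
# how often etcd is reporting slow requests
sudo journalctl -u snap.microk8s.daemon-etcd | grep -c "took too long"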

For LXD we use a few profiles (not recommended, as they break isolation): https://github.com/ubuntu/microk8s/tree/master/tests/lxc
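
Roughly, applying one of those profiles looks like this (the profile file name below is just an example; take the actual one from the directory linked above):

# create an LXD profile from the repo and launch a container with it
lxc profile create microk8s
lxc profile edit microk8s < microk8s.profile
lxc launch ubuntu:18.04 microk8s-test -p default -p microk8s
lxc exec microk8s-test -- snap install microk8s --classic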

MirtoBusico commented 5 years ago

I retried a start from a powered-off VM. I didn't see much disk activity on the host. vda1 is the only partition on the VM that shows activity, so I prepared the graph below. Screenshot_20190606_184530

AFAIK it seems normal. The report is inspection-report-20190606_184438.tar.gz

I'll do another test with 4 processors instead of 8 to reduce parallelism.

(OK, for now I won't try LXD.)

MirtoBusico commented 5 years ago

Well, also with 4 processors the result is the same.

I tried a reset, but the command never ends and the console log is:

sysop@hoseplavm:~/Immagini$ microk8s.reset
Calling clean_cluster
Cleaning resources in namespace default
endpoints "kubernetes" deleted
event "hoseplavm.15a5a8f07c462d2f" deleted
event "hoseplavm.15a5a8f13b943b52" deleted
event "hoseplavm.15a5a8f13bd2568d" deleted
event "hoseplavm.15a5a8f1410d1bb0" deleted
event "hoseplavm.15a5a8f1410d3c66" deleted
event "hoseplavm.15a5a8f1410d4bab" deleted
event "hoseplavm.15a5a8f14124aa79" deleted
event "hoseplavm.15a5a8f170909b85" deleted
event "hoseplavm.15a5a8f62b369d16" deleted
event "hoseplavm.15a5a9b388013d8e" deleted
event "hoseplavm.15a5a9b95b13b907" deleted
event "hoseplavm.15a5a9c2ab940dab" deleted
event "hoseplavm.15a5ab3f0b19a239" deleted                                                                                                                           
event "hoseplavm.15a5ab44d3096c15" deleted                                                                                                                           
event "hoseplavm.15a5ab8f4ed766af" deleted                                                                                                                           
event "hoseplavm.15a5ab94c86afd90" deleted                                                                                                                           
secret "default-token-j7gsr" deleted                                                                                                                                 
serviceaccount "default" deleted                                                                                                                                     
service "kubernetes" deleted
Cleaning resources in namespace kube-node-lease
secret "default-token-vflrh" deleted
serviceaccount "default" deleted
lease.coordination.k8s.io "hoseplavm" deleted
Cleaning resources in namespace kube-public
secret "default-token-6xfpz" deleted
serviceaccount "default" deleted
Cleaning resources in namespace kube-system
configmap "eventer-config" deleted
configmap "extension-apiserver-authentication" deleted
configmap "heapster-config" deleted
configmap "kube-dns" deleted
configmap "kubernetes-dashboard-settings" deleted
endpoints "heapster" deleted
endpoints "kube-controller-manager" deleted
endpoints "kube-dns" deleted
endpoints "kube-scheduler" deleted
endpoints "kubernetes-dashboard" deleted
endpoints "monitoring-grafana" deleted
endpoints "monitoring-influxdb" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8f1d7ff6975" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8f23005a4dc" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8f4ca1d2cce" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8f4f1fdff3c" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8f4f206f594" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8f788bdc563" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8f7c9acb807" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8f7c9c0d436" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8fa07a7cbbc" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8fa2c780598" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8fa2c8304f2" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8fb1d69d331" deleted
event "heapster-v1.5.2-6b5d7b57f9-pf9jr.15a5a8fb385b4813" deleted
event "kube-controller-manager.15a5a8f5a441a273" deleted
event "kube-controller-manager.15a5a9b84c19c2f3" deleted
event "kube-controller-manager.15a5ab44724d4719" deleted
event "kube-controller-manager.15a5ab945e6803d2" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8f1704f2e60" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8f4e4da5ef1" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8f73e9b50a7" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8f7a0e94351" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8f7a0f672d3" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8fa2819a5ba" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8fa5a15a506" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8fa5a22d5c1" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8fb283a5452" deleted
event "kube-dns-6bfbdd666c-stb78.15a5a8fb3cb2bca9" deleted
event "kube-scheduler.15a5a8f5a44249d6" deleted
event "kube-scheduler.15a5a9b9b7c60913" deleted
event "kube-scheduler.15a5ab44118e2536" deleted
event "kube-scheduler.15a5ab94b114c701" deleted
event "kubernetes-dashboard-6fd7f9c494-fwgk7.15a5a8f187f43914" deleted
event "kubernetes-dashboard-6fd7f9c494-fwgk7.15a5a8f2221c6fb9" deleted
event "kubernetes-dashboard-6fd7f9c494-fwgk7.15a5a8f4bce6d531" deleted
event "kubernetes-dashboard-6fd7f9c494-fwgk7.15a5a8f4e2f3a18b" deleted
event "monitoring-influxdb-grafana-v4-78777c64c8-29vrc.15a5a8f18635b7c7" deleted
event "monitoring-influxdb-grafana-v4-78777c64c8-29vrc.15a5a8f228ee5941" deleted
event "monitoring-influxdb-grafana-v4-78777c64c8-29vrc.15a5a8f4c1fab562" deleted
event "monitoring-influxdb-grafana-v4-78777c64c8-29vrc.15a5a8f4e7c3e335" deleted
event "monitoring-influxdb-grafana-v4-78777c64c8-29vrc.15a5a8f4e7ce08db" deleted
event "monitoring-influxdb-grafana-v4-78777c64c8-29vrc.15a5a8f7b0017ab4" deleted
event "monitoring-influxdb-grafana-v4-78777c64c8-29vrc.15a5a8f7fa56e2d0" deleted
pod "heapster-v1.5.2-6b5d7b57f9-pf9jr" deleted
pod "kube-dns-6bfbdd666c-stb78" deleted
pod "kubernetes-dashboard-6fd7f9c494-fwgk7" deleted
pod "monitoring-influxdb-grafana-v4-78777c64c8-29vrc" deleted
secret "default-token-bwthm" deleted
secret "heapster-token-lskt7" deleted
secret "kube-dns-token-rtssh" deleted
secret "kubernetes-dashboard-certs" deleted
secret "kubernetes-dashboard-key-holder" deleted
secret "kubernetes-dashboard-token-5s9tw" deleted
serviceaccount "default" deleted
serviceaccount "heapster" deleted
serviceaccount "kube-dns" deleted
serviceaccount "kubernetes-dashboard" deleted
service "heapster" deleted
service "kube-dns" deleted
service "kubernetes-dashboard" deleted
service "monitoring-grafana" deleted
service "monitoring-influxdb" deleted
deployment.apps "heapster-v1.5.2" deleted
deployment.apps "kube-dns" deleted
deployment.apps "kubernetes-dashboard" deleted
deployment.apps "monitoring-influxdb-grafana-v4" deleted
event.events.k8s.io "heapster-v1.5.2-6b5d7b57f9-58r6x.15a5abb4020ea2a2" deleted
event.events.k8s.io "heapster-v1.5.2-6b5d7b57f9-58r6x.15a5abb9e8c1341d" deleted
event.events.k8s.io "heapster-v1.5.2-6b5d7b57f9.15a5abb402114b94" deleted
event.events.k8s.io "kube-dns-6bfbdd666c-rcppl.15a5abb45ba14c2e" deleted
event.events.k8s.io "kube-dns-6bfbdd666c-rcppl.15a5abb9e8ba4648" deleted
event.events.k8s.io "kube-dns-6bfbdd666c.15a5abb426fd32c6" deleted
event.events.k8s.io "kubernetes-dashboard-6fd7f9c494-v5m7v.15a5abb494d2790d" deleted
event.events.k8s.io "kubernetes-dashboard-6fd7f9c494-v5m7v.15a5abba3c36b325" deleted
event.events.k8s.io "kubernetes-dashboard-6fd7f9c494.15a5abb4796c3268" deleted
event.events.k8s.io "monitoring-influxdb-grafana-v4-78777c64c8-vdfzp.15a5abb4a4f4b552" deleted
event.events.k8s.io "monitoring-influxdb-grafana-v4-78777c64c8-vdfzp.15a5abba5f3f67fe" deleted
event.events.k8s.io "monitoring-influxdb-grafana-v4-78777c64c8.15a5abb4a0fdd090" deleted

After this nothing happens, and I see many processes related to k8s:

sysop@hoseplavm:~$ ps -elf|grep k8
4 S root      1167     1  0  80   0 - 2648267 -    19:17 ?        00:00:20 /snap/microk8s/608/etcd --data-dir=/var/snap/microk8s/common/var/run/etcd --advertise-client-urls=unix://etcd.socket:2379 --listen-client-urls=unix://etcd.socket:2379
4 S root      1170     1  0  80   0 -  5370 -      19:17 ?        00:00:01 /bin/bash /snap/microk8s/608/apiservice-kicker
4 S root      1173     1  1  80   0 - 54456 -      19:17 ?        00:00:27 /snap/microk8s/608/kube-controller-manager --master=http://127.0.0.1:8080 --service-account-private-key-file=/var/snap/microk8s/608/certs/serviceaccount.key --root-ca-file=/var/snap/microk8s/608/certs/ca.crt --cluster-signing-cert-file=/var/snap/microk8s/608/certs/ca.crt --cluster-signing-key-file=/var/snap/microk8s/608/certs/ca.key --address=127.0.0.1
4 S root      1182     1  0  80   0 - 35761 -      19:17 ?        00:00:05 /snap/microk8s/608/kube-scheduler --master=http://127.0.0.1:8080 --address=127.0.0.1
4 S root      2206     1  1  80   0 - 101745 -     19:17 ?        00:00:41 /snap/microk8s/608/kube-apiserver --insecure-bind-address=127.0.0.1 --cert-dir=/var/snap/microk8s/608/certs --etcd-servers=unix://etcd.socket:2379 --service-cluster-ip-range=10.152.183.0/24 --authorization-mode=AlwaysAllow --basic-auth-file=/var/snap/microk8s/608/credentials/basic_auth.csv --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --service-account-key-file=/var/snap/microk8s/608/certs/serviceaccount.key --client-ca-file=/var/snap/microk8s/608/certs/ca.crt --tls-cert-file=/var/snap/microk8s/608/certs/server.crt --tls-private-key-file=/var/snap/microk8s/608/certs/server.key --kubelet-client-certificate=/var/snap/microk8s/608/certs/server.crt --kubelet-client-key=/var/snap/microk8s/608/certs/server.key --secure-port=16443 --insecure-port=8080 --requestheader-client-ca-file=/var/snap/microk8s/608/certs/ca.crt
4 S root      2221     1  0  80   0 - 35003 -      19:17 ?        00:00:01 /snap/microk8s/608/kube-proxy --master=http://127.0.0.1:8080 --cluster-cidr=10.152.183.0/24 --kubeconfig=/snap/microk8s/608/kubeproxy.config --proxy-mode=userspace --healthz-bind-address=127.0.0.1
4 S sysop     6078  4376  0  80   0 -  3257 wait   19:19 pts/1    00:00:00 /bin/bash /snap/microk8s/608/microk8s-reset.wrapper
4 S root      6108     1  0  80   0 - 212987 -     19:19 ?        00:00:02 /snap/microk8s/608/bin/containerd --config /var/snap/microk8s/608/args/containerd.toml --root /var/snap/microk8s/common/var/lib/containerd --state /var/snap/microk8s/common/run/containerd --address /var/snap/microk8s/common/run/containerd.sock
0 S sysop     6587  6078  0  80   0 - 36703 futex_ 19:20 pts/1    00:00:01 /snap/microk8s/608/kubectl --kubeconfig=/snap/microk8s/608/client.config delete --all configmaps,endpoints,events,limitranges,persistentvolumeclaims,pods,podtemplates,replicationcontrollers,resourcequotas,secrets,serviceaccounts,services,controllerrevisions.apps,daemonsets.apps,deployments.apps,replicasets.apps,statefulsets.apps,horizontalpodautoscalers.autoscaling,cronjobs.batch,jobs.batch,leases.coordination.k8s.io,events.events.k8s.io,daemonsets.extensions,deployments.extensions,ingresses.extensions,networkpolicies.extensions,replicasets.extensions,ingresses.networking.k8s.io,networkpolicies.networking.k8s.io,poddisruptionbudgets.policy,rolebindings.rbac.authorization.k8s.io,roles.rbac.authorization.k8s.io --namespace=kube-system
0 R sysop    14164  4352  0  80   0 -  3609 -      19:52 pts/0    00:00:00 grep --color=auto k8
sysop@hoseplavm:~$

Are there other things I can try?

ktsakalozos commented 5 years ago

Since storage is OK and I still see etcd complaining, it might be some kind of corruption in the data store. Unfortunately, I can only think of a removal and re-installation.
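
Something like the following should give a clean slate (it should wipe the local cluster state, so only do it if there is nothing you need to keep):

# remove microk8s and install it again from the stable channel
sudo snap remove microk8s
sudo snap install microk8s --classic
microk8s.status --wait-ready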

MirtoBusico commented 5 years ago

Tried and failed again. Steps to reproduce:

Now I will make different attempts, changing one thing at a time. I'll report ASAP.

MirtoBusico commented 5 years ago

First try: disable swap in the VM and do not install any addons. SUCCESS: it survived reboots.
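
(Disabling swap amounts to something like the following; the fstab pattern is approximate, so check the file by hand:)

# turn swap off now and keep it off after reboot
sudo swapoff -a
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab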

I see this

sysop@hoseplavm:~$ microk8s.inspect
[sudo] password di sysop: 
Inspecting services
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system info
  Copy network configuration to the final report tarball
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Inspect kubernetes cluster

Building the report tarball
  Report tarball is at /var/snap/microk8s/608/inspection-report-20190610_120920.tar.gz
sysop@hoseplavm:~$ sudo cat /proc/swaps
Filename                                Type            Size    Used    Priority
sysop@hoseplavm:~$ 

sysop@hoseplavm:~$ microk8s.status
microk8s is running
addons:
jaeger: disabled
fluentd: disabled
gpu: disabled
storage: disabled
registry: disabled
rbac: disabled
ingress: disabled
dns: disabled
metrics-server: disabled
linkerd: disabled
prometheus: disabled
istio: disabled
dashboard: disabled
sysop@hoseplavm:~$ 

Now I'll try to enable dns and dashboard (report ASAP).
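
That is, roughly:

microk8s.enable dns dashboard
# then wait for the new pods to come up
watch microk8s.kubectl get all --all-namespaces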

MirtoBusico commented 5 years ago

Sadly I got the same error after the second reboot.

Inspect says

sysop@hoseplavm:~$ microk8s.inspect
[sudo] password di sysop: 
Inspecting services
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-proxy is running
 FAIL:  Service snap.microk8s.daemon-kubelet is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-kubelet
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system info
  Copy network configuration to the final report tarball
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Inspect kubernetes cluster

Building the report tarball
  Report tarball is at /var/snap/microk8s/608/inspection-report-20190610_162916.tar.gz
sysop@hoseplavm:~$ 

Attached is the report: inspection-report-20190610_162916.tar.gz

BTW when I installed microk8s, after three hours there was a lot of I/O activity related to etcd. Is this considered normal?

Screenshot_20190610_160940
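
If it is useful I can also check which process is doing the I/O and how big the etcd data directory is, with something like (iotop needs to be installed; the data-dir path comes from the etcd process arguments shown earlier):

# accumulated I/O, only for processes actually doing I/O
sudo iotop -a -o
# size of the etcd data directory
sudo du -sh /var/snap/microk8s/common/var/run/etcd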

ktsakalozos commented 5 years ago

It is strange that you have two etcd processes running.

Would it be possible to create a VM on a non-ZFS substrate? I cannot reproduce this, and I would like to check that ZFS is not an issue.

MirtoBusico commented 5 years ago

OK. I'll report as soon as the machine is ready.

MirtoBusico commented 5 years ago

I prepared the new machine:

What is missing from the other machine:

Now I'll take a snapshot and proceed with the tests.

BTW, do you have any reference definition for creating a VM to use?

MirtoBusico commented 5 years ago

Well, microk8s without addons survives two reboots. There are always 2 etcd processes.

sysop@hoseplavm1:~$ microk8s.inspect
[sudo] password for sysop: 
Inspecting services
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-proxy is running
  Service snap.microk8s.daemon-kubelet is running
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system info
  Copy network configuration to the final report tarball
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Inspect kubernetes cluster

Building the report tarball
  Report tarball is at /var/snap/microk8s/608/inspection-report-20190610_201053.tar.gz
sysop@hoseplavm1:~$ ps -elf|grep etcd
4 S root       795     1  8  80   0 - 101553 -     20:08 ?        00:00:10 /snap/microk8s/608/kube-apiserver --insecure-bind-address=127.0.0.1 --cert-dir=/var/snap/microk8s/608/certs --etcd-servers=unix://etcd.socket:2379 --service-cluster-ip-range=10.152.183.0/24 --authorization-mode=AlwaysAllow --basic-auth-file=/var/snap/microk8s/608/credentials/basic_auth.csv --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --service-account-key-file=/var/snap/microk8s/608/certs/serviceaccount.key --client-ca-file=/var/snap/microk8s/608/certs/ca.crt --tls-cert-file=/var/snap/microk8s/608/certs/server.crt --tls-private-key-file=/var/snap/microk8s/608/certs/server.key --kubelet-client-certificate=/var/snap/microk8s/608/certs/server.crt --kubelet-client-key=/var/snap/microk8s/608/certs/server.key --secure-port=16443 --insecure-port=8080 --requestheader-client-ca-file=/var/snap/microk8s/608/certs/ca.crt
4 S root       797     1  2  80   0 - 2634539 -    20:08 ?        00:00:03 /snap/microk8s/608/etcd --data-dir=/var/snap/microk8s/common/var/run/etcd --advertise-client-urls=unix://etcd.socket:2379 --listen-client-urls=unix://etcd.socket:2379
0 S sysop     2679  2228  0  80   0 -  3608 pipe_w 20:10 pts/1    00:00:00 grep --color=auto etcd
sysop@hoseplavm1:~$ 

Now I'll install the dns and dashboard addons and repeat the test.

MirtoBusico commented 5 years ago

FAILED at the second reboot. After I installed the dns and dashboard addons, the problem appeared again.

sysop@hoseplavm1:~$ microk8s.inspect
[sudo] password for sysop: 
Inspecting services
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-apiserver is running
  Service snap.microk8s.daemon-proxy is running
 FAIL:  Service snap.microk8s.daemon-kubelet is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-kubelet
  Service snap.microk8s.daemon-scheduler is running
  Service snap.microk8s.daemon-controller-manager is running
  Service snap.microk8s.daemon-etcd is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system info
  Copy network configuration to the final report tarball
  Copy processes list to the final report tarball
  Copy snap list to the final report tarball
  Inspect kubernetes cluster

Building the report tarball
  Report tarball is at /var/snap/microk8s/608/inspection-report-20190610_204026.tar.gz
sysop@hoseplavm1:~$ rcp /var/snap/microk8s/608/inspection-report-20190610_204026.tar.gz mirto@192.168.201.1://home/mirto
mirto@192.168.201.1's password: 
inspection-report-20190610_204026.tar.gz                                                                                                                                                                  100%  335KB  30.3MB/s   00:00    
sysop@hoseplavm1:~$ ps -elf|grep etcd
4 S root       798     1  1  80   0 - 2635724 -    20:27 ?        00:00:14 /snap/microk8s/608/etcd --data-dir=/var/snap/microk8s/common/var/run/etcd --advertise-client-urls=unix://etcd.socket:2379 --listen-client-urls=unix://etcd.socket:2379
4 S root      2076     1  2  80   0 - 101553 -     20:27 ?        00:00:24 /snap/microk8s/608/kube-apiserver --insecure-bind-address=127.0.0.1 --cert-dir=/var/snap/microk8s/608/certs --etcd-servers=unix://etcd.socket:2379 --service-cluster-ip-range=10.152.183.0/24 --authorization-mode=AlwaysAllow --basic-auth-file=/var/snap/microk8s/608/credentials/basic_auth.csv --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota --service-account-key-file=/var/snap/microk8s/608/certs/serviceaccount.key --client-ca-file=/var/snap/microk8s/608/certs/ca.crt --tls-cert-file=/var/snap/microk8s/608/certs/server.crt --tls-private-key-file=/var/snap/microk8s/608/certs/server.key --kubelet-client-certificate=/var/snap/microk8s/608/certs/server.crt --kubelet-client-key=/var/snap/microk8s/608/certs/server.key --secure-port=16443 --insecure-port=8080 --requestheader-client-ca-file=/var/snap/microk8s/608/certs/ca.crt
0 S sysop     7017  5807  0  80   0 -  3608 pipe_w 20:42 pts/1    00:00:00 grep --color=auto etcd
sysop@hoseplavm1:~$ 

And here is the report

inspection-report-20190610_204026.tar.gz

What can I try?

ktsakalozos commented 5 years ago

I need to reproduce your setup. Here is what I do; please give the following a try and tell me if you see the issue:

# Create a VM (2 CPUs and 4 GB of RAM). Multipass will take care of qemu and grabbing the right image
multipass launch ubuntu -n testvm -c 2 -m 4G 
# Enter the VM
multipass shell testvm 
# inside the VM install microk8s and enable the addons
> sudo snap install microk8s --classic
# Wait for microk8s to become ready
> microk8s.status --wait-ready
> microk8s.enable dns dashboard
# Wait to see the pods running
> watch microk8s.kubectl get all --all-namespaces
# Exit the VM with ctrl-D and reboot it
multipass stop testvm
multipass start testvm
# Enter the VM
multipass shell testvm 
# Wait for microk8s to become ready
> microk8s.status --wait-ready
# Wait to see the pods running
> watch microk8s.kubectl get all --all-namespaces

Could you also please share the scripts you have to create the VMs? Thank you

MirtoBusico commented 5 years ago

Well, my notebook is my home/office, so I cannot risk corrupting it. I hope that executing the commands in the VM described above is fine.

[Where can I find the multipass command/package?] Update: I suppose you mean the snap package.

About VM scripts: I don't use scripts because I use the virt-manager GUI. If it is useful I can share /var/lib/libvirt (obviously excluding images and snapshots). I'll try and report.

MirtoBusico commented 5 years ago

I started with:

sysop@hoseplavm1:~$ sudo snap install multipass --beta --classic
2019-06-11T12:41:47+02:00 INFO Waiting for restart...
multipass (beta) 0.7.0 from Canonical✓ installed
sysop@hoseplavm1:~$ multipass launch ubuntu -n testvm -c 2 -m 4G
launch failed: CPU does not support KVM extensions.                             
sysop@hoseplavm1:~$

So I ran: sudo apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils

Not sufficient:

sysop@hoseplavm1:~$ multipass launch ubuntu -n testvm -c 2 -m 4G                            
One quick question before we launch … Would you like to help                    
the Multipass developers, by sending anonymous usage data?
This includes your operating system, which images you use,
the number of instances, their properties and how long you use them.
We’d also like to measure Multipass’s speed.

Send usage data (yes/no/Later)? yes
Thank you!
launch failed: CPU does not support KVM extensions.                             
sysop@hoseplavm1:~$ egrep -c '(vmx|svm)' /proc/cpuinfo                          
0
sysop@hoseplavm1:~$ kvm-ok 
INFO: Your CPU does not support KVM extensions
INFO: For more detailed results, you should run this as root
HINT:   sudo /usr/sbin/kvm-ok
sysop@hoseplavm1:~$ sudo kvm-ok 
[sudo] password for sysop: 
INFO: Your CPU does not support KVM extensions
KVM acceleration can NOT be used
sysop@hoseplavm1:~$ sudo apt install virt-manager

Trying to use virt-manager

sysop@hoseplavm1:~$ virsh list
 Id    Name                           State
----------------------------------------------------
 1     generic                        running

sysop@hoseplavm1:~$ multipass launch ubuntu -n testvm -c 2 -m 4G
launch failed: CPU does not support KVM extensions.                             
sysop@hoseplavm1:~$   

It fails again. As you can see, virt-manager was able to create a VM, but multipass complains that there is no hardware acceleration.

I suppose you can use multipass only on bare metal.
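
(If I understand correctly, the alternative would be enabling nested virtualization on the physical host and exposing the host CPU to the guest; a quick check, assuming an Intel CPU:)

# on the physical host: is nested KVM enabled?
cat /sys/module/kvm_intel/parameters/nested
# the guest also needs the host CPU model (in virt-manager: "Copy host CPU configuration"),
# otherwise vmx is not visible inside the VM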

BTW, if you can tell me what kernel is used in your configuration, I can try to set up a VM with that kernel.

ktsakalozos commented 5 years ago

@MirtoBusico I am on 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Can you lead me through your setup process? How can I create the same kind of machines you are using for microk8s? Please provide as much detail as possible. Thank you.

MirtoBusico commented 5 years ago

@ktsakalozos I am on Linux mirto-P65 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux on the laptop, and the same on the VM.

I have my step-by-step instruction document, but it is in Italian. Please wait and I'll create an English version.

MirtoBusico commented 5 years ago

Hi, it is more work than I expected. The attached manual stops at the system update step; tomorrow I'll complete the document.

microk8s_base_vm.pdf

MirtoBusico commented 5 years ago

Hi @ktsakalozos, another failure at the second reboot. Here are the instructions to reproduce: microk8s_base_vm_V2.pdf

Here is the report: inspection-report-20190612_131519.tar.gz

Tell me if you need other details

MirtoBusico commented 5 years ago

I also tried with a VM with Ubuntu Server 18.04.2 installed. uname says:

sysop@hoseplamono:~$ uname -a
Linux hoseplamono 4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
sysop@hoseplamono:~$ 

This fails at the first reboot

ktsakalozos commented 5 years ago

I do not know what to tell you, @MirtoBusico. I partially followed your setup (minus the network configuration) and all services came up with no issues.

MirtoBusico commented 5 years ago

Hi @ktsakalozos, I'm starting to suspect some kind of hardware problem.

Can we please compare the hardware we used for the test?

I'm using an old Santech C37 notebook. For storage I have two 2TB Seagate Barracuda HDDs. The processor is an Intel i7, RAM is 32GB, and the video card is an Nvidia using proprietary drivers.

What hardware are you using?

I don't know if this is useful, but I ended up with a similar problem (an endless install that never completed) on a KVM machine (satisfying the requirements except for the SSD) and when trying to install on local LXD.

ktsakalozos commented 5 years ago

@MirtoBusico I am also on an i7 with 16GB of RAM and an SSD. Is it possible that the shutdown you are performing is a forced power-off?
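
For a libvirt/virt-manager guest the difference would roughly be (<vm-name> stands for the domain name):

# clean ACPI shutdown, same effect as running shutdown inside the guest
virsh shutdown <vm-name>
# forced power-off, which can leave on-disk state (e.g. etcd) inconsistent
virsh destroy <vm-name>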

MirtoBusico commented 5 years ago

Thanks @ktsakalozos, I always use the shutdown command from inside the VM.

But what you said about the network is interesting.

I use a static IP (see page 16 of the manual) and the networkd daemon.

If I'm correct, the standard configuration lets NetworkManager manage the network, whereas mine uses networkd with a static address.

I'll try ASAP to create a VM from scratch with the standard configuration and report the results.
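
For reference, the non-standard configuration on the earlier VMs is a networkd-rendered netplan with a static address, roughly like this (the interface name, gateway and DNS below are only placeholders):

network:
  version: 2
  renderer: networkd
  ethernets:
    ens3:
      dhcp4: no
      addresses: [192.168.202.10/24]
      gateway4: 192.168.202.1
      nameservers:
        addresses: [192.168.202.1]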

MirtoBusico commented 5 years ago

Well, another failure: at the second startup the kubelet fails

Tried:

The netplan definition is:

sysop@testmicrok8s:~$ cat /etc/netplan/01-network-manager-all.yaml                                                                                                                 
# Let NetworkManager manage all devices on this system                                                                                                                             
network:                                                                                                                                                                           
  version: 2                                                                                                                                                                       
  renderer: NetworkManager                                                                                                                                                         
sysop@testmicrok8s:~$ 

I do not know what else to try.

MirtoBusico commented 5 years ago

SUCCESS: the problem disappeared. After installing Ubuntu 18.04.2 in a new virtual machine and updating the system up to 8 July 2019, the microk8s installation survived 4 reboots and a shutdown/power-on cycle.

Every 2.0s: microk8s.kubectl get all --all-namespaces                                                                      k3s-master: Mon Jul  8 12:56:51 2019

NAMESPACE     NAME                                                  READY   STATUS    RESTARTS   AGE
kube-system   pod/coredns-f7867546d-rzg56                           1/1     Running   3          77m
kube-system   pod/heapster-v1.5.2-844b564688-grngn                  4/4     Running   12         72m
kube-system   pod/kubernetes-dashboard-7d75c474bb-mbrjb             1/1     Running   3          76m
kube-system   pod/monitoring-influxdb-grafana-v4-6b6954958c-c7m2x   2/2     Running   6          76m

NAMESPACE     NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes             ClusterIP   10.152.183.1     <none>        443/TCP                  79m
kube-system   service/heapster               ClusterIP   10.152.183.43    <none>        80/TCP                   76m
kube-system   service/kube-dns               ClusterIP   10.152.183.10    <none>        53/UDP,53/TCP,9153/TCP   77m
kube-system   service/kubernetes-dashboard   ClusterIP   10.152.183.118   <none>        443/TCP                  76m
kube-system   service/monitoring-grafana     ClusterIP   10.152.183.116   <none>        80/TCP                   76m
kube-system   service/monitoring-influxdb    ClusterIP   10.152.183.129   <none>        8083/TCP,8086/TCP        76m

NAMESPACE     NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns                          1/1     1            1           77m
kube-system   deployment.apps/heapster-v1.5.2                  1/1     1            1           76m
kube-system   deployment.apps/kubernetes-dashboard             1/1     1            1           76m
kube-system   deployment.apps/monitoring-influxdb-grafana-v4   1/1     1            1           76m

NAMESPACE     NAME                                                        DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-f7867546d                           1         1         1       77m
kube-system   replicaset.apps/heapster-v1.5.2-6b794f77c8                  0         0         0       76m
kube-system   replicaset.apps/heapster-v1.5.2-6f5d55456                   0         0         0       73m
kube-system   replicaset.apps/heapster-v1.5.2-844b564688                  1         1         1       72m
kube-system   replicaset.apps/kubernetes-dashboard-7d75c474bb             1         1         1       76m
kube-system   replicaset.apps/monitoring-influxdb-grafana-v4-6b6954958c   1         1         1       76m

I don't know what changed.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.