cloudfoundry-incubator / kubo-release

Kubernetes BOSH release
https://www.cloudfoundry.org/container-runtime/
Apache License 2.0

Kubelet fails to start on bosh-lite #363

Open gitstn opened 5 years ago

gitstn commented 5 years ago

What happened:

While deploying Kubo with kubo-deployment/bin/deploy-cfcr-lite, the kubelet failed to start with the following error message:

Task 41 | 11:38:48 | Updating instance master: master/9161da11-aa5c-46fc-8aec-f9dc3f5b4090 (0) (canary) (00:01:29)
Task 41 | 11:40:18 | Updating instance worker: worker/187a97f3-78a7-424c-a73c-765fb64810aa (0) (canary) (00:03:21)
                   L Error: Action Failed get_task: Task 56d27322-891f-4a11-5037-842239858931 result: 1 of 2 post-start scripts failed. Failed Jobs: kubelet. Successful Jobs: bosh-dns.
Task 41 | 11:43:39 | Error: Action Failed get_task: Task 56d27322-891f-4a11-5037-842239858931 result: 1 of 2 post-start scripts failed. Failed Jobs: kubelet. Successful Jobs: bosh-dns.

Task 41 Started  Wed Oct 30 11:38:25 UTC 2019
Task 41 Finished Wed Oct 30 11:43:39 UTC 2019
Task 41 Duration 00:05:14
Task 41 error

Updating deployment: Expected task '41' to succeed but state is 'error'

Exit code 1


What you expected to happen:

The expectation was that Kubo would be deployed successfully with kubelet running.

How to reproduce it (as minimally and precisely as possible):

  1. Deploy BOSH Lite on VirtualBox
  2. Clone kubo-release
  3. Clone kubo-deployment
  4. From kubo-deployment, run bin/deploy-cfcr-lite

Anything else we need to know?:

In the kubelet log file, the following line seems to be the issue:

F1030 12:05:12.887447 14621 kubelet.go:1407] Failed to start OOM watcher open /dev/kmsg: no such file or directory
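Kubelet logs use the klog line format, in which fatal entries begin with `F` followed by the month and day (here `F1030`). A minimal sketch for pulling only those entries out of a log file follows; the function name `fatal_lines` is hypothetical, and the log path is passed as an argument rather than assumed.

```shell
# Print only fatal klog entries (lines starting with F + MMDD) from a log file.
# Usage: fatal_lines <path-to-kubelet-log>
fatal_lines() {
  grep -E '^F[0-9]{4} ' "$1"
}
```

Running it against the kubelet's stderr log on the worker VM should surface lines like the one above without the surrounding informational output.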

Environment:

Using environment '192.168.50.6' as client 'admin'

Name  Release(s)                  Stemcell(s)                                         Config(s)          Team(s)
cfcr  bosh-dns/1.15.0             bosh-warden-boshlite-ubuntu-xenial-go_agent/456.30  2 runtime/default  -
      bpm/1.0.4                                                                       1 cloud/default
      cfcr-etcd/1.11.1
      docker/35.3.4
      kubo/0.41.0+dev.1572435471

1 deployments

Succeeded

Using environment '192.168.50.6' as client 'admin'

Name               bosh-lite
UUID               3d3b97df-196c-4d82-bd86-903b6061a6b3
Version            270.7.0 (00000000)
Director Stemcell  ubuntu-xenial/456.40
CPI                warden_cpi
Features           compiled_package_cache: disabled
                   config_server: enabled
                   local_dns: enabled
                   power_dns: disabled
                   snapshots: disabled
User               admin

Succeeded

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server localhost:8080 was refused - did you specify the right host or port?

generalinterest commented 4 years ago

In my testing, the kubelet would not start, and kubelet.stderr.log shows this failure:

kubelet.go:1407] Failed to start OOM watcher open /dev/kmsg: no such file or directory

It looks like the bosh-stemcell-456.XX-warden-boshlite-ubuntu-xenial-go_agent is missing this kernel support.

As a test only (I would not propose this as a solution), I ran:

sudo touch /dev/kmsg

And the kubelet will start.
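The same test can be wrapped in a guard so the placeholder file is created only when the device node is absent. This is a minimal sketch, not a proposed fix: the function name `ensure_kmsg` and its optional path argument are hypothetical, and it must run as root to write under /dev.

```shell
# Create an empty placeholder at the given path (default /dev/kmsg) only if
# nothing exists there. This is a plain file, not a real kernel device.
ensure_kmsg() {
  target="${1:-/dev/kmsg}"
  if [ -e "$target" ]; then
    echo "present: $target"
  else
    touch "$target" && echo "created: $target"
  fi
}
```

Because the result is an ordinary empty file, the kubelet's OOM watcher has nothing to read from it; it only gets past the open() call that was failing.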

ramonskie commented 4 years ago

I had the same issue with stemcell bosh-warden-boshlite-ubuntu-xenial-go_agent 621.59. When I touched /dev/kmsg, the kubelet started immediately.

I'm wondering whether this should be solved in the stemcell or in the deployment at this point.