DataDog / pupernetes

Spin up a full fledged Kubernetes environment designed for local development & CI
Apache License 2.0

Issue with docker 18.05.0-ce #24

Open CharlyF opened 6 years ago

CharlyF commented 6 years ago

Describe what happened: pupernetes does not start, it is stuck trying to reach the API server.

I0517 19:38:24.926526   16471 systemd.go:27] Status of p8s-kubelet.service job: "done"
I0517 19:38:25.940602   16471 state.go:35] Kubenertes apiserver not ready yet: Get http://127.0.0.1:8080/healthz: dial tcp 127.0.0.1:8080: connect: connection refused

No logs from the containers.

charly@chk:~/go/src/github.com/DataDog/datadog-agent$ docker ps -a
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS               NAMES
0cfa5d332ddf        a6dae067c19c                 "/hyperkube apiserve…"   10 seconds ago      Created                                 k8s_kube-apiserver_kube-apiserver-chk_kube-system_9a0c7f5f5ec16ad10dc34c7a94a2650e_10
bd8a68b12ee7        a6dae067c19c                 "/hyperkube apiserve…"   3 minutes ago       Created                                 k8s_kube-apiserver_kube-apiserver-chk_kube-system_9a0c7f5f5ec16ad10dc34c7a94a2650e_9
e07111682edd        k8s.gcr.io/pause-amd64:3.1   "/pause"                 6 minutes ago       Up 6 minutes                            k8s_POD_kube-apiserver-chk_kube-system_9a0c7f5f5ec16ad10dc34c7a94a2650e_0

But inspecting container 0cfa5d332ddf shows:

"State": {
    "Status": "created",
    "Running": false,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": false,
    "Pid": 0,
    "ExitCode": 128,
    "Error": "linux mounts: Could not find source mount of /home/charly/go/src/github.com/DataDog/pupernetes/dca-cm/secrets",
    "StartedAt": "0001-01-01T00:00:00Z",
    "FinishedAt": "0001-01-01T00:00:00Z"
},
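Since `docker ps -a` only shows "Created" and the containers produce no logs, the real error is easy to miss. As a rough sketch (not part of pupernetes), the Go program below reads `docker inspect` output on stdin and prints every container that never started but carries a `State.Error`. The `container` type and the `startFailures` helper are hypothetical names made up for this example; only the `Id`, `Name`, and `State` fields of the inspect output are assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// container mirrors only the fields of `docker inspect` output used here.
type container struct {
	ID    string `json:"Id"`
	Name  string `json:"Name"`
	State struct {
		Status   string `json:"Status"`
		ExitCode int    `json:"ExitCode"`
		Error    string `json:"Error"`
	} `json:"State"`
}

// startFailures (hypothetical helper) returns one line per container that is
// still in the "created" state and has a non-empty State.Error.
func startFailures(inspectJSON []byte) ([]string, error) {
	var containers []container
	if err := json.Unmarshal(inspectJSON, &containers); err != nil {
		return nil, err
	}
	var failures []string
	for _, c := range containers {
		if c.State.Status == "created" && c.State.Error != "" {
			failures = append(failures,
				fmt.Sprintf("%s (exit code %d): %s", c.Name, c.State.ExitCode, c.State.Error))
		}
	}
	return failures, nil
}

func main() {
	// Expects the JSON array printed by `docker inspect` on stdin.
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read stdin:", err)
		os.Exit(1)
	}
	failures, err := startFailures(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse docker inspect output:", err)
		os.Exit(1)
	}
	for _, f := range failures {
		fmt.Println(f)
	}
}
```

One way to use it, assuming the file is saved as main.go: docker inspect $(docker ps -aq) | go run main.go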

This started happening after I installed docker-compose. It must have updated /etc/apt/sources.list. I had to tweak it in order to apt-get install docker-ce. I used to run 18.03.0 and it got bumped to:

charly@chk:~/go/src/github.com/DataDog/datadog-agent$ docker version
Client:
 Version:      18.05.0-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   f150324
 Built:        Wed May  9 22:16:34 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.05.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   f150324
  Built:        Wed May  9 22:14:43 2018
  OS/Arch:      linux/amd64
  Experimental: false

This appears to be a known issue, with a fix apparently merged into master a few days ago: https://github.com/moby/moby/issues/37032.

I worked around it by reducing my sources.list to a single entry: deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable and re-installing Docker pinned to a specific version: sudo apt-get install docker-ce=18.03.1~ce-0~ubuntu

After that, pupernetes started immediately.

For tracking purposes:

journalctl -u e2e-kubelet.service -o cat -r

Stopped Hyperkube kubelet for end to end testing.
e2e-kubelet.service: Failed with result 'exit-code'.
e2e-kubelet.service: Main process exited, code=exited, status=1/FAILURE
I0426 12:48:08.233487   75907 docker_server.go:79] Stop docker server
Stopping Hyperkube kubelet for end to end testing...
E0426 12:48:07.812079   75907 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get http://127.0.0.1:8080/a
I0426 12:48:07.810470   75907 reflector.go:240] Listing and watching *v1.Pod from k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47
E0426 12:48:07.808935   75907 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get http://127.0.0.1:8080/api/v1/n
E0426 12:48:07.808008   75907 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get http://127.0.0.1:8080/api/v
I0426 12:48:07.807840   75907 reflector.go:240] Listing and watching *v1.Node from k8s.io/kubernetes/pkg/kubelet/kubelet.go:460
I0426 12:48:07.806841   75907 reflector.go:240] Listing and watching *v1.Service from k8s.io/kubernetes/pkg/kubelet/kubelet.go:451
E0426 12:48:07.765987   75907 mirror_client.go:88] Failed deleting a mirror pod "kube-apiserver-chk_kube-system": Delete http://127.0.0.1:8080/api/v1/nam
I0426 12:48:07.765672   75907 mirror_client.go:85] Deleting a mirror pod "kube-apiserver-chk_kube-system"
E0426 12:48:07.765646   75907 kubelet_volumes.go:140] Orphaned pod "92589d13-4971-11e8-aae4-000c2957b706" found, but volume paths are still present on di
I0426 12:48:07.750360   75907 kubelet.go:1943] SyncLoop (housekeeping)
I0426 12:48:06.884526   75907 kubelet.go:2122] Container runtime status: Runtime Conditions: RuntimeReady=true reason: message:, NetworkReady=true reason
E0426 12:48:06.809749   75907 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get http://127.0.0.1:8080/a
I0426 12:48:06.808605   75907 reflector.go:240] Listing and watching *v1.Pod from k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47
E0426 12:48:06.807317   75907 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get http://127.0.0.1:8080/api/v1/n
I0426 12:48:06.806597   75907 reflector.go:240] Listing and watching *v1.Node from k8s.io/kubernetes/pkg/kubelet/kubelet.go:460
E0426 12:48:06.806232   75907 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get http://127.0.0.1:8080/api/v
I0426 12:48:06.805370   75907 reflector.go:240] Listing and watching *v1.Service from k8s.io/kubernetes/pkg/kubelet/kubelet.go:451
E0426 12:48:06.751461   75907 event.go:209] Unable to write event: 'Post http://127.0.0.1:8080/api/v1/namespaces/kube-system/events: dial tcp 127.0.0.1:8
E0426 12:48:05.806412   75907 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get http://127.0.0.1:8080/a
I0426 12:48:05.805588   75907 reflector.go:240] Listing and watching *v1.Pod from k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47
E0426 12:48:05.805037   75907 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get http://127.0.0.1:8080/api/v1/n
E0426 12:48:05.804931   75907 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get http://127.0.0.1:8080/api/v
I0426 12:48:05.804169   75907 reflector.go:240] Listing and watching *v1.Service from k8s.io/kubernetes/pkg/kubelet/kubelet.go:451
I0426 12:48:05.803194   75907 reflector.go:240] Listing and watching *v1.Node from k8s.io/kubernetes/pkg/kubelet/kubelet.go:460
E0426 12:48:05.759745   75907 mirror_client.go:88] Failed deleting a mirror pod "kube-apiserver-chk_kube-system": Delete http://127.0.0.1:8080/api/v1/nam
I0426 12:48:05.759304   75907 mirror_client.go:85] Deleting a mirror pod "kube-apiserver-chk_kube-system"
E0426 12:48:05.758931   75907 kubelet_volumes.go:140] Orphaned pod "92589d13-4971-11e8-aae4-000c2957b706" found, but volume paths are still present on di
I0426 12:48:05.750522   75907 kubelet.go:1943] SyncLoop (housekeeping)

CharlyF commented 6 years ago

@JulienBalestra this can be worked around, since the only problem is the Docker version, but it was far from obvious to figure out. Do you think there is a way for us to prevent this kind of situation from occurring? Whitelisting supported versions of Docker seems a little too much (although we know it has to be > 18 and exclude 18.05.0).

Maybe we could at least log a warning for this?
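To make the idea concrete, here is a minimal sketch of such a check, not a description of what pupernetes does. It assumes that "> 18" means "18.x or later" and that 18.05.0 is the only known-bad release; `minMajor`, `knownBad`, and `checkDockerVersion` are hypothetical names made up for this example. The version string is assumed to look like the `Version` field of `docker version` (e.g. 18.05.0-ce).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minMajor is an assumption: the thread only says the version "has to be > 18",
// read here as "18.x or later".
const minMajor = 18

// knownBad maps known-broken Docker releases to the upstream issue tracking them.
var knownBad = map[string]string{
	"18.05.0": "https://github.com/moby/moby/issues/37032",
}

// checkDockerVersion (hypothetical helper) returns an error describing why the
// given Docker version is likely to cause trouble, or nil if no problem is known.
func checkDockerVersion(version string) error {
	// Drop the "-ce" style suffix, then read the major component.
	base := strings.SplitN(version, "-", 2)[0]
	major, err := strconv.Atoi(strings.SplitN(base, ".", 2)[0])
	if err != nil {
		return fmt.Errorf("cannot parse docker version %q: %v", version, err)
	}
	if major < minMajor {
		return fmt.Errorf("docker %s is older than %d.x", version, minMajor)
	}
	if ref, bad := knownBad[base]; bad {
		return fmt.Errorf("docker %s is known to be broken, see %s", version, ref)
	}
	return nil
}

func main() {
	for _, v := range []string{"18.03.1-ce", "18.05.0-ce"} {
		if err := checkDockerVersion(v); err != nil {
			fmt.Println("warning:", err)
			continue
		}
		fmt.Println(v, "has no known issue")
	}
}
```

Logging a warning rather than refusing to start keeps this lighter than a strict whitelist: unknown versions still run, and only releases listed in `knownBad` are flagged.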

JulienBalestra commented 6 years ago

@CharlyF this is a very good point to keep in mind.

We definitely need to tie the supported container runtime versions to the selected kubelet version.

rootfs commented 6 years ago

FWIW, I hit the same issue; using the nightly build works:

rpm -Uvh https://download.docker.com/linux/fedora/28/x86_64/nightly/Packages/docker-ce-18.06.0.ce-0.0.dev.git20180609.170747.0.ecac08f.fc28.x86_64.rpm