coreos / coreos-kubernetes

CoreOS Container Linux+Kubernetes documentation & Vagrant installers
https://coreos.com/kubernetes/docs/latest/
Apache License 2.0

support for Kubernetes 1.6 in vagrant #863

Open jbw976 opened 7 years ago

jbw976 commented 7 years ago

I tried bumping the Kubernetes version to 1.6.1 in my fork with this commit: https://github.com/jbw976/coreos-kubernetes/commit/f8397385a0eb80bf9707275c820dd18f048183d7
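
For reference, the change amounts to pointing the install scripts at the 1.6.1 hyperkube image. Roughly the following, paraphrasing the commit from memory (the variable names are the ones used in multi-node/generic/controller-install.sh and worker-install.sh):

# multi-node/generic/controller-install.sh (same change in worker-install.sh)
export K8S_VER=v1.6.1_coreos.0              # previously a v1.5.x_coreos.0 tag
export HYPERKUBE_IMAGE_REPO=quay.io/coreos/hyperkube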

However, the cluster doesn't seem to come up successfully with vagrant up from multi-node/vagrant. Is 1.6.1 expected to work simply by bumping the version number, or is there more work necessary to support 1.6?

It looks like the api server container keeps failing/exiting:

core@c1 ~ $ docker ps -a
CONTAINER ID        IMAGE                                                                                              COMMAND                  CREATED              STATUS                          PORTS               NAMES
33ee03df2ca8        quay.io/coreos/hyperkube@sha256:1c8b4487be52a6df7668135d88b4c375aeeda4d934e34dbf5a8191c96161a8f5   "/hyperkube apiserver"   About a minute ago   Exited (2) About a minute ago                       k8s_kube-apiserver_kube-apiserver-172.17.4.101_kube-system_63ca746f1897c616e533e8a22bc52f25_11
78c8e6698fdb        quay.io/coreos/hyperkube@sha256:1c8b4487be52a6df7668135d88b4c375aeeda4d934e34dbf5a8191c96161a8f5   "/hyperkube proxy --m"   22 minutes ago       Up 22 minutes                                       k8s_kube-proxy_kube-proxy-172.17.4.101_kube-system_3adc2e5909a25a7591be4e34d03a979a_0
236a7318e1e1        quay.io/coreos/hyperkube@sha256:1c8b4487be52a6df7668135d88b4c375aeeda4d934e34dbf5a8191c96161a8f5   "/hyperkube scheduler"   22 minutes ago       Up 22 minutes                                       k8s_kube-scheduler_kube-scheduler-172.17.4.101_kube-system_00f8fdc56c1d255064005c48f70be4ef_0
88168fcf90cc        quay.io/coreos/hyperkube@sha256:1c8b4487be52a6df7668135d88b4c375aeeda4d934e34dbf5a8191c96161a8f5   "/hyperkube controlle"   22 minutes ago       Up 22 minutes                                       k8s_kube-controller-manager_kube-controller-manager-172.17.4.101_kube-system_3904d793c0237421892d0b11d8787f7d_0
ee5e7cd9c687        gcr.io/google_containers/pause-amd64:3.0                                                           "/pause"                 23 minutes ago       Up 23 minutes                                       k8s_POD_kube-controller-manager-172.17.4.101_kube-system_3904d793c0237421892d0b11d8787f7d_0
5099f6e7db56        gcr.io/google_containers/pause-amd64:3.0                                                           "/pause"                 23 minutes ago       Up 23 minutes                                       k8s_POD_kube-proxy-172.17.4.101_kube-system_3adc2e5909a25a7591be4e34d03a979a_0
7b861b49e90d        gcr.io/google_containers/pause-amd64:3.0                                                           "/pause"                 23 minutes ago       Up 23 minutes                                       k8s_POD_kube-apiserver-172.17.4.101_kube-system_63ca746f1897c616e533e8a22bc52f25_0
157ea2d35035        gcr.io/google_containers/pause-amd64:3.0                                                           "/pause"                 23 minutes ago       Up 23 minutes                                       k8s_POD_kube-scheduler-172.17.4.101_kube-system_00f8fdc56c1d255064005c48f70be4ef_0

And here is the entirety of the api server container logs:

core@c1 ~ $ docker logs 33ee03df2ca8
[restful] 2017/04/11 00:28:07 log.go:30: [restful/swagger] listing is available at https://172.17.4.101:443/swaggerapi/
[restful] 2017/04/11 00:28:07 log.go:30: [restful/swagger] https://172.17.4.101:443/swaggerui/ is mapped to folder /swagger-ui/
E0411 00:28:07.959405       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Secret: Get https://localhost:443/api/v1/secrets?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0411 00:28:07.983872       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.Namespace: Get https://localhost:443/api/v1/namespaces?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0411 00:28:07.988497       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.LimitRange: Get https://localhost:443/api/v1/limitranges?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0411 00:28:07.988710       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ServiceAccount: Get https://localhost:443/api/v1/serviceaccounts?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0411 00:28:07.989015       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *storage.StorageClass: Get https://localhost:443/apis/storage.k8s.io/v1beta1/storageclasses?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
E0411 00:28:07.989238       1 reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/internalversion/factory.go:70: Failed to list *api.ResourceQuota: Get https://localhost:443/api/v1/resourcequotas?resourceVersion=0: dial tcp [::1]:443: getsockopt: connection refused
I0411 00:28:08.075165       1 serve.go:79] Serving securely on 0.0.0.0:443
I0411 00:28:08.075310       1 serve.go:94] Serving insecurely on 127.0.0.1:8080
E0411 00:28:08.190502       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0411 00:28:08.235467       1 client_ca_hook.go:58] rpc error: code = 13 desc = transport is closing
E0411 00:28:13.032638       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0411 00:28:14.564430       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
I0411 00:28:18.414582       1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-04-11 00:28:08.403696382 +0000 UTC):
[24.604µs] [24.604µs] About to convert to expected version
[94.481µs] [69.877µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.010853094s] [10.010758613s] END
E0411 00:28:20.106746       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0411 00:28:21.792029       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0411 00:28:27.231371       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0411 00:28:28.868291       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0411 00:28:34.492583       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
E0411 00:28:36.038974       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
I0411 00:28:36.615233       1 trace.go:61] Trace "Create /api/v1/namespaces/kube-system/pods" (started 2017-04-11 00:28:26.573277232 +0000 UTC):
[30.753µs] [30.753µs] About to convert to expected version
[80.065µs] [49.312µs] Conversion done
"Create /api/v1/namespaces/kube-system/pods" [10.041929421s] [10.041849356s] END
jbw976 commented 7 years ago

@aaronlevy @colhom any thoughts on supporting Kubernetes 1.6+ in the Vagrant setup yet? I've been using this repo for my daily dev flow and it'd be great to start using 1.6. Thanks!

jbw976 commented 7 years ago

The root cause of the API server not starting appears to be that k8s 1.6 defaults to etcd3, while the vagrant cluster starts up an etcd2 node. The error message shown below occurs when an etcd3 client connects to a v2-only server (from https://github.com/kubernetes/kubernetes/issues/39710):

E0411 00:28:20.106746       1 status.go:62] apiserver received an error that is not an metav1.Status: rpc error: code = 13 desc = transport is closing
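
For anyone else debugging this, a quick way to confirm which etcd the apiserver is actually talking to is to hit the version endpoint of whatever --etcd-servers points at (2379 is the default client port; in my setup the etcd VM is at 172.17.4.51, adjust for yours):

core@c1 ~ $ curl -s http://172.17.4.51:2379/version

An etcdserver version of 2.x there, together with the gRPC "transport is closing" errors above, points at the v3-client-vs-v2-only-server mismatch.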

In the interest of updating this repo to deploy an etcd3 server, it looks like that needs to be done via Ignition: https://github.com/coreos/bugs/issues/1877#issuecomment-288485078

However, Ignition doesn't appear to be supported for Vagrant/VirtualBox: https://coreos.com/ignition/docs/latest/supported-platforms.html

Am I correct to conclude that k8s 1.6 (with etcd3) is not currently possible with the Vagrant/VirtualBox setup in this repo at https://github.com/coreos/coreos-kubernetes/tree/master/multi-node/vagrant?

Perhaps continuing to use etcd2 could still work, but I'm not sure that's a good idea going forward: https://github.com/kubernetes/features/blob/master/release-1.6/release-notes-draft.md#internal-storage-layer
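
If anyone does want to try sticking with etcd2 for now, my understanding is that it comes down to telling the 1.6 apiserver to keep using the v2 storage backend and JSON storage. A rough, untested sketch of the extra flags that would go on the kube-apiserver command in the controller manifest (all existing flags unchanged):

--storage-backend=etcd2                   # 1.6 changed the default to etcd3
--storage-media-type=application/json     # etcd2 storage expects JSON rather than the 1.6 protobuf default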

jbw976 commented 7 years ago

Just FYI, this solution using k8s 1.6.1 and etcd2 seems to work for us so far: https://github.com/rook/coreos-kubernetes/commit/a3e880bf3c880e4e4551fbba54d21cf6833ffb19

It may be of use to other folks, but the long-term upstream solution would be to start up an etcd3 server instead of having the k8s API server use etcd2. Hope this helps.
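
If you go this route, one quick sanity check (run on e1, the etcd VM in the multi-node Vagrant setup, and assuming the default /registry prefix) is to confirm the apiserver's data is landing in the v2 keyspace:

core@e1 ~ $ etcdctl ls --recursive /registry | head

You may need to pass --endpoints explicitly if etcd isn't listening on loopback.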

rbjorklin commented 7 years ago

@jbw976 I created a fairly significant pull request that upgrades Kubernetes to 1.7.3 along with all its dependencies. Feel free to check it out here