coreos / coreos-kubernetes

CoreOS Container Linux+Kubernetes documentation & Vagrant installers
https://coreos.com/kubernetes/docs/latest/
Apache License 2.0

Multi-node: GuaranteedUpdate of /registry/minions/<NODE> failed because of a conflict #898

Open · ashwinp opened 7 years ago

ashwinp commented 7 years ago

Issue Details:

Setup details: multi-node cluster running Kubernetes v1.6.4+coreos.0 via kubelet-wrapper on CoreOS Container Linux (versions taken from the logs below).

The kubelet on the worker nodes fails to update the node status after reporting a successful registration:

kubelet-wrapper[1657]: I0804 16:42:15.216223    1657 kubelet_node_status.go:77] Attempting to register node 172.0.60.57
kubelet-wrapper[1657]: I0804 16:42:15.218882    1657 kubelet_node_status.go:80] Successfully registered node 172.0.60.57
kubelet-wrapper[1657]: E0804 16:42:25.230766    1657 kubelet_node_status.go:326] Error updating node status, will retry: error getting node "172.0.60.57": nodes "172.0.60.57" not found
kubelet-wrapper[1657]: E0804 16:42:25.232449    1657 kubelet_node_status.go:326] Error updating node status, will retry: error getting node "172.0.60.57": nodes "172.0.60.57" not found
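
To confirm what the kubelet sees from the outside, the node object can be polled directly. Below is a minimal client-go sketch, a hypothetical diagnostic rather than anything from the kubelet itself; it assumes v1.6-era client-go signatures (no context arguments) and a kubeconfig at /etc/kubernetes/kubeconfig, both of which may need adjusting:

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Kubeconfig path is an assumption; adjust for your setup.
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// Poll the node the kubelet just registered and report when it vanishes.
	for {
		_, err := client.CoreV1().Nodes().Get("172.0.60.57", metav1.GetOptions{})
		switch {
		case err == nil:
			fmt.Println(time.Now().Format(time.RFC3339), "node exists")
		case errors.IsNotFound(err):
			fmt.Println(time.Now().Format(time.RFC3339), "node not found")
		default:
			fmt.Println(time.Now().Format(time.RFC3339), "error:", err)
		}
		time.Sleep(2 * time.Second)
	}
}

Going by the timestamps in the API server log below, the node exists for only about two seconds after registration before it is deleted.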

The Kubernetes API server logs reveal a conflict while updating the node object in etcd (the 409 responses to the node-controller's PUTs below), after which the node-controller deletes the node:

I0804 16:42:15.220414       1 wrap.go:75] GET /api/v1/nodes/172.0.60.57: (736.057µs) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47534]
I0804 16:42:15.227137       1 store.go:329] GuaranteedUpdate of /registry/minions/172.0.60.57 failed because of a conflict, going to retry
I0804 16:42:15.227245       1 store.go:329] GuaranteedUpdate of /registry/minions/172.0.60.57 failed because of a conflict, going to retry
I0804 16:42:15.227280       1 wrap.go:75] GET /api/v1/pods?fieldSelector=spec.nodeName%3D172.0.60.57: (7.793419ms) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:46858]
I0804 16:42:15.227314       1 wrap.go:75] PUT /api/v1/nodes/172.0.60.57: (6.490089ms) 409 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47534]
I0804 16:42:15.227250       1 wrap.go:75] PATCH /api/v1/nodes/172.0.60.57: (6.805385ms) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/ttl-controller] 127.0.0.1:47536]
I0804 16:42:15.228557       1 wrap.go:75] GET /api/v1/nodes/172.0.60.57: (708.958µs) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:46858]
I0804 16:42:15.228820       1 wrap.go:75] PATCH /api/v1/namespaces/default/events/172.0.60.57.14d7b23385905550: (11.479188ms) 200 [[kubelet/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd] 172.0.60.57:59454]
I0804 16:42:15.228837       1 wrap.go:75] PATCH /api/v1/nodes/172.0.60.57/status: (6.707276ms) 200 [[kubelet/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd] 172.0.60.57:59454]
I0804 16:42:15.229323       1 wrap.go:75] PUT /api/v1/nodes/172.0.60.57: (406.754µs) 409 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47536]
I0804 16:42:15.230566       1 wrap.go:75] GET /api/v1/nodes/172.0.60.57: (719.769µs) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47536]
I0804 16:42:15.232358       1 wrap.go:75] PUT /api/v1/nodes/172.0.60.57: (1.469816ms) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47536]
I0804 16:42:15.232840       1 wrap.go:75] PATCH /api/v1/namespaces/default/events/172.0.60.57.14d7b2338590686a: (3.188002ms) 200 [[kubelet/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd] 172.0.60.57:59454]
I0804 16:42:15.235985       1 wrap.go:75] PATCH /api/v1/namespaces/default/events/172.0.60.57.14d7b23385907c23: (2.451278ms) 200 [[kubelet/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd] 172.0.60.57:59454]

I0804 16:42:17.732567       1 wrap.go:75] DELETE /api/v1/nodes/172.0.60.57: (2.582459ms) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47534]
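
For context, the GuaranteedUpdate retries and the 409 responses above are the server side of Kubernetes' optimistic concurrency: every PUT carries the object's resourceVersion, and a write against a stale version is rejected with a Conflict, forcing the writer to re-read and retry. A minimal sketch of the matching client-side pattern, using retry.RetryOnConflict from client-go; the label key is hypothetical, and the context-free Get/Update signatures match the v1.6-era client in these logs:

package main

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// updateNodeLabel shows the re-read-then-retry loop that resolves the
// kind of 409 Conflict seen in the log above.
func updateNodeLabel(client kubernetes.Interface, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Fetch the latest object on every attempt so the update
		// carries a current resourceVersion.
		node, err := client.CoreV1().Nodes().Get(name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if node.Labels == nil {
			node.Labels = map[string]string{}
		}
		node.Labels["example/touched"] = "true" // hypothetical label
		_, err = client.CoreV1().Nodes().Update(node)
		return err // a 409 Conflict here triggers another attempt
	})
}

Conflicts like these are expected when the kubelet, node-controller, and ttl-controller all write the same Node object within milliseconds, as the log shows. What stands out is not the retries but the node-controller's DELETE that follows them, which removes the node the kubelet just registered.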