Do you have access to the `describe pod` information for the etcd-operator pod? Maybe the pod was not scheduled because of memory pressure. Can you see when the etcd-operator actually started? I noticed some delay at times, but never 5 minutes.
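For reference, a minimal way to pull that information (assuming the etcd-operator pod runs in the `kube-system` namespace, which may differ in your setup):

```sh
# Find the etcd-operator pod and see where it ended up
kubectl -n kube-system get pods -o wide | grep etcd-operator

# Describe it; the Events section at the bottom shows scheduling failures
# (e.g. insufficient memory) as well as when the containers actually started
kubectl -n kube-system describe pod <etcd-operator-pod-name>
```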
@joestringer the `E1221 21:28:11.122672 1 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 10.10.1.56:49026->10.96.0.1:443: read: connection timed out` line is worrisome, as it means the connectivity between etcd-operator and kube-apiserver was terminated. This means there was no connectivity to kube-apiserver between 21:12:21 and 21:28:11, which is why etcd-operator didn't start.
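If this reproduces, one way to check that path by hand, assuming `10.96.0.1:443` is the default `kubernetes` service ClusterIP and that the container image ships a client like `curl` (both assumptions, not confirmed from the logs):

```sh
# Confirm the apiserver service ClusterIP (10.96.0.1 is the usual default)
kubectl get svc kubernetes

# Probe the apiserver from inside the affected pod; swap curl for whatever
# client the image actually provides. Any HTTP response (even 401/403)
# proves connectivity, while a hang/timeout matches the symptom above.
kubectl -n kube-system exec <etcd-operator-pod-name> -- \
  curl -k -m 5 https://10.96.0.1:443/healthz
```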
Seems like when `cilium-etcd-operator` decides to freshly bootstrap the cluster, it may take several minutes before `etcd-operator` starts doing anything. Around 21:25 in the logs below, `cilium-etcd-operator` decided to restart the etcd cluster, then the first log from `etcd-operator` is at 21:28, which is around the time that etcd pods began appearing in the cluster.

`cilium-etcd-operator` logs:
`etcd-operator` logs:
Observed by manually watching the Cilium `Tests upgrade and downgrade from a Cilium stable image to master` ginkgo test run locally, while watching everything with one terminal running `watch kubectl get nodes,svc,pods --all-namespaces -o wide` and another terminal digging around in logs to see what different components are doing.
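For the log-digging terminal, something along these lines works; the label selectors are assumptions about how the operator pods are labelled and may need adjusting:

```sh
# Terminal 1: watch overall cluster state
watch kubectl get nodes,svc,pods --all-namespaces -o wide

# Terminal 2: follow the operator logs
# (label selectors assumed; adjust to whatever labels the pods actually carry)
kubectl -n kube-system logs -f -l name=cilium-etcd-operator
kubectl -n kube-system logs -f -l name=etcd-operator
```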