Closed: khrisrichardson closed this issue 6 years ago
I'd expect those log lines to appear ephemerally if the apiserver was briefly unavailable for setting the node annotation. Did you have to restart the agent in order for it to proceed?
Hi @dghubble. I expected the same thing, but tried connecting to the Kubernetes service socket from another pod on the same node and did not have similar issues.
Another factor was the unexpectedly old version of Container Linux (1465.X.X) on the node in question and how long the node had been alive (20+ days). I have since updated the autoscaling groups of all our clusters to reference the latest CL AMI, so I'm having a little self-doubt.
Since I have updated all the nodes in our fleet and don't feel I collected sufficient evidence to make my case (even though the pod in question did appear to be in a degraded state), maybe we ought to close this until I can reproduce the issue and gather ample supporting evidence that a liveness/readiness probe is in order.
Although there is the fact that killing the supposedly degraded pod addressed the issue...
Ok. Yes, if you find the pod ends up stuck in this state, even after the apiserver is available, and requires a restart, please do add an issue with any info you can. At the moment, without more info, I'm content to close this as well.
I observed the following in a number of long-running container-linux-update-agent:v0.4.1 pods. Killing them and spawning replacements opened up communication with the Kubernetes service, and the nodes were able to update. Perhaps some sort of liveness probe to check the health of the connection to the Kubernetes service would be advantageous.
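For illustration, here is a minimal sketch of what such a check might look like, assuming a hypothetical /healthz handler added inside the agent (as far as I know, update-agent does not expose one today). It reuses the in-cluster config to make a cheap request against the apiserver, and a kubelet httpGet livenessProbe would then point at it; the handler name, port, and wiring are all illustrative, not the project's actual code.

```go
// Hypothetical sketch only: update-agent does not expose a health endpoint
// today. This shows one way an agent-side /healthz handler could verify
// that the connection to the apiserver is still usable, so that a kubelet
// httpGet livenessProbe could restart the pod when it degrades.
package main

import (
	"net/http"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Reuse the in-cluster service account credentials the agent already has.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	// Bound each health-check round trip so a hung connection fails fast.
	config.Timeout = 5 * time.Second

	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		// Cheap round trip to the apiserver. If this keeps failing, the
		// kubelet's liveness probe would kill and recreate the pod, which
		// is what fixing the degraded pods required doing by hand.
		if _, err := client.Discovery().ServerVersion(); err != nil {
			http.Error(w, "apiserver unreachable: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.Write([]byte("ok"))
	})

	// In a real agent this would run alongside the main update loop;
	// the port is arbitrary for this sketch.
	http.ListenAndServe(":9801", nil)
}
```

The pod spec would then only need a livenessProbe with httpGet against that port and path; periodSeconds and failureThreshold would want tuning so a brief apiserver blip doesn't cause restart loops.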
Thanks