kubectl describe on our ZooKeeper pods occasionally logged events like this one:
Warning Unhealthy 12s (x18 over 3m32s) kubelet Readiness probe errored: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 1s exceeded: context deadline exceeded
The Kafka cluster itself stayed healthy, but we noticed that on Containerd nodes every such timeout left behind a process like this one:
(On newer Kafka clusters, the user is nonroot instead of root.)
These processes stayed around forever, so long-lived nodes eventually hit their thread limit. The reason we hadn't noticed the issue earlier was that, until recently, all our long-lived nodes ran Dockerd.
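To see how close a long-lived node is to that limit, you can compare the number of threads in use with the kernel's limits. Below is a minimal sketch in Python. It assumes a Linux node with a readable /proc, such as a shell on the node or a debug pod that mounts the host's /proc. It treats the lower of kernel.threads-max and kernel.pid_max as the effective limit, since every thread also consumes a PID, and it ignores any per-cgroup pids.max limits. The function names are hypothetical and do not belong to any tool mentioned here.

```python
import glob
from pathlib import Path


def total_threads(proc_root: str = "/proc") -> int:
    """Sum the 'Threads:' field over every /proc/<pid>/status file."""
    total = 0
    # Only numeric directories are processes; this skips 'self', 'sys', etc.
    for status in glob.glob(f"{proc_root}/[0-9]*/status"):
        try:
            text = Path(status).read_text()
        except OSError:
            # The process exited between glob() and read_text(); skip it.
            continue
        for line in text.splitlines():
            if line.startswith("Threads:"):
                total += int(line.split()[1])
                break
    return total


def thread_limit(proc_root: str = "/proc") -> int:
    """Effective system-wide limit: each thread needs both a task slot and a PID."""
    threads_max = int(Path(f"{proc_root}/sys/kernel/threads-max").read_text())
    pid_max = int(Path(f"{proc_root}/sys/kernel/pid_max").read_text())
    return min(threads_max, pid_max)


def usage_ratio(proc_root: str = "/proc") -> float:
    """Fraction of the effective thread limit currently in use."""
    return total_threads(proc_root) / thread_limit(proc_root)
```

Run on a node, `usage_ratio()` should creep upward over days if leftover processes are accumulating. Taking `proc_root` as a parameter makes it possible to point the functions at a host /proc mounted at a different path.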