Open rodrigobersa opened 6 months ago
Hello @rodrigobersa, I'll do some testing myself to see if I can reproduce this issue and get back to you.
One thing we noticed is that the container that seems to be problematic is in the k8s.io namespace, which means it is not the admin container. I don't see anything related to the admin container so far, though we can't rule out some interaction there.
Can you list the containers running on a host that is in this state? Run enter-admin-container and use sudo sheltie, then ctr --namespace k8s.io images ls
Image I'm using: bottlerocket-aws-k8s-1.28-x86_64-v1.19.1-c325a08b
What I expected to happen: Scale-in activities should take the same average amount of time whether the admin container is enabled or disabled.
What actually happened: Scale-in activities take more than 5 minutes when the admin container is enabled. When it is not enabled, the scale-in process takes less than 2 minutes.
Apparently, once SIGTERM hits containerd, systemd starts repeatedly trying to deactivate the mount for what seems to be the admin host container, without success.
How to reproduce the problem: Spin up a Managed Node Group or a Karpenter NodePool with a Bottlerocket family AMI. Enable the admin container. Scale out to any number of replicas. Scale in.
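For the "enable the admin container" step, a minimal user-data sketch may help anyone trying to reproduce this. It assumes the node group or NodePool passes Bottlerocket TOML user data to the instances. The cluster-specific settings the node needs to join are omitted here, and this is not necessarily the exact configuration used in this report.

```toml
# Assumption: this fragment is merged into the node's existing Bottlerocket
# user data; cluster settings (name, endpoint, certificate) are not shown.
[settings.host-containers.admin]
# Starts the admin host container at boot (it is disabled by default).
enabled = true
```

Setting enabled = false in the same table, with everything else unchanged, gives the comparison case where scale-in completes in under 2 minutes.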