Closed: rlinstorres closed this issue 1 year ago
Hi @rlinstorres, that is definitely safe to remove. Since the image tag you are using is v1.12.5-eksbuild.2, I assume you used the managed addon API (aws eks update-addon ...) to update the VPC CNI.
The managed addon service installs the VPC CNI chart using server-side apply, which means fields it does not own, such as this leftover mount, do not get removed. If you are curious, there is a long upstream GitHub issue discussing this: https://github.com/kubernetes/kubernetes/issues/99003
We are working on fixing this in the managed addon service, but in the meantime you can remove this field without any concerns.
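For anyone landing here, a minimal sketch of how that check and cleanup might look. The manifest URL and tag are assumptions based on the fix described later in this thread; pick the tag that matches your add-on version:

```sh
# Check whether the aws-node DaemonSet still carries the dockershim volume
kubectl -n kube-system get daemonset aws-node \
  -o jsonpath='{.spec.template.spec.volumes[*].name}'

# One way to clear the leftover field is to re-apply the upstream manifest
# matching your add-on version (tag assumed here; adjust as needed)
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.12.5/config/master/aws-k8s-cni.yaml
```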
Hi @jdn5126, thank you for your answer! Just to let you know, we are using Terraform to manage it. Regarding your suggestion to remove the volume, I did that and everything is working as expected; I was able to update the EKS cluster without issues.
Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
FYI, when using CloudFormation to deploy a new cluster with EKS 1.24 + CNI 1.12.5, it mounted dockershim.sock as well. This did not block the upgrade, but it was a bit confusing when running kubectl dds.
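For reference, a rough sketch of that check, assuming the dds plugin from https://github.com/aws-containers/kubectl-detector-for-docker-socket is installed (for example via krew):

```sh
# Assumption: the detector plugin is installed through krew as "dds"
kubectl krew install dds

# Scan the cluster for workloads still mounting the dockershim socket
kubectl dds
```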
> We are working on fixing this in the managed addon service, but in the meantime you can remove this field without any concerns.
Is that managed addon service fix being tracked in this project, or in some other public project? The issue has not been fixed; I fell victim to it last week while upgrading EKS from 1.23 to 1.24 and vpc-cni from 1.11.4 to 1.12.6.
@rpc-dam it is being tracked internally, as the EKS managed addon service is not a public project. By "fell victim", do you mean this caused an actual issue? This should be a no-op, and then the mount can be manually removed (which is unfortunate for now).
@jdn5126 yeah, it broke cluster networking after an EKS 1.24 upgrade.
The 1.11 vpc-cni manifest had previously been kubectl applied to upgrade it from 1.10. For the EKS 1.24 upgrade, we used the EKS addon for vpc-cni to deploy 1.12. Once our cluster control plane was 1.24 and the workers were all on 1.24 (so no dockershim on the workers), cluster connectivity was lost. The aws-node daemonset was using 1.12 images, but the manifest had retained the dockershim mount, which isn't present in 1.12. Connectivity was only regained after I ran kubectl apply -f https://raw.githubusercontent.com/aws/amazon-vpc-cni-k8s/v1.12.6/config/master/aws-k8s-cni.yaml, which removed the 1.11 mounts from the aws-node DS pods.
@rpc-dam hmm... that sounds like a separate issue, fixed by applying the correct manifest. When you applied the 1.12 image using EKS addons, did you use --resolve-conflicts OVERWRITE? The dockershim mount being present does not affect cluster connectivity. Are you sure the EKS addons application succeeded?
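A quick way to sanity-check that might look like the following sketch (my-cluster is a placeholder for the actual cluster name):

```sh
# Confirm the managed add-on reports ACTIVE and the expected version
aws eks describe-addon --cluster-name my-cluster --addon-name vpc-cni \
  --query 'addon.[status,addonVersion]'

# Confirm the aws-node DaemonSet is actually running the new image
kubectl -n kube-system get daemonset aws-node \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```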
@jdn5126 the EKS vpc-cni addon was deployed using Terraform's eks_addon resource with resolve_conflicts = "OVERWRITE".
That resource applied successfully, resulting in the aws-node DS image tags changing from v1.11.4 to v1.12.6-eksbuild.2, and the aws-node pods restarted as I would have expected, so it looked to have applied OK. In the console, the vpc-cni addon, which didn't previously exist, appeared with status = "active".
The dockershim volume was still in the aws-node manifest even though it doesn't exist in the 1.12.6 template or manifest, and my cluster networking was definitely broken; I'm sure I saw "failed to create pod sandbox" errors on some pods that had started crash looping, like the ebs-csi-driver.
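For anyone trying to confirm the same symptom, a rough check might be:

```sh
# Look for sandbox creation failures reported as events anywhere in the cluster
kubectl get events -A | grep -i "failed to create pod sandbox"

# And list pods that are crash looping
kubectl get pods -A | grep -i crashloop
```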
What happened:
Hi there!
By default we are using containerd (containerRuntimeVersion: containerd://1.6.6) and we have plans to update our EKS cluster to 1.24. We decided to run this project, https://github.com/aws-containers/kubectl-detector-for-docker-socket, to validate the dockershim volumes, and the only mount we found was in the aws-node daemonset. We would like to know why aws-node still needs to mount it, and whether it is correct and safe to update to 1.24.
Thank you!
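For context, the node runtime mentioned above can be confirmed with something along these lines:

```sh
# Show the container runtime reported by each node
kubectl get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
```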
Environment: production
- Kubernetes version (kubectl version): v1.23.14-eks
- CNI version: v1.12.5-eksbuild.2
- OS (cat /etc/os-release): Amazon Linux 2
- Kernel (uname -a): 5.4.231-137.341.amzn2.x86_64