Status: Open. QxBytes opened this issue 2 weeks ago.
```yaml
securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_ADMIN # only necessary for delegated IPAM/Cilium
      - NET_RAW # only necessary for delegated IPAM/Cilium
```
And make sure we have a test that reboots the node and verifies CNS functionality afterwards.
The Kubernetes docs section on the volume permission and ownership change policy for Pods (https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods) sounds like it would let Kubernetes chmod the directory when the CNS Pod mounts it as a volume?
What happened: On the first boot, no CNI binary is on the node, so Kubernetes automatically creates the /var/run/azure-vnet directory with 0755 permissions because the azure-cns DaemonSet mounts it. Then the CNI is deployed. The /var/run directory is not preserved across reboots, so after the VM reboots, the CNI binary may run before Kubernetes creates /var/run/azure-vnet. When the CNI binary runs first, it creates the directory with 0644 permissions. Because 0644 lacks the execute (search) bit, CNS, which cannot bypass permission checks once its capabilities are dropped, gets permission denied errors when accessing the directory. Even if Kubernetes creates or mounts /var/run/azure-vnet later, it sees that the directory already exists and does not recreate it with 0755 permissions.
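The failure hinges on the directory's search (execute) bit. The minimal Go sketch below, assuming only the standard library, contrasts 0755 with 0644. The helper name `searchableByAll` is hypothetical and not part of the CNI or CNS code; it checks the search bit for the owner, group, and other classes so it does not depend on which user CNS runs as.

```go
package main

import (
	"fmt"
	"os"
)

// searchableByAll reports whether every permission class (owner, group,
// other) has the execute ("search") bit on a directory with this mode.
// Looking up entries in a directory requires that bit; a process without
// CAP_DAC_OVERRIDE or CAP_DAC_READ_SEARCH cannot bypass the check.
// NOTE: hypothetical helper for illustration only.
func searchableByAll(mode os.FileMode) bool {
	return mode.Perm()&0o111 == 0o111
}

func main() {
	for _, m := range []os.FileMode{0o755, 0o644} {
		fmt.Printf("%#o -> searchable by all: %v\n", uint32(m), searchableByAll(m))
	}
}
```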
What you expected to happen:
The CNI binary should create the directory with 0755 permissions.
How to reproduce it:
Reboot the VM while the azure-cns security context drops all capabilities (so CNS cannot bypass permission checks). The azure-cns pod may then get stuck in CrashLoopBackOff.
Orchestrator and Version (e.g. Kubernetes, Docker):
Operating System (Linux/Windows):
Kernel (e.g. `uname -a` for Linux or `$(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion` for Windows):
Anything else we need to know?: [Miscellaneous information that will assist in solving the issue.]