Azure / azure-container-networking

Azure Container Networking Solutions for Linux and Windows Containers
MIT License

Race when creating azure-vnet lock directory with permissions #2818

Open QxBytes opened 2 weeks ago

QxBytes commented 2 weeks ago

What happened: On first boot, no CNI binary is present on the node, so k8s automatically creates the /var/run/azure-vnet directory with 0755 permissions, because it is a mount that is part of the azure-cns daemonset. The CNI is then deployed. The /var/run directory is not preserved between reboots, so when the VM reboots, the CNI binary may run before k8s creates /var/run/azure-vnet. When the CNI binary runs first, it creates the directory with 0644 permissions. A directory without the execute (search) bit cannot be traversed by a process that does not bypass permission checks, so this causes permission denied errors for the cns. Even if k8s creates/mounts /var/run/azure-vnet later, it sees that the directory already exists and does not recreate it with 0755 permissions.

What you expected to happen:
The CNI binary should create the directory with 0755 permissions.

How to reproduce it:
Reboot the VM with the cns security context configured to drop all capabilities (so that cns cannot bypass permission checks). There is a chance that the azure-cns pod will get stuck in crash loop backoff.

Orchestrator and Version (e.g. Kubernetes, Docker):

Operating System (Linux/Windows):

Kernel (e.g. uname -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):

Anything else we need to know?: [Miscellaneous information that will assist in solving the issue.]

rbtr commented 2 weeks ago

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods sounds like it would let k8s chmod the directory when the CNS Pod mounts it as a Volume?