Open idmurphy opened 1 year ago
I resolved this by removing line 48 (i.e. the kubectl call before the reboot) and adding the mentioned podPriorityClass to the yaml file, and we have been using this modified version. We haven't seen the issue since.
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
Describe the bug: The revert-cgroups DaemonSet (the container that reverts nodes to cgroups v1) intermittently does not reboot the node.
We have installed the DaemonSet released by the AKS team for reverting cgroups to v1, from here: https://github.com/Azure/AKS/blob/master/examples/cgroups/revert-cgroup-v1.yaml
However, we have seen a few occasions where one or more nodes in the AKS cluster did not get rebooted, even though the cgroup-version label was added to the node. This is highly unpredictable given that nodes can get scaled up and down, and it can therefore leave application pods in a non-working state.
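For reference, a labelled-but-not-rebooted node can be spotted with something like the following; the label name is the one the script applies, and the commands are standard kubectl/coreutils shown purely as an illustration.

```sh
# Which nodes already carry the label the script sets
kubectl get nodes -L cgroup-version

# Then, from a shell on a labelled node (SSH or kubectl debug), check which
# cgroup hierarchy is actually mounted:
stat -fc %T /sys/fs/cgroup/
# "cgroup2fs" -> still on cgroups v2, i.e. the reboot never happened
# "tmpfs"     -> cgroups v1 is active as expected
```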
To Reproduce: The issue is intermittent; see the additional context below.
Expected behavior: The cgroup-version label should only be applied to the node once it is known that cgroups v1 is active, which means the reboot must always have occurred first.
Screenshots:
- last reboot at 06:38, node still on cgroups v2
- grub file updated at 06:40 by the revert-cgroups script
Environment (please complete the following information):
- AKS 1.26
Additional context: When we connected to the node to check the grub file, we found it had been set for cgroups v1 as per the revert-cgroups script above. However, when we compare the node's last reboot time with the time the grub file was updated, the last reboot happened before the grub file got updated, so we can conclude the 'reboot' line in the revert-cgroups script never ran.
What we believe is happening is that, since the labelling of the node using kubectl occurs before the reboot line (i.e. on line 48 of the above yaml), the pod can get de-scheduled from the node before the reboot line runs, hence there is a race condition.
We have tested removing line 48 and this works as expected. It also means the node does not get labelled until after the reboot: on the next run the script re-checks, finds cgroups v1 active, and therefore enters the else branch and labels the node per line 51, as sketched below.
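To make the intended flow explicit, here is a rough sketch of the script with the label moved after the reboot. The grub edit and the NODE_NAME variable are placeholders for illustration; they are not the exact contents of revert-cgroup-v1.yaml.

```sh
# Sketch only: illustrates "label after reboot", not the exact upstream script.
if [ "$(stat -fc %T /sys/fs/cgroup/)" = "cgroup2fs" ]; then
  # ...edit grub to force cgroups v1, as the upstream script does...
  # note: no "kubectl label" here any more (old line 48 removed)
  reboot
else
  # cgroups v1 is already active, i.e. this is the run after the reboot,
  # so it is now safe to label the node (old line 51)
  kubectl label node "$NODE_NAME" cgroup-version=v1 --overwrite
fi
```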
In addition, the revert-cgroups pod doesn't have a pod priority class assigned, which means there is also a risk that Kubernetes schedules other pods onto the node before this one. Therefore, we would request that a pod priority class is added, along the lines of the example below.
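Something like the following is what we have in mind; the class name and value are only illustrative and are not part of the upstream yaml.

```yaml
# Example only: the class name and value are illustrative.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: revert-cgroups-priority
value: 1000000000          # highest value allowed for user-defined classes
globalDefault: false
description: "Schedule the cgroup revert DaemonSet ahead of application pods."
```

The DaemonSet's pod template would then reference it via priorityClassName: revert-cgroups-priority, or it could simply reuse the built-in system-node-critical class, which exists for node-level housekeeping pods like this.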
Please confirm if