Open tjwatson opened 1 year ago
I created a cluster using the default selections. The only change was selecting Kubernetes version v1.24 to ensure the containerd
container engine is used.
Node information:
$ kubectl get node -o wide
NAME                                STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
aks-agentpool-11140021-vmss000000   Ready    agent   10m   v1.24.6   10.224.0.4    <none>        Ubuntu 18.04.6 LTS   5.4.0-1094-azure   containerd://1.6.4+azure-4
Deployed deployment.yaml and noticed the following restore failure:
$ kubectl logs open-liberty-instanton-64cbb855db-j4xp8
CRIU needs to have the CAP_SYS_ADMIN or the CAP_CHECKPOINT_RESTORE capability:
setcap cap_checkpoint_restore+eip criu
(00.000000) Effective capability 40 missing
(00.000000) Effective capability 21 missing
CWWKE0957I: Restoring the checkpoint server process failed. Check the /logs/checkpoint/restore.log log to determine why the checkpoint process was not restored. Launching the server without using the checkpoint image.
Status: The nodes are running kernel 5.4.0-1094-azure, which is too old for unprivileged restore: CAP_CHECKPOINT_RESTORE (capability 40 in the log above) requires kernel 5.9 or higher. Privileged restore does work.
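To make the diagnosis above easier to repeat on other nodes, here is a minimal Python sketch. It maps the capability numbers from the CRIU log to their names in linux/capability.h (21 is CAP_SYS_ADMIN, 40 is CAP_CHECKPOINT_RESTORE) and checks a KERNEL-VERSION string against the 5.9 minimum. The function names are hypothetical and are not part of CRIU, Open Liberty, or kubectl.

```python
import re

# Capability numbers from linux/capability.h that appear in the CRIU log.
CAP_NAMES = {21: "CAP_SYS_ADMIN", 40: "CAP_CHECKPOINT_RESTORE"}

# CAP_CHECKPOINT_RESTORE was introduced in Linux 5.9.
MIN_KERNEL = (5, 9)


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '5.4.0-1094-azure'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def supports_unprivileged_restore(release):
    """True if the kernel is new enough to know CAP_CHECKPOINT_RESTORE."""
    return parse_kernel_release(release) >= MIN_KERNEL


def missing_capabilities(log_text):
    """Name the capabilities reported in 'Effective capability N missing' lines."""
    numbers = re.findall(r"Effective capability (\d+) missing", log_text)
    return [CAP_NAMES.get(int(n), f"capability {n}") for n in numbers]


if __name__ == "__main__":
    # Kernel version taken from the `kubectl get node -o wide` output above.
    print(supports_unprivileged_restore("5.4.0-1094-azure"))
```

A new enough kernel is only one requirement. The container must also be granted the capability, which is why the kernel check alone does not settle the ACA case.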
Status: The documentation for configuring the containers to deploy does not mention passing any additional Linux capabilities, so there is no way (that I can find) to make unprivileged restore work. Also, privileged containers are not supported by ACA.
The restore operation failed with the following in ACA (as expected):
2022-12-07T05:17:48.595178852Z /opt/ol/wlp/bin/server: line 1407: /usr/sbin/criu: Operation not permitted
2022-12-07T05:17:50.185483507Z CWWKE0957I: Restoring the checkpoint server process failed. Check the /logs/checkpoint/restore.log log to determine why the checkpoint process was not restored. Launching the server without using the checkpoint image.
Do we feel that ACA will add the ability to pass in Linux capabilities? Otherwise, it seems even an OS upgrade won't be enough in that environment. Maybe you have already asked them that question.
@vijaysun-omr We are trying to engage with the ACA folks to add that ability.
There are two services we are targeting: