Installing gVisor for the CKS course is particularly sensitive due to the need to edit containerd's config.toml file. As such, I feel it is best to have gVisor installed and ready to go from the Alta3 playbook. Installing gVisor itself is not tested, but setting up a RuntimeClass and a pod that uses that class is.
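For reference, the edit in question is roughly the following stanza in each node's /etc/containerd/config.toml (a sketch based on the upstream gVisor containerd instructions; the exact plugin path varies with the containerd version), followed by a containerd restart:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"

sudo systemctl restart containerd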
Please find full notes regarding installation, and my progress with it so far, in the lab I've drafted. The procedure mapped out below does not work, and I'm getting errors when attempting to use the new RuntimeClass. All errors and notes, including installation procedure references, are in the lab as written for now.
Follow this link for a pretty version of my notes thus far:
https://live.alta3.com/content/cks/labs/content/kubernetes/security/gvisor-lab.html
PR branch here https://github.com/alta3/kubernetes-the-alta3-way/pull/41
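For context, the manual steps the playbook would need to automate are roughly the sequence from the gVisor install docs, run on each node (a sketch from memory; URLs and paths should be checked against the lab notes above):

ARCH=$(uname -m)
URL=https://storage.googleapis.com/gvisor/releases/release/latest/${ARCH}
wget ${URL}/runsc ${URL}/containerd-shim-runsc-v1
chmod a+rx runsc containerd-shim-runsc-v1
sudo mv runsc containerd-shim-runsc-v1 /usr/local/bin/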
student@bchd:~$ cat << EOF > rc.yml
apiVersion: node.k8s.io/v1 # RuntimeClass is defined in the node.k8s.io API group
kind: RuntimeClass
metadata:
  name: myclass # The name the RuntimeClass will be referenced by; RuntimeClass is non-namespaced
handler: runsc # The name of the corresponding CRI configuration
EOF
student@bchd:~$ cat << EOF > rc-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  runtimeClassName: myclass
  containers:
  - image: busybox
    name: busybox
    command: ['sh', '-c', 'while true; do echo "Running..."; sleep 1h; done']
EOF
student@bchd:~$ kubectl apply -f rc.yml
runtimeclass.node.k8s.io/myclass created
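A quick sanity check at this point (my addition, not in the original notes) is to confirm the class registered with the expected handler:

student@bchd:~$ kubectl get runtimeclass

This should list myclass with the runsc handler.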
student@bchd:~$ kubectl apply -f rc-pod.yml
pod/mypod created
student@bchd:~$ kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE     NOMINATED NODE   READINESS GATES
mypod   1/1     Running   0          8s    192.168.84.129   node-1   <none>           <none>
student@bchd:~$ kubectl describe pods | grep Runtime
Runtime Class Name: myclass
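To be more certain the pod is actually sandboxed (and not quietly falling back to runc), a check worth adding to the lab is reading dmesg from inside the container; under runsc it reports gVisor's own boot messages rather than the host kernel's. A sketch, assuming the busybox image's dmesg applet is available:

student@bchd:~$ kubectl exec mypod -- dmesg

If gVisor is in use, the output mentions gVisor instead of the usual host kernel boot log.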
student@bchd:~$ kubectl describe pods | tail -7
Type    Reason     Age   From               Message
----    ------     ----  ----               -------
Normal  Scheduled  39s   default-scheduler  Successfully assigned default/mypod to node-1
Normal  Pulling    38s   kubelet            Pulling image "busybox"
Normal  Pulled     37s   kubelet            Successfully pulled image "busybox" in 704.187743ms
Normal  Created    37s   kubelet            Created container busybox
Normal  Started    36s   kubelet            Started container busybox
Looks like happy lil gVisor pods!
When testing the gVisor pods, I noticed they launched without issue, as you described. However, when trying to delete the pods, we're running into a problem.
Images like SimpleService and BusyBox do not shut down gracefully upon receipt of a SIGTERM signal. Pods created with runsc are never forcefully shut down by Kubernetes: it still sends the SIGTERM signal, but it does not force the pod to shut down if the pod doesn't stop on its own.
Tested with BusyBox (unable to terminate, because BusyBox will not shut itself down). Then tested with nginx, which does shut down gracefully.
This may be an issue with our configuration, but there is a history of this happening with the runsc runtime class.
For now, we'll run a smooth lab by using the nginx image instead. However, this should be investigated, and I'll leave the ticket open to reflect this.
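As a stopgap while this stays open (my suggestion, not something we've settled on), a stuck runsc pod can still be removed by telling Kubernetes to skip the graceful wait:

student@bchd:~$ kubectl delete pod mypod --grace-period=0 --force

This only stops the API server from waiting on the SIGTERM; whether that masks or exposes the underlying runsc termination behavior is part of what should be investigated.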
@bryfry @BicycleWalrus, it sounds like this issue has been resolved, and a new issue is now appearing where some containers will not respond to a SIGTERM. But is this a new problem only discovered once gVisor was successfully installed, or a sign of an incomplete gVisor installation solution? Either way, the title of this issue no longer describes the current state. In cases like this, I think we should either:
@sfeeser @BicycleWalrus
@BicycleWalrus If you can provide minimum steps to recreate the container termination issue that would help. Thanks!
@bryfry -- I've been unable to replicate the issue. I tried several different busybox images, and they all terminated as expected. Not sure why that one pod didn't (perhaps there was just a hiccup on the cluster's side with the forceful termination of that one pod).
So, I'm willing to say it was an isolated issue, and we're good to close this ticket!