alta3 / kubernetes-the-alta3-way

The greatest k8s installer on the planet!

Add gvisor support to node install #35

Closed BicycleWalrus closed 1 year ago

BicycleWalrus commented 2 years ago

Installing gVisor for the CKS course is particularly sensitive because it requires editing the containerd/config.toml file. As such, I feel it is best to have gVisor installed and ready to go from the Alta3 playbook. Installing gVisor is not something the CKS tests, but setting up a RuntimeClass and a pod that uses this class is.
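For reference, the containerd change in question usually comes down to one runtime-handler stanza. The following is a minimal sketch, assuming a containerd 1.x CRI plugin with a version 2 config and the runsc shim already installed on the node. The scratch file is hypothetical and is used so that nothing touches the real config.toml; the playbook's actual change may differ.

```shell
# Sketch only: write the stanza to a scratch file for review instead of
# editing containerd's config.toml in place.
fragment="$(mktemp)"
cat << 'EOF' > "$fragment"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
EOF
cat "$fragment"
```

The last segment of the table name (`runsc`) is the handler name that a RuntimeClass refers to, and containerd has to be restarted after the real file is changed.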

Please find full notes on the installation, and my progress with it so far, in the lab I've drafted. The procedure mapped out below does not work, and I'm getting errors when attempting to use the new RuntimeClass. All errors and notes, including references for the installation procedure, are in the lab as written for now.

https://github.com/alta3/labs/blob/711eb1aa35d66fbaf20c3f92f8b93d1835f6dc53/content/kubernetes/security/gvisor-lab.md

Follow this link for a pretty version of my notes thus far:

https://live.alta3.com/content/cks/labs/content/kubernetes/security/gvisor-lab.html

bryfry commented 1 year ago

PR branch here https://github.com/alta3/kubernetes-the-alta3-way/pull/41

bryfry commented 1 year ago

student@bchd:~$ cat << EOF > rc.yml
apiVersion: node.k8s.io/v1  # RuntimeClass is defined in the node.k8s.io API group
kind: RuntimeClass
metadata:
  name: myclass  # The name the RuntimeClass will be referenced by
handler: runsc  # The name of the corresponding CRI configuration (RuntimeClass itself is non-namespaced)
EOF
student@bchd:~$ cat << EOF > rc-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  runtimeClassName: myclass
  containers:
  - image: busybox
    name: busybox
    command: ['sh', '-c', 'while true; do echo "Running..."; sleep 1h; done']
EOF
student@bchd:~$ kubectl apply -f rc.yml 
runtimeclass.node.k8s.io/myclass created
student@bchd:~$ kubectl apply -f rc-pod.yml 
pod/mypod created
student@bchd:~$ kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP               NODE     NOMINATED NODE   READINESS GATES
mypod   1/1     Running   0          8s    192.168.84.129   node-1   <none>           <none>
student@bchd:~$ kubectl describe pods | grep Runtime
Runtime Class Name:  myclass
student@bchd:~$ kubectl describe pods | tail -7
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  39s   default-scheduler  Successfully assigned default/mypod to node-1
  Normal  Pulling    38s   kubelet            Pulling image "busybox"
  Normal  Pulled     37s   kubelet            Successfully pulled image "busybox" in 704.187743ms
  Normal  Created    37s   kubelet            Created container busybox
  Normal  Started    36s   kubelet            Started container busybox

Looks like happy lil gVisor pods!
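The events above show the pod running, but not which runtime started it. One way to double-check is to read the kernel log from inside the container, since gVisor's sandboxed kernel prints its own boot messages there. This is a rough sketch, assuming kubectl is configured for the cluster and the image ships dmesg (busybox does); the function name is hypothetical.

```shell
# Sketch: report whether a pod's kernel log looks like gVisor's.
check_gvisor() {
  pod="$1"
  # Capture the log first; an unreachable pod simply yields an empty log.
  log="$(kubectl exec "$pod" -- dmesg 2>/dev/null || true)"
  case "$log" in
    *gVisor*) echo "$pod: kernel log mentions gVisor" ;;
    *)        echo "$pod: gVisor not confirmed" ;;
  esac
}
```

Calling `check_gvisor mypod` against the pod created above would be the usage here.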

BicycleWalrus commented 1 year ago

When testing the gVisor pods, I noticed they launched without issue, as you described. However, when trying to delete the pods, we're running into a problem.

Images like SimpleService and BusyBox do not shut down gracefully on receipt of a SIGTERM signal. Pods created with runsc are never forcefully shut down by Kubernetes: it still sends SIGTERM, but it does not force the pod to stop if the container does not exit on its own.

Tested with BusyBox (unable to terminate, because BusyBox will not shut itself down), then with nginx, which does shut down gracefully.

This may be an issue with our configuration, but there is a history of this happening with the runsc runtime class.

For now, we'll run a smooth lab by using the nginx image instead. However, this should be investigated, and I'll leave the ticket open to reflect this.
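One thing that may be worth ruling out while investigating: on Linux, a process running as PID 1 in a container ignores SIGTERM unless it installs a handler, so a bare `sh -c` loop only goes away when the SIGKILL arrives after the grace period (30 seconds by default). The sketch below is a loop that handles SIGTERM itself. It is not confirmed as the cause of the hang seen here, and `graceful_loop` is a hypothetical name.

```shell
# Sketch: a loop that exits promptly on SIGTERM instead of waiting for SIGKILL.
graceful_loop() {
  child=""
  # Stop the background sleep (if any), then exit cleanly.
  trap 'kill "$child" 2>/dev/null || true; echo "Stopping..."; exit 0' TERM
  while true; do
    echo "Running..."
    sleep 3600 &   # background the sleep so the trap can run right away
    child=$!
    wait "$child"
  done
}
```

Inside a pod, the same body could replace the string passed to `sh -c`. If a pod with this loop still refuses to terminate under runsc, the problem is more likely in the runtime or kubelet than in the image.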

sfeeser commented 1 year ago

@bryfry @BicycleWalrus, it sounds like this issue has been resolved, and a new issue is now appearing in which some containers do not respond to a SIGTERM. But is this a new problem that only surfaced once gVisor was installed successfully, or a symptom of an incomplete gVisor installation? Either way, the title of this issue no longer describes the current state. In cases like this, I think we should either:

bryfry commented 1 year ago

@sfeeser @BicycleWalrus

bryfry commented 1 year ago

@BicycleWalrus If you can provide minimum steps to recreate the container termination issue that would help. Thanks!

BicycleWalrus commented 1 year ago

@bryfry -- I've been unable to replicate the issue. Tried several different busybox images, and they all terminated as expected. Not sure why that one pod didn't (perhaps there was just a hiccup on the cluster's side with the forceful termination on that one pod).

So, I'm willing to say it was an isolated issue, and we're good to close this ticket!