confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0
47 stars 77 forks source link

Optimising Pod VM size and startup time #343

Open bpradipt opened 1 year ago

bpradipt commented 1 year ago

To make peer-pods approach a viable solution for serverless computing, startup times matter. Even otherwise the startup time should be comparable to using Kata with local hypervisor (qemu/clh).

This is a tracker issue to document findings and identify scope of improvement.

tumberino commented 1 year ago

Do we have a baseline for what "startup time ... using Kata with local hypervisor" is?

bpradipt commented 1 year ago

@jtumber-ibm I'll get some baseline pod startup time data and update this issue.

abhbaner commented 1 year ago

@bpradipt @jtumber-ibm Tested on AWS Openshift cluster

Following startup time was observed For peer pod t3.small instance - 1 minute 41 second

Name:     hello-openshift
Namespace:  default
Priority:   0
Node:     ip-10-0-187-226.us-east-2.compute.internal/10.0.187.226
Start Time:  Wed, 08 Feb 2023 18:11:16 +0530
Labels:    app=hello-openshift
Annotations: [k8s.v1.cni.cncf.io/network-status](http://k8s.v1.cni.cncf.io/network-status):
        [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
            "10.128.2.22"
          ],
          "default": true,
          "dns": {}
        }]
       [k8s.v1.cni.cncf.io/networks-status](http://k8s.v1.cni.cncf.io/networks-status):
        [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
            "10.128.2.22"
          ],
          "default": true,
          "dns": {}
        }]
       [kata.peerpods.io/instance_type](http://kata.peerpods.io/instance_type): t2.small
Status:    Running
IP:      10.128.2.22
IPs:
 IP: 10.128.2.22
Containers:
 hello-openshift:
  Container ID:  cri-o://07062a709864f4e32723adfe1fd9c9d51464d76e9ca938dc286373c1d5f5fb85
  Image:     [quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)
  Image ID:    [quay.io/openshift/origin-hello-openshift@sha256:369c8324bea625fdb94c1e8cc992ae9d2b0fe780acace7b8c8080349792f3b96](http://quay.io/openshift/origin-hello-openshift@sha256:369c8324bea625fdb94c1e8cc992ae9d2b0fe780acace7b8c8080349792f3b96)
  Port:      8888/TCP
  Host Port:   0/TCP
  State:     Running
   Started:   Wed, 08 Feb 2023 18:12:47 +0530
  Ready:     True
  Restart Count: 0
  Limits:
   [kata.peerpods.io/vm](http://kata.peerpods.io/vm): 1
  Requests:
   [kata.peerpods.io/vm](http://kata.peerpods.io/vm): 1
  Environment:      <none>
  Mounts:
   /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ntdth (ro)
Conditions:
 Type       Status
 Initialized    True 
 Ready       True 
 ContainersReady  True 
 PodScheduled   True 
Volumes:
 kube-api-access-ntdth:
  Type:          Projected (a volume that contains injected data from multiple sources)
  TokenExpirationSeconds: 3607
  ConfigMapName:      kube-root-ca.crt
  ConfigMapOptional:    <nil>
  DownwardAPI:       true
  ConfigMapName:      openshift-service-ca.crt
  ConfigMapOptional:    <nil>
QoS Class:          BestEffort
Node-Selectors:       <none>
Tolerations:         [node.kubernetes.io/not-ready:NoExecute](http://node.kubernetes.io/not-ready:NoExecute) op=Exists for 300s
               [node.kubernetes.io/unreachable:NoExecute](http://node.kubernetes.io/unreachable:NoExecute) op=Exists for 300s
Events:
 Type  Reason     Age  From   Message
 ----  ------     ---- ----   -------
 Normal AddedInterface 100s multus  Add eth0 [10.128.2.22/23] from openshift-sdn
 Normal Pulling     14s  kubelet Pulling image "[quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)"
 Normal Pulled     14s  kubelet Successfully pulled image "[quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)" in 609.587941ms
 Normal Created     11s  kubelet Created container hello-openshift
 Normal Started     11s  kubelet Started container hello-openshift

For peer pod c5.xlarge instance - 2 minute 45 second

Name:     hello-openshift
Namespace:  default
Priority:   0
Node:     ip-10-0-187-226.us-east-2.compute.internal/10.0.187.226
Start Time:  Wed, 08 Feb 2023 18:00:31 +0530
Labels:    app=hello-openshift
Annotations: [k8s.v1.cni.cncf.io/network-status](http://k8s.v1.cni.cncf.io/network-status):
        [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
            "10.128.2.21"
          ],
          "default": true,
          "dns": {}
        }]
       [k8s.v1.cni.cncf.io/networks-status](http://k8s.v1.cni.cncf.io/networks-status):
        [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
            "10.128.2.21"
          ],
          "default": true,
          "dns": {}
        }]
       [kata.peerpods.io/instance_type](http://kata.peerpods.io/instance_type): t2.small
Status:    Running
IP:      10.128.2.21
IPs:
 IP: 10.128.2.21
Containers:
 hello-openshift:
  Container ID:  cri-o://cbd66cd5255be1d61a0fa4440b8480ffe8a9b89b71dc288a2fef561f81980b90
  Image:     [quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)
  Image ID:    [quay.io/openshift/origin-hello-openshift@sha256:369c8324bea625fdb94c1e8cc992ae9d2b0fe780acace7b8c8080349792f3b96](http://quay.io/openshift/origin-hello-openshift@sha256:369c8324bea625fdb94c1e8cc992ae9d2b0fe780acace7b8c8080349792f3b96)
  Port:      8888/TCP
  Host Port:   0/TCP
  State:     Running
   Started:   Wed, 08 Feb 2023 18:01:56 +0530
  Ready:     True
  Restart Count: 0
  Limits:
   [kata.peerpods.io/vm](http://kata.peerpods.io/vm): 1
  Requests:
   [kata.peerpods.io/vm](http://kata.peerpods.io/vm): 1
  Environment:      <none>
  Mounts:
   /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dwdsx (ro)
Conditions:
 Type       Status
 Initialized    True 
 Ready       True 
 ContainersReady  True 
 PodScheduled   True 
Volumes:
 kube-api-access-dwdsx:
  Type:          Projected (a volume that contains injected data from multiple sources)
  TokenExpirationSeconds: 3607
  ConfigMapName:      kube-root-ca.crt
  ConfigMapOptional:    <nil>
  DownwardAPI:       true
  ConfigMapName:      openshift-service-ca.crt
  ConfigMapOptional:    <nil>
QoS Class:          BestEffort
Node-Selectors:       <none>
Tolerations:         [node.kubernetes.io/not-ready:NoExecute](http://node.kubernetes.io/not-ready:NoExecute) op=Exists for 300s
               [node.kubernetes.io/unreachable:NoExecute](http://node.kubernetes.io/unreachable:NoExecute) op=Exists for 300s
Events:
 Type  Reason     Age  From   Message
 ----  ------     ----  ----   -------
 Normal AddedInterface 2m10s multus  Add eth0 [10.128.2.21/23] from openshift-sdn
 Normal Pulling     50s  kubelet Pulling image "[quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)"
 Normal Pulled     49s  kubelet Successfully pulled image "[quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)" in 549.836917ms
 Normal Created     47s  kubelet Created container hello-openshift
 Normal Started     47s  kubelet Started container hello-openshift

For runc pod (runtime default) - 7 seconds

Name:     nginx
Namespace:  default
Priority:   0
Node:     ip-10-0-219-218.us-east-2.compute.internal/10.0.219.218
Start Time:  Wed, 08 Feb 2023 17:47:28 +0530
Labels:    run=nginx
Annotations: [k8s.v1.cni.cncf.io/network-status](http://k8s.v1.cni.cncf.io/network-status):
        [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
            "10.131.0.18"
          ],
          "default": true,
          "dns": {}
        }]
       [k8s.v1.cni.cncf.io/networks-status](http://k8s.v1.cni.cncf.io/networks-status):
        [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
            "10.131.0.18"
          ],
          "default": true,
          "dns": {}
        }]
Status:    Running
IP:      10.131.0.18
IPs:
 IP: 10.131.0.18
Containers:
 nginx:
  Container ID: cri-o://17418aed26b8c36b4aca40f3724c620ee4022440109593ef0dcf2449939f9994
  Image:     [quay.io/bpradipt/http-echo:nonroot](http://quay.io/bpradipt/http-echo:nonroot)
  Image ID:   [quay.io/bpradipt/http-echo@sha256:ec3c0642ff30050ea3f462c8f662ffa53036f6a3e089085714c7ff13501002dc](http://quay.io/bpradipt/http-echo@sha256:ec3c0642ff30050ea3f462c8f662ffa53036f6a3e089085714c7ff13501002dc)
  Port:     <none>
  Host Port:   <none>
  Args:
   -text
   Hello from Kata container
   -listen
   :8080
  State:     Running
   Started:   Wed, 08 Feb 2023 17:47:33 +0530
  Ready:     True
  Restart Count: 0
  Environment:  <none>
  Mounts:
   /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-77wkg (ro)
Conditions:
 Type       Status
 Initialized    True 
 Ready       True 
 ContainersReady  True 
 PodScheduled   True 
Volumes:
 kube-api-access-77wkg:
  Type:          Projected (a volume that contains injected data from multiple sources)
  TokenExpirationSeconds: 3607
  ConfigMapName:      kube-root-ca.crt
  ConfigMapOptional:    <nil>
  DownwardAPI:       true
  ConfigMapName:      openshift-service-ca.crt
  ConfigMapOptional:    <nil>
QoS Class:          BestEffort
Node-Selectors:       <none>
Tolerations:         [node.kubernetes.io/not-ready:NoExecute](http://node.kubernetes.io/not-ready:NoExecute) op=Exists for 300s
               [node.kubernetes.io/unreachable:NoExecute](http://node.kubernetes.io/unreachable:NoExecute) op=Exists for 300s
Events:
 Type  Reason     Age  From        Message
 ----  ------     ---- ----        -------
 Normal Scheduled    12s  default-scheduler Successfully assigned default/nginx to ip-10-0-219-218.us-east-2.compute.internal
 Normal AddedInterface 10s  multus       Add eth0 [10.131.0.18/23] from openshift-sdn
 Normal Pulling     9s  kubelet      Pulling image "[quay.io/bpradipt/http-echo:nonroot](http://quay.io/bpradipt/http-echo:nonroot)"
 Normal Pulled     7s  kubelet      Successfully pulled image "[quay.io/bpradipt/http-echo:nonroot](http://quay.io/bpradipt/http-echo:nonroot)" in 2.274906876s
 Normal Created     7s  kubelet      Created container nginx
 Normal Started     7s  kubelet      Started container nginx
bpradipt commented 1 year ago

Thanks a lot @abhbaner. Can you please confirm the pod VM image distro - CentOS, RHEL or Ubuntu ?

abhbaner commented 1 year ago

The podVM image distro is CentOs.

bpradipt commented 1 year ago

Putting the timing data for easier understanding

runc

 Normal Scheduled    12s  default-scheduler Successfully assigned default/nginx to ip-10-0-219-218.us-east-2.compute.internal
 Normal AddedInterface 10s  multus       Add eth0 [10.131.0.18/23] from openshift-sdn
 Normal Pulling     9s  kubelet      Pulling image "[quay.io/bpradipt/http-echo:nonroot](http://quay.io/bpradipt/http-echo:nonroot)"
 Normal Pulled     7s  kubelet      Successfully pulled image "[quay.io/bpradipt/http-echo:nonroot](http://quay.io/bpradipt/http-echo:nonroot)" in 2.274906876s
 Normal Created     7s  kubelet      Created container nginx
 Normal Started     7s  kubelet      Started container nginx

peer-pod (c5.xlarge)

Normal AddedInterface 2m10s multus  Add eth0 [10.128.2.21/23] from openshift-sdn
 Normal Pulling     50s  kubelet Pulling image "[quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)"
 Normal Pulled     49s  kubelet Successfully pulled image "[quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)" in 549.836917ms
 Normal Created     47s  kubelet Created container hello-openshift
 Normal Started     47s  kubelet Started container hello-openshift

peer-pod(t3.small)

Normal AddedInterface 100s multus  Add eth0 [10.128.2.22/23] from openshift-sdn
 Normal Pulling     14s  kubelet Pulling image "[quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)"
 Normal Pulled     14s  kubelet Successfully pulled image "[quay.io/openshift/origin-hello-openshift](http://quay.io/openshift/origin-hello-openshift)" in 609.587941ms
 Normal Created     11s  kubelet Created container hello-openshift
 Normal Started     11s  kubelet Started container hello-openshift
bpradipt commented 1 year ago

A rough sequence diagram of the Pod VM startup process

image

bpradipt commented 1 year ago

@jtumber-ibm @stevenhorsman @surajssd @yoheiueda @huoqifeng

yoheiueda commented 1 year ago

I did the same experiments on my cluster in IBM Cloud VPC.

runc

  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  4s    default-scheduler  Successfully assigned default/nginx to k8s-worker
  Normal  Pulling    4s    kubelet            Pulling image "nginx"
  Normal  Pulled     3s    kubelet            Successfully pulled image "nginx" in 1.173574561s
  Normal  Created    3s    kubelet            Created container nginx
  Normal  Started    3s    kubelet            Started container nginx

peer pod

  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  44s   default-scheduler  Successfully assigned default/nginx to k8s-worker
  Normal  Pulling    7s    kubelet            Pulling image "nginx"
  Normal  Pulled     3s    kubelet            Successfully pulled image "nginx" in 4.218585385s
  Normal  Created    3s    kubelet            Created container nginx
  Normal  Started    1s    kubelet            Started container nginx

I calculated an elapsed time in each state transition as follows. I think the container image is already cached in the case of runc.

State Transition runc peer-pod
Scheduled→Pulling 0s 37s
Pulling→Pulled 1s 4s
Pulled→Created 0s 0s
Created→Started 0s 2s
mkulke commented 1 year ago

I suspect getting close to a baseline with local kata on kvm/qemu will be tough. I don't have concrete measurements at the moment, but time-to-pod on local kata was a matter of a few seconds (~3-10s), even for an SEV-SNP enabled guest, which has a bit of an init overhead.

Below is a measurement from a CAA test, it's not broken down by the individual CAA steps, but you can kind of extrapolate from the k8s logs:

kubectl get events -o custom-columns=Timestamp:.firstTimestamp,Message:.message --field-selector involvedObject.name=nginx-7df44f78b6-tgxnx                                                    
Timestamp              Message                                                     
2023-01-27T08:54:06Z   Successfully assigned default/nginx-7df44f78b6-tgxnx to k8s-ci                                                                             
2023-01-27T08:54:51Z   Container image "nginx:1.14.2" already present on machine   
2023-01-27T08:54:51Z   Created container nginx                                     
2023-01-27T08:54:53Z   Started container nginx                                     
2023-01-27T08:54:53Z   Container image "*" already present on machine                                                      
2023-01-27T08:54:53Z   Created container sidecar                            
2023-01-27T08:54:55Z   Started container sidecar
bpradipt commented 1 year ago

Some additional data purely from the pod VM image standpoint. I used both CentOS Stream 8 and Ubuntu 20.04 cloud images for relative comparison.

CentOS Stream8 Cloud image

$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
echo-kata-67ccb48d7b-wpwnq   1/1     Running   0          69s

[root@podvm-echo-kata-67ccb48d7b-wpwnq-6e216995 peerpod]# systemd-analyze time
Startup finished in 1.266s (kernel) + 3.378s (initrd) + 47.053s (userspace) = 51.698s
multi-user.target reached after 29.136s in userspace

[root@podvm-echo-kata-67ccb48d7b-wpwnq-6e216995 peerpod]# systemd-analyze blame
         24.677s cloud-init.service
         18.057s kdump.service
          3.278s NetworkManager-wait-online.service
          2.283s cloud-init-local.service
          1.062s initrd-switch-root.service
           773ms dracut-initqueue.service
[snip]

Disabling kdump.service from the pod VM image gave the following results:

[kcli@virtlab714 peerpods-files]$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
echo-kata-67ccb48d7b-gt9nv   1/1     Running   0          59s

[peerpod@podvm-echo-kata-67ccb48d7b-gt9nv-4b58c292 peerpod]# systemd-analyze time
Startup finished in 1.237s (kernel) + 3.378s (initrd) + 30.510s (userspace) = 35.126s
multi-user.target reached after 29.231s in userspace
[root@podvm-echo-kata-67ccb48d7b-gt9nv-4b58c292 peerpod]# systemd-analyze blame
         24.898s cloud-init.service
          3.361s NetworkManager-wait-online.service
          2.264s cloud-init-local.service
          1.001s initrd-switch-root.service
           776ms dracut-initqueue.service

Ubuntu 20.04 cloud image

$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
echo-kata-67ccb48d7b-r4bcb   1/1     Running   0          70s

root@podvm-echo-kata-67ccb48d7b-r4bcb-15a38e4a:/home/peerpod# systemd-analyze time
Startup finished in 4.150s (kernel) + 14.571s (userspace) = 18.721s
graphical.target reached after 13.728s in userspace
root@podvm-echo-kata-67ccb48d7b-r4bcb-15a38e4a:/home/peerpod# systemd-analyze blame
41.390s apt-daily.service
 3.082s snapd.service
 3.029s systemd-networkd-wait-online.service
 2.803s snap.lxd.activate.service
 2.624s cloud-init-local.service
 1.549s cloud-init.service
 1.414s cloud-config.service
 1.383s apt-daily-upgrade.service
 1.234s dev-sda1.device
 1.222s ua-timer.service

Disabling apt-daily.service gave the following results:

$ oc get pods
NAME                         READY   STATUS    RESTARTS   AGE
echo-kata-67ccb48d7b-q4q8r   1/1     Running   0          39s

root@podvm-echo-kata-67ccb48d7b-q4q8r-faf29ce7:/home/peerpod# systemd-analyze time
Startup finished in 4.162s (kernel) + 14.206s (userspace) = 18.369s
graphical.target reached after 13.457s in userspace
root@podvm-echo-kata-67ccb48d7b-q4q8r-faf29ce7:/home/peerpod# systemd-analyze blame
3.025s systemd-networkd-wait-online.service
2.683s snap.lxd.activate.service
2.363s cloud-init-local.service
2.269s snapd.service
1.848s cloud-init.service
bpradipt commented 1 year ago

@yoheiueda @mkulke