confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0
44 stars 71 forks source link

Improving startup time #1315

Open mkulke opened 10 months ago

mkulke commented 10 months ago

This is a boot chart from systemd-analyze on ubuntu 22.04 in azure. It shows that we spend ~15s on initialization tasks, before the images are being pulled.

A relatively low hanging fruit would probably be to remove the dependency on cloud-init. We provision files via the cloud-config protocol to the podvm's at the moment. We could replace that with some bespoke logic in the unit files directly and agent-protocol-forwarder would be able to start earlier.

systemd

bpradipt commented 10 months ago

Something like a systemd unit reading the user-data over network and putting the info in the right place ?

mkulke commented 10 months ago

Something like a systemd unit reading the user-data over network and putting the info in the right place ?

yup

bpradipt commented 10 months ago

I'll give it a try.

bpradipt commented 10 months ago

With userdata, it takes approx 1 sec from VM creation to agent proxy being ready.

2023/08/24 14:39:54 [adaptor/cloud/azure] created VM successfully: /subscriptions/89a9ba4b-8b66-446a-813d-a9ed3a129e3d/resourceGroups/aks-rg/providers/Microsoft.Compute/virtualMachines/podvm-test-d0453224
2023/08/24 14:39:54 [adaptor/cloud/azure] podNodeIP[0]=10.224.0.5
2023/08/24 14:39:54 [adaptor/cloud] failed to create PeerPod: the server could not find the requested resource (post peerPods.confidentialcontainers.org)
2023/08/24 14:39:54 [adaptor/cloud] created an instance podvm-test-d0453224 for sandbox d04532242afc04b033c13594a93acd1808a2e4f62bdbdd446fe86dbba54bc161
2023/08/24 14:39:54 [tunneler/vxlan] vxlan ppvxlan1 (remote 10.224.0.5:8472, id: 555009) created at /proc/1/task/12/ns/net
2023/08/24 14:39:54 [tunneler/vxlan] vxlan ppvxlan1 created at /proc/1/task/12/ns/net
2023/08/24 14:39:54 [tunneler/vxlan] vxlan ppvxlan1 is moved to /var/run/netns/cni-60e4bf8f-fb9c-0bc7-d3f1-73a325c0608b
2023/08/24 14:39:54 [tunneler/vxlan] Add tc redirect filters between eth0 and vxlan1 on pod network namespace /var/run/netns/cni-60e4bf8f-fb9c-0bc7-d3f1-73a325c0608b
2023/08/24 14:39:54 [adaptor/proxy] Listening on /run/peerpod/pods/d04532242afc04b033c13594a93acd1808a2e4f62bdbdd446fe86dbba54bc161/agent.ttrpc
2023/08/24 14:39:54 [adaptor/proxy] failed to init cri client, the err: cri runtime endpoint is not specified, it is used to get the image name from image digest
2023/08/24 14:39:54 [adaptor/proxy] Trying to establish agent proxy connection to 10.224.0.5:15150
2023/08/24 14:39:55 [adaptor/proxy] established agent proxy connection to 10.224.0.5:15150
2023/08/24 14:39:55 [adaptor/cloud] agent proxy is ready

Without userdata (existing mechanism), it takes approx 7 sec from VM creation to agent proxy being ready.

2023/08/24 14:56:46 [adaptor/cloud/azure] created VM successfully: /subscriptions/89a9ba4b-8b66-446a-813d-a9ed3a129e3d/resourceGroups/aks-rg/providers/Microsoft.Compute/virtualMachines/podvm-test-28aadd09
2023/08/24 14:56:46 [adaptor/cloud/azure] podNodeIP[0]=10.224.0.5
2023/08/24 14:56:46 [adaptor/cloud] failed to create PeerPod: the server could not find the requested resource (post peerPods.confidentialcontainers.org)
2023/08/24 14:56:46 [adaptor/cloud] created an instance podvm-test-28aadd09 for sandbox 28aadd091c517d840dbcea1491dedf637a33d29306026a8a1cf01b021307b65f
2023/08/24 14:56:46 [tunneler/vxlan] vxlan ppvxlan1 (remote 10.224.0.5:8472, id: 555000) created at /proc/1/task/14/ns/net
2023/08/24 14:56:46 [tunneler/vxlan] vxlan ppvxlan1 created at /proc/1/task/14/ns/net
2023/08/24 14:56:46 [tunneler/vxlan] vxlan ppvxlan1 is moved to /var/run/netns/cni-f62cb9c2-7fba-4dd3-1d11-eb7a5871b093
2023/08/24 14:56:46 [tunneler/vxlan] Add tc redirect filters between eth0 and vxlan1 on pod network namespace /var/run/netns/cni-f62cb9c2-7fba-4dd3-1d11-eb7a5871b093
2023/08/24 14:56:46 [adaptor/proxy] Listening on /run/peerpod/pods/28aadd091c517d840dbcea1491dedf637a33d29306026a8a1cf01b021307b65f/agent.ttrpc
2023/08/24 14:56:46 [adaptor/proxy] failed to init cri client, the err: cri runtime endpoint is not specified, it is used to get the image name from image digest
2023/08/24 14:56:46 [adaptor/proxy] Trying to establish agent proxy connection to 10.224.0.5:15150
2023/08/24 14:56:53 [adaptor/proxy] established agent proxy connection to 10.224.0.5:15150
2023/08/24 14:56:53 [adaptor/cloud] agent proxy is ready
bpradipt commented 10 months ago

@mkulke the above data is only for one iteration. I didn't try with different instance types. I'll raise a PR (most likely) tomorrow after some cleanups.

mkulke commented 9 months ago

Some numbers that might help to identify further opportunities for optimization:

VM launched (+0s):

2023/09/07 18:45:50 [adaptor/cloud/azure] CreateInstance: name: "podvm-busybox-caa-5f99bd66c7-tcd92-ed00a45a"

/stat being created on podvm (+32s):

stat -c %z /proc/
2023-09-07 18:46:22.609353200 +0000

waagent starts on podvm (+39s):

head -1 /var/log/waagent.log
2023-09-07T18:46:29.390118Z INFO Daemon Azure Linux Agent Version:2.2.46

VM reported as ready (+46s):

2023/09/07 18:46:36 [adaptor/cloud/azure] created VM successfully: /subscriptions/REDACTED/resourceGroups/mgns/providers/Microsoft.Compute/virtualMachines/podvm-busybox-caa-5f99bd66c7-tcd92-ed00a45a

Sandbox is created (+47s):

2023/09/07 18:46:37 [adaptor/proxy] CreateSandbox: hostname:busybox-caa-5f99bd66c7-tcd92 sandboxId:ed00a45a13d810c283cc9ab6f9e629644705782382837a60c5914fc9db05ec11

So atm we get into container business after ~45s, the first signs of life from the VM are at ~30s. I suspect that if we report the VM as ok earlier (replacing waagent and setting goal-state manually) we might get closer to the 30s mark.

mkulke commented 9 months ago

There is some improvement, but we can possibly do more, hence re-opening

mkulke commented 8 months ago

At least on Azure there is a problem that the Create-VM API call will only return once the VM has registered itself as ready. We might want to reconsider this approach.

Today an Azure SNP CVM will be provisioned in about 30s until entering the kernel. This is something that we have to account for unless we look at optimizations beyond the PodVM Image.

An average boot process to multi-user.target we can consider to take around 15s. In the PodVM image, a VM can report itself as ready only relatively late in the boot process, since it needs networking to be up.

So for kata-agent to start working we will need to wait 45s really.

A possible optimization in CAA could be to not wait for the Create-VM call to return, i.e. we want to only know whether the VM creation has been successfully triggered, we're not interested necessarily in the "Readyness" of the VM. We will know this from a healthy POD or otherwise. So we could start facilitating connections even if the VM is not ready.

We need to measure things however before implementing. I suspect that user-data is populated asynchronous, and we'll have to wait for user-data to be available anyway which exceeds 45s anyway.