confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

aa-kbc-params is not customized in agent-config.toml for libvirt provider in fedora #1852

Closed huoqifeng closed 5 days ago

huoqifeng commented 3 weeks ago

When creating a libvirt PeerPod on an s390x machine from the Fedora image (built with mkosi), the "aa-kbc-params" field in agent-config.toml under /run/peerpod is not customized correctly.

This appears to be caused by process-user-data. The logs look like this:

# journalctl -t process-user-data
Jun 06 00:11:23 podvm-busybox-b4e79cb5 process-user-data[676]: failed to send request: Get "http://169.254.169.254/metadata/instance/compute?api-version=2021-01-01": dial tcp 169.254.169.254:80: connect: network is unreachable
Jun 06 00:11:23 podvm-busybox-b4e79cb5 process-user-data[676]: failed to send request: Get "http://169.254.169.254/latest/dynamic/instance-identity/document": dial tcp 169.254.169.254:80: connect: network is unreachable
Jun 06 00:11:23 podvm-busybox-b4e79cb5 process-user-data[676]: Error: failed to create UserData provider: unsupported user data provider

After disabling the "ExecStartPre" line (https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/podvm/files/etc/systemd/system/process-user-data.service#L12), the error looks like this:

# journalctl -t process-user-data
Jun 06 03:03:14 podvm-busybox-66cded21 process-user-data[675]: 2024/06/06 03:03:14 [agent/update] failed to read daemon config file: open /run/peerpod/daemon.json: no such file or directory
Jun 06 03:03:14 podvm-busybox-66cded21 process-user-data[675]: Error: failed to get daemon config from local file

The problem is that on Fedora, the ExecStartPre step of process-user-data fails because the libvirt provider does not implement the provision API, and that failure causes ExecStart to be skipped.

Option 1

I tried breaking it into two services, process-user-data-provision and process-user-data-update, where process-user-data-update depends on cloud-final.service because libvirt and other providers such as ibmcloud use cloud-init to provision user-data. It works for the libvirt provider on Ubuntu because:

  1. ExecStartPre in process-user-data does not cause ExecStart to be skipped on Ubuntu
  2. The agent config file is /etc/agent-config.toml on Ubuntu rather than /run/peerpod/agent-config.toml
    1. Fedora https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/podvm-mkosi/mkosi.skeleton/usr/lib/systemd/system/kata-agent.service.d/10-override.conf
    2. Ubuntu https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/podvm/files/etc/systemd/system/kata-agent.service#L10

Option 2:

We can handle agent-config.toml just like cdh.toml (https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/pkg/adaptor/cloud/cloud.go#L258-L282), generating it via cloudConfig rather than updating it in process-user-data.

stevenhorsman commented 3 weeks ago

Hey @huoqifeng - for the cdh configuration we also need aa_kbc_params set in /run/confidential-containers/cdh.toml (see https://github.com/confidential-containers/cloud-api-adaptor/pull/1748), do you know if that is working okay for libvirt in fedora too?

huoqifeng commented 3 weeks ago

> Hey @huoqifeng - for the cdh configuration we also need aa_kbc_params set in /run/confidential-containers/cdh.toml (see #1748), do you know if that is working okay for libvirt in fedora too?

The CDH configuration is OK on Fedora.

huoqifeng commented 3 weeks ago

Right, maybe we should handle agent-config.toml the same way as cdh.toml and remove the logic that updates it in process-user-data.

huoqifeng commented 3 weeks ago

@stevenhorsman @mkulke @bpradipt @liudalibj We can handle agent-config.toml just like cdh.toml here https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/pkg/adaptor/cloud/cloud.go#L258-L282 via cloudConfig rather than update it in process-user-data. wdyt?

mkulke commented 3 weeks ago

> @stevenhorsman @mkulke @bpradipt @liudalibj We can handle agent-config.toml just like cdh.toml here https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/pkg/adaptor/cloud/cloud.go#L258-L282 via cloudConfig rather than update it in process-user-data. wdyt?

In principle yes. If we assume the agent-config will be static and the same for all cases, we can generate it in code and not attempt to update the file. That would be the cleaner approach.

mkulke commented 3 weeks ago

It will also be useful if we want to provision a registry auth file via user-data: we could set the required kata-agent config option in the same file.

https://github.com/confidential-containers/cloud-api-adaptor/pull/1850#pullrequestreview-2090837084