confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

RFC: Remove cloud-config / cloud-init from PodVM Image? #1467

Open mkulke opened 12 months ago

mkulke commented 12 months ago

Problem Statement

Currently, cloud-config runs at VM startup to configure the system.

How we use it

Why we might not want to use it

Cloud Config is an avenue for executing arbitrary untrusted code at startup.

Suggested Solution

Replace the required functionality (config file creation, cloud callback, network configuration) with systemd units. We started this work by providing an option to write config files by parsing user-data directly (DISABLE_CLOUD_CONFIG=true).
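To illustrate why parsing user-data directly is narrower than running cloud-config, here is a minimal sketch, not the actual process-user-data implementation. It assumes the user-data payload is JSON-encoded (real cloud-config is YAML; JSON is used here only to stay within the standard library), and the `ALLOWED_PATHS` entries and the `write_config_files` name are hypothetical. Only allowlisted files are written; any other directive in the payload is ignored rather than executed.

```python
import json
from pathlib import Path

# Hypothetical allowlist of config files the PodVM is willing to accept.
ALLOWED_PATHS = {
    "/run/peerpod/daemon.json",
    "/run/peerpod/agent-config.toml",
}


def write_config_files(user_data, root="/"):
    """Write allowlisted `write_files` entries from user-data below `root`.

    Assumes `user_data` is a JSON document shaped like cloud-config's
    `write_files` section. Everything else in the document is ignored.
    Returns the list of paths that were written.
    """
    doc = json.loads(user_data)
    written = []
    for entry in doc.get("write_files", []):
        path = entry.get("path", "")
        if path not in ALLOWED_PATHS:
            # Skip unexpected paths instead of acting on them.
            continue
        target = Path(root) / path.lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(entry.get("content", ""))
        written.append(path)
    return written
```

The `root` parameter exists only so the function can be exercised against a scratch directory instead of the live filesystem.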

Alternative Solutions

Open Problems

I'm not sure whether alternatives to cloud-config are available. I think in some cases a virtual ISO is attached, and we'd need to support that as a source in addition to process-user-data.

katexochen commented 11 months ago

I agree we should remove cloud init. We have some hacks in Constellation to run without waagent.

mkulke commented 11 months ago

Yes, that's possible and documented as a valid agent-less deployment now. The support situation for those VMs might be tricky for some deployments. So I'm not sure what e.g. the plans for RHEL CVMs & Cloud-Config are, since I suspect Cloud-Config in a TEE is problematic for all flavors of CVMs, not just PeerPod PodVMs.

FWIW, I've been running tests using a discrete report-ready tool. PodVMs start reliably without cloud-config.

There might be CAA use cases for Cloud Config which we need to cover (or reconsider). Admin users and SSH pubkeys, configured in CAA, are currently provisioned via Cloud-Config to the PodVM. We could certainly support that in some way, but we probably don't want to rewrite cloud-config.

katexochen commented 11 months ago

The support situation for those VMs might be tricky for some deployments. So I'm not sure what e.g. the plans for RHEL CVMs & Cloud-Config are, since I suspect Cloud-Config in a TEE is problematic for all flavors of CVMs, not just PeerPod PodVMs.

@bpradipt what is your opinion in this regard?

There might be CAA use cases for Cloud Config which we need to cover (or reconsider). Admin users and SSH pubkeys, configured in CAA, are currently provisioned via Cloud-Config to the PodVM. We could certainly support that in some way, but we probably don't want to rewrite cloud-config.

I think cloud config will be really hard to measure in a meaningful way, so I don't see much future for it. Maybe we have to reimplement things, but then we can at least have a way to implement validation. SSH access to podvms will become less interesting as soon as the base image is read-only I guess (and SSH access doesn't make much sense in a CC context anyway IMO, as it makes remote attestation essentially meaningless).

mkulke commented 11 months ago

FWIW, I'm also looking at afterburn atm, which would provide an option to provision the SSH keys, but I agree that in the long run running SSH in the TEE is questionable.

katexochen commented 11 months ago

I'm also looking at afterburn atm, which would provide an option to provision the SSH keys

Hm, I saw your discussion on the PR. I'm not sure about it. From my perspective, a solution for this should be minimal, like the tool you proposed. afterburn seems to do a little bit too many things (like the SSH key stuff). We don't want to end up with an alternative cloud init.

mkulke commented 11 months ago

Yeah, it's pretty large: a release build is 166 MB and a statically linked one is even bigger. But it also performs this job on multiple clouds, so I'm leaning towards adopting it. Maybe they're open to featurization of this crate.

Although the surface might be large, the handling seems to be rather explicit, so it's not open-ended like cloud-config. You simply invoke ./afterburn --provider azure --check-in and it shouldn't do anything beyond that.

jepio commented 11 months ago

afterburn seems to do a little bit too many things (like the SSH key stuff). We don't want to end up with an alternative cloud init.

It's all driven by cli arguments, so it doesn't do this extra stuff if you don't want it to.

katexochen commented 11 months ago

Afterburn doesn't seem to have a mechanism to detect the cloud provider. Their docs are pretty thin.

As far as I can see, we currently only need a check-in on Azure. I will add a systemd unit that has Azure hard-coded as the provider and add a || true to the exec so it won't fail on other CSPs. If we require a check-in on other cloud providers, we will need a script that wraps afterburn to detect the provider. I don't want to pass the provider via the kernel command line, as that would mean building provider-specific images.
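A wrapper of the kind mentioned above could look roughly like the following sketch. It is an illustration under assumptions, not something proposed in this thread: the `detect_provider` and `check_in_command` helpers are hypothetical, and the DMI marker values are assumptions that should be verified against each provider's documentation. It detects the provider from DMI data at runtime, so a single image works everywhere, and only builds the afterburn invocation where a check-in is needed (Azure, per the comment above).

```python
from pathlib import Path

# Assumed DMI markers per provider: (file under the DMI directory, expected value).
# These values are assumptions for illustration; verify them before relying on them.
DMI_MARKERS = {
    "azure": ("chassis_asset_tag", "7783-7084-3265-9085-8269-3286-77"),
    "aws": ("sys_vendor", "Amazon EC2"),
    "gcp": ("product_name", "Google Compute Engine"),
}

# Providers that need a check-in; only Azure according to the discussion.
CHECK_IN_PROVIDERS = {"azure"}


def detect_provider(dmi_dir="/sys/class/dmi/id"):
    """Return the provider name matching the DMI data, or None if unknown."""
    for provider, (filename, marker) in DMI_MARKERS.items():
        try:
            value = (Path(dmi_dir) / filename).read_text().strip()
        except OSError:
            # File missing or unreadable: this marker cannot match.
            continue
        if value == marker:
            return provider
    return None


def check_in_command(provider, binary="afterburn"):
    """Build the afterburn argv for a check-in, or None if none is needed."""
    if provider not in CHECK_IN_PROVIDERS:
        return None
    return [binary, "--provider", provider, "--check-in"]
```

A caller would pass the result of `check_in_command(detect_provider())` to `subprocess.run` when it is not `None`, and exit successfully otherwise, which is what the || true is meant to achieve on other CSPs.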

katexochen commented 10 months ago

Reopening as there are still things to do: