confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach

Apache License 2.0


RFC: Remove cloud-config / cloud-init from PodVM Image? #1467

Open mkulke opened 1 year ago

mkulke commented 1 year ago

Problem Statement

Currently, cloud-config runs at VM startup to configure the system.

How we use it

If DISABLE_CLOUD_CONFIG is unset, we populate the filesystem with configuration files using the write_files directive.
We use it to start CSP-specific agents, like Azure's waagent.
Potentially, we use it for network configuration.

Why we might not want to use it

Cloud-config is an avenue for executing arbitrary untrusted code at startup.

Alternative Solutions

Measure cloud-config data at early boot stages, prior to processing.
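A minimal sketch of what such a measurement could look like, assuming the raw user-data blob is hashed with SHA-256 and folded into a register with TPM-style extend semantics (new value = SHA-256 of old value concatenated with the digest). The function names and the sample user-data are illustrative only; a real implementation would extend an actual TPM PCR or TEE runtime measurement register rather than a Python variable.

```python
import hashlib

PCR_SIZE = 32  # size in bytes of a register in a SHA-256 bank


def measure(user_data: bytes) -> bytes:
    """Return the SHA-256 digest of the raw, unparsed user-data blob."""
    return hashlib.sha256(user_data).digest()


def extend(register: bytes, digest: bytes) -> bytes:
    """TPM-style extend: new = SHA256(old || digest)."""
    return hashlib.sha256(register + digest).digest()


if __name__ == "__main__":
    # Hypothetical user-data; measured before anything parses or acts on it.
    user_data = b"#cloud-config\nwrite_files: []\n"
    register = extend(bytes(PCR_SIZE), measure(user_data))
    print(register.hex())
```

The point of measuring before processing is that a verifier can recompute the expected register value from the user-data it intended to send and compare it with the attested value.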

Open Problems

I'm not sure whether alternatives to cloud-config are available. I think in some cases a virtual ISO is attached, and we'd need to support that as a source in addition to process-user-data.

katexochen commented 1 year ago

I agree we should remove cloud init. We have some hacks in Constellation to run without waagent.

mkulke commented 1 year ago

Yes, that's possible and documented as a valid agent-less deployment now. The support situation for those VMs might be tricky for some deployments. So I'm not sure what e.g. the plans for RHEL CVMs & Cloud-Config are, since I suspect Cloud-Config in a TEE is problematic for all flavors of CVMs, not just PeerPod PodVMs.

FWIW, I've been running tests using a discrete report-ready tool. PodVMs start reliably without cloud-config.

There might be CAA use cases for Cloud-Config which we need to cover (or reconsider). Admin users and SSH pubkeys, configured in CAA, are currently provisioned via Cloud-Config to the PodVM. We could certainly support that in some way, but we probably don't want to rewrite cloud-config.

katexochen commented 1 year ago

> The support situation for those VMs might be tricky for some deployments. So I'm not sure what e.g. the plans for RHEL CVMs & Cloud-Config are, since I suspect Cloud-Config in a TEE is problematic for all flavors of CVMs, not just PeerPod PodVMs.

@bpradipt what is your opinion in this regard?

> There might be CAA use cases for Cloud-Config which we need to cover (or reconsider). Admin users and SSH pubkeys, configured in CAA, are currently provisioned via Cloud-Config to the PodVM. We could certainly support that in some way, but we probably don't want to rewrite cloud-config.

I think cloud-config will be really hard to measure in a meaningful way, so I don't see much future for it. Maybe we have to reimplement things, but then we at least have a way to implement validation. SSH access to PodVMs will become less interesting as soon as the base image is read-only, I guess (and SSH access doesn't make much sense in a CC context anyway IMO, as it makes remote attestation essentially meaningless).
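To illustrate the kind of validation a purpose-built replacement could perform, here is a rough sketch that accepts write_files entries only when they target a single allowed directory. Everything here is an assumption for illustration: the `/run/peerpod` directory, the function names, and the shape of the already-parsed config dict are hypothetical and not taken from the CAA code base.

```python
import posixpath

# Hypothetical directory that user-data is allowed to populate.
ALLOWED_DIR = "/run/peerpod"


def is_allowed(path: str) -> bool:
    """Accept only absolute paths that resolve to a location inside ALLOWED_DIR."""
    if not path.startswith("/"):
        return False
    # normpath collapses "." and ".." segments, so traversal is caught below.
    normalized = posixpath.normpath(path)
    return normalized.startswith(ALLOWED_DIR + "/")


def filter_write_files(config: dict) -> list:
    """Return normalized write_files entries, or raise if any entry is out of bounds."""
    accepted = []
    for entry in config.get("write_files", []):
        path = entry.get("path", "")
        if not is_allowed(path):
            raise ValueError(f"refusing to write outside {ALLOWED_DIR}: {path!r}")
        accepted.append(
            {"path": posixpath.normpath(path), "content": entry.get("content", "")}
        )
    return accepted
```

Unlike a general cloud-config processor, this rejects the whole input on the first violation instead of trying to honor arbitrary directives, which keeps the accepted behavior small enough to reason about.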

mkulke commented 1 year ago

FWIW, I'm also looking at afterburn atm, which would provide an option to provision the SSH keys, but I agree that in the long run running SSH in the TEE is questionable.

katexochen commented 1 year ago

> I'm also looking at afterburn atm, which would provide an option to provision the SSH keys

Hm, I saw your discussion on the PR. I'm not sure about it. From my perspective, a solution for this should be minimal, like the tool you proposed. afterburn seems to do a little bit too much (like the SSH key stuff). We don't want to end up with an alternative cloud-init.

mkulke commented 1 year ago

Yeah, it's pretty large: a release build is 166 MB and a statically linked one is even bigger. But it also performs this job on multiple clouds, so I'm leaning towards adopting it. Maybe they're open to featurization of this crate.

Although the surface might be large, the handling seems to be rather explicit, so it is not open-ended like cloud-config. You simply invoke ./afterburn --provider azure --check-in and it shouldn't do anything beyond that.

jepio commented 1 year ago

> afterburn seems to do a little bit too much (like the SSH key stuff). We don't want to end up with an alternative cloud-init.

It's all driven by CLI arguments, so it doesn't do this extra stuff if you don't want it to.

katexochen commented 1 year ago

Afterburn doesn't seem to have a mechanism to detect the cloud provider. Their docs are pretty thin.

As far as I can see, we currently only need a check-in on Azure. I will add a systemd unit that has Azure hard-coded as the provider and add a || true to the exec so it won't fail on other CSPs. If we require check-in on different cloud providers, we will need a script that wraps afterburn to detect the provider. I don't want to pass the provider via the kernel command line, as that would mean building provider-specific images.
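A wrapper of the kind described above could look roughly like the following sketch. It assumes the provider can be recognized from DMI data under /sys/class/dmi/id and that Azure VMs report the well-known chassis asset tag that cloud-init also checks for; the function names are hypothetical, and `afterburn` is assumed to be on PATH. Only the `--provider` and `--check-in` flags quoted earlier in this thread are used.

```python
import subprocess
from pathlib import Path
from typing import Optional

DMI_DIR = Path("/sys/class/dmi/id")
# Assumption: chassis asset tag reported by Azure VMs.
AZURE_CHASSIS_TAG = "7783-7084-3265-9085-8269-3286-77"


def read_dmi(dmi_dir: Path = DMI_DIR) -> dict:
    """Read the DMI fields used for detection; missing files become empty strings."""
    values = {}
    for name in ("sys_vendor", "product_name", "chassis_asset_tag"):
        try:
            values[name] = (dmi_dir / name).read_text().strip()
        except OSError:
            values[name] = ""
    return values


def detect_provider(dmi: dict) -> Optional[str]:
    """Return a provider name that needs a check-in, or None if there is none."""
    if dmi.get("chassis_asset_tag") == AZURE_CHASSIS_TAG:
        return "azure"
    return None  # per the thread, only Azure currently needs a check-in


def checkin_command(provider: str) -> list:
    """Build the afterburn invocation for a check-in on the given provider."""
    return ["afterburn", "--provider", provider, "--check-in"]


def main() -> int:
    provider = detect_provider(read_dmi())
    if provider is None:
        return 0  # nothing to do on this platform
    try:
        # check=False mirrors the "|| true" idea: a failed check-in is not fatal.
        subprocess.run(checkin_command(provider), check=False)
    except OSError:
        pass  # afterburn binary missing or not executable
    return 0
```

Detecting the provider at runtime keeps a single image usable across CSPs, which is the reason given above for not passing the provider on the kernel command line.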

katexochen commented 12 months ago

Reopening as there are still things to do:

[ ] Implement process-user-data for IBMCloud
[ ] Find a solution to provide metadata in libvirt/vmware
