confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach

Set initdata in PeerPod #1895

huoqifeng opened this issue 6 days ago

huoqifeng commented 6 days ago

This story follows up on https://github.com/kata-containers/kata-containers/issues/9468 in cloud-api-adaptor for PeerPod.

The initdata schema was discussed, agreed upon, and documented in https://github.com/confidential-containers/trustee/blob/main/kbs/docs/initdata.md; a possible approach is documented in https://github.com/confidential-containers/confidential-containers/issues/171#issuecomment-1922821565.

I drew a new diagram to explain in more detail how https://github.com/confidential-containers/confidential-containers/issues/171#issuecomment-1922821565 applies to PeerPod, as shown below:

[diagram: initdata flow for PeerPod]

The flow:

  1. initdata is added to the pod descriptor as an annotation, for example:
    {
      "algorithm": "sha384",
      "version": "0.1.0",
      "data": {
        "aa.toml": "xxx",
        "cdh.toml": "xxx",
        "certificate.crt": "xxxx",
        "policy.rego": "xxxx"
      }
    }
  2. The annotation is passed from containerd/CRI-O to kata-runtime.
  3. Generate the initdata files for kata-cc and kata-remote-cc separately:
    • 3.1 for kata-cc, the initdata is written to a pod-specific folder on the host.
    • 3.2 for kata-remote-cc, the initdata is passed from kata-runtime to cloud-api-adaptor, which writes it to an ISO or to the metadata service.
  4. Mount the initdata files for kata-cc and kata-remote-cc separately:
    • 4.1 for kata-cc, the initdata file is mounted into the guest, e.g. via 9p.
    • 4.2 for kata-remote-cc, the initdata can be mounted as an ISO device, as with the libvirt provider.
    • 4.3 for kata-remote-cc, the initdata can also be retrieved from the metadata service, as with the Azure provider.
  5. Initdata format after mount: the initdata has a format similar to cloud-init YAML, but with "algorithm": "sha384" and "version": "0.1.0" added so that the hash can be calculated with the specified algorithm (see the sketch after this list).
  6. Provision the initdata:
    • for kata-remote-cc, use process-user-data to provision the configurations, such as aa.toml and cdh.toml.
    • for kata-cc, a similar process can be launched by init to do the same; it does not need to be a systemd service and could be a program/script in the initrd launched by init.
  7. Launch AA, CDH, ASR, etc.:
    • The initdata hash is added to the evidence.
    • Perform remote attestation and check the initdata hash on the verifier.
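
To make steps 3.2, 5 and 7 a bit more concrete, here is a minimal Go sketch of what the host side could do: compute the sha384 digest over the serialized initdata document and wrap the document into cloud-init user-data as a write_files entry. The guest path /run/peerpod/initdata.toml, the function names and the exact digest input are assumptions for illustration only, not the agreed design.

    package main

    import (
        "crypto/sha512"
        "encoding/base64"
        "encoding/hex"
        "fmt"
    )

    // initdataDigest computes the sha384 digest of the serialized initdata
    // document. The exact bytes that go into the digest are defined by the
    // initdata spec; hashing the whole document here is only for illustration.
    func initdataDigest(initdata []byte) string {
        sum := sha512.Sum384(initdata)
        return hex.EncodeToString(sum[:])
    }

    // buildUserData wraps the raw initdata document (as taken from the pod
    // annotation) into a cloud-config snippet with a single write_files entry,
    // which process-user-data could later provision inside the guest.
    // The guest path is a placeholder, not an agreed location.
    func buildUserData(initdata []byte) string {
        encoded := base64.StdEncoding.EncodeToString(initdata)
        return "#cloud-config\n" +
            "write_files:\n" +
            "- path: /run/peerpod/initdata.toml\n" +
            "  encoding: b64\n" +
            "  content: " + encoded + "\n"
    }

    func main() {
        doc := []byte("algorithm = \"sha384\"\nversion = \"0.1.0\"\n")
        fmt.Println(buildUserData(doc))
        fmt.Println("digest:", initdataDigest(doc))
    }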

For kata-remote-cc I expect this approach to require changes only in cloud-api-adaptor; for kata-cc it might need some additional changes in kata-runtime. cc @Xynnn007 @mkulke @stevenhorsman @fitzthum

Xynnn007 commented 6 days ago

Wow, cool diagram; it is nearly the final shape I have in mind.

One thing that is not clear to me: who processes the initdata TOML inside the guest?

Another thing to note: for kata-cc, @ChengyuZhu6 and I are currently trying to use RPC to inject the initdata into kata-agent directly (under testing), to first make it workable. The next step would be to think about whether to mount the initdata file directly into the guest at a specific path.

If peer-pods is going to put the initdata file directly into the guest, I wonder whether on the kata-cc side we could start discussing and evaluating sharing the file into the guest rather than setting it via RPC, since the two would face nearly the same threat model and attack vectors.

I'd like to hear more ideas. @fidencio might also be interested in this topic

mkulke commented 6 days ago

@Xynnn007

If peer-pods is going to put the initdata file directly into the guest, I wonder whether on the kata-cc side we could start discussing and evaluating sharing the file into the guest rather than setting it via RPC, since the two would face nearly the same threat model and attack vectors.

Yes, I think this is the key difference from the existing proposal: instead of using RPC, does it make sense for Kata to consume untrusted data from the host (9p or otherwise; in theory it could also be a read-only ISO like the cloud-config disk for libvirt, or metadata over vsock)?

I assume this has been considered and the RPC approach was preferred, but I don't really know. I can think of one upside to RPC: dynamic configuration. Since you have to go through the agent to provision files, you can enforce a runtime measurement. I'm not sure whether that is a desirable scenario, but a host-shared disk would only be read once.

A potential downside of an RPC approach: we'd have to include a ttrpc client in the agent to talk to the AA, which is quite invasive and litters the agent code with more CoCo specifics. A peer-pods-only approach for extending the evidence with an initdata hash could be to hook into a SetInitData endpoint in agent-protocol-forwarder. However, for validating HOSTDATA (see below) you'd need to get the attestation report, probably also via an attestation-agent RPC endpoint (GetEvidence).
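
As a rough illustration of that peer-pods-only idea, a hypothetical SetInitData handler in agent-protocol-forwarder could look like the Go sketch below. The endpoint, the Extender interface and the digest handling are assumptions made for this sketch, not existing cloud-api-adaptor or guest-components APIs.

    package forwarder

    import (
        "context"
        "crypto/sha512"
        "fmt"
    )

    // Extender abstracts however the digest ends up in the evidence
    // (e.g. an attestation-agent RPC, a vTPM PCR, ...). Hypothetical.
    type Extender interface {
        ExtendRuntimeMeasurement(ctx context.Context, digest []byte) error
    }

    // SetInitData is a hypothetical agent-protocol-forwarder endpoint that
    // hashes the received initdata and extends the evidence with the digest.
    func SetInitData(ctx context.Context, ext Extender, initdata []byte) error {
        digest := sha512.Sum384(initdata)
        if err := ext.ExtendRuntimeMeasurement(ctx, digest[:]); err != nil {
            return fmt.Errorf("extending evidence with initdata digest: %w", err)
        }
        return nil
    }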

@huoqifeng

re 7)

The initdata hash is added to the evidence.

Some TEEs like SNP (w/o vTPM) cannot add something to (extend) the evidence. What they do instead: hash the init-data and add it to the guest launch as HOSTDATA, which will be part of the signed evidence. The agent then has to assert that hash(init-data) == HOSTDATA to bind the init-data to the TEE.
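
For illustration, the assertion could look roughly like the Go sketch below. The Report type and its HostData field are stand-ins; in practice the report would be fetched from the SNP guest device via the attester, and the exact digest/truncation rule is defined by the initdata spec.

    package hostdata

    import (
        "bytes"
        "crypto/sha512"
        "fmt"
    )

    // Report is a stand-in for a parsed SNP attestation report; the real
    // report carries a 32-byte HOSTDATA field that was set at guest launch.
    type Report struct {
        HostData []byte
    }

    // VerifyInitData asserts that the HOSTDATA in the signed report matches
    // the digest of the initdata document. Whether the sha384 digest is
    // truncated or a different digest is used is defined elsewhere; this
    // sketch simply compares as many bytes as HOSTDATA holds.
    func VerifyInitData(report *Report, initdata []byte) error {
        digest := sha512.Sum384(initdata)
        n := len(report.HostData)
        if n == 0 || n > len(digest) {
            return fmt.Errorf("unexpected HOSTDATA length %d", n)
        }
        if !bytes.Equal(digest[:n], report.HostData) {
            return fmt.Errorf("initdata digest does not match HOSTDATA")
        }
        return nil
    }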

mkulke commented 6 days ago

For context: since cloud-init is (too) pervasive and cannot work on a read-only rootfs, @bpradipt wrote process-user-data for peer pods, which only processes selected write_files entries from cloud-config YAML.
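
For readers unfamiliar with it, the Go sketch below shows the kind of processing involved (selected write_files entries only). The struct fields follow common cloud-config conventions and are assumptions for illustration, not a description of the actual process-user-data code.

    package userdata

    import (
        "encoding/base64"
        "os"
        "path/filepath"

        "gopkg.in/yaml.v3"
    )

    // writeFile mirrors a cloud-config write_files entry; only the fields
    // needed here are modelled.
    type writeFile struct {
        Path     string `yaml:"path"`
        Content  string `yaml:"content"`
        Encoding string `yaml:"encoding"`
    }

    type cloudConfig struct {
        WriteFiles []writeFile `yaml:"write_files"`
    }

    // provisionSelected writes only the allow-listed paths from a cloud-config
    // document to disk, which is roughly what a write_files processor does.
    func provisionSelected(doc []byte, allowed map[string]bool) error {
        var cfg cloudConfig
        if err := yaml.Unmarshal(doc, &cfg); err != nil {
            return err
        }
        for _, wf := range cfg.WriteFiles {
            if !allowed[wf.Path] {
                continue
            }
            data := []byte(wf.Content)
            if wf.Encoding == "b64" {
                decoded, err := base64.StdEncoding.DecodeString(wf.Content)
                if err != nil {
                    return err
                }
                data = decoded
            }
            if err := os.MkdirAll(filepath.Dir(wf.Path), 0o755); err != nil {
                return err
            }
            if err := os.WriteFile(wf.Path, data, 0o600); err != nil {
                return err
            }
        }
        return nil
    }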

Xynnn007 commented 6 days ago

I assume this has been considered and the RPC approach was preferred, but I don't really know. I can think of one upside to RPC: dynamic configuration. Since you have to go through the agent to provision files, you can enforce a runtime measurement. I'm not sure whether that is a desirable scenario, but a host-shared disk would only be read once.

Yes. I once discussed this with @fitzthum and thought the first goal would be "to make it work without changing too many things". But seeing that peer pods is moving quickly to embrace read-only files directly, I begin to wonder whether it is necessary to use RPC in kata first and only then consider using files. Instead, we could use the file directly: this is cleaner and avoids giving AA a strange UpdateConfiguration interface. It also hints at a PR that deletes the current SetPolicy API and instead puts policies together with the other initdata items as a read-only file at a well-defined path inside the guest. This path should be the same as for peer pods, I think.

Initdata also supports passing a policy, so I would also like to hear @danmihai1's thoughts.

huoqifeng commented 6 days ago

Some TEEs like SNP (w/o vTPM) cannot add something to (extend) the evidence. What they do instead: hash the init-data and add it to the guest launch as HOSTDATA, which will be part of the signed evidence. The agent then has to assert that hash(init-data) == HOSTDATA to bind the init-data to the TEE.

@mkulke could you please explain this a bit more? Why can RPC resolve the problem while a hostpath mount cannot?

mkulke commented 6 days ago

I am PoC'ing the RPC approach (using only SetPolicy) at the moment, to understand structural implications better.

Doing that, it feels like there's a lot of ceremony and boilerplate required to add an AA client to the agent, compared to some host-injected read-only data blob that can be measured into the evidence while kata-agent is left untouched.

mkulke commented 6 days ago

Some TEEs like SNP (w/o vTPM) cannot add something to (extend) the evidence. What they do instead: hash the init-data and add it to the guest launch as HOSTDATA, which will be part of the signed evidence. The agent then has to assert that hash(init-data) == HOSTDATA to bind the init-data to the TEE.

@mkulke could you please explain this a bit more? Why can RPC resolve the problem while a hostpath mount cannot?

To perform this assertion, you need to fetch an attestation report; this is done in the guest. The report cannot be injected as part of the untrusted host-injected data.

This doesn't really have anything to do with RPC; it just happens in the guest. In theory any guest component (something like process-user-data) could do that (hash the files, fetch the report, perform the HOSTDATA check).

Xynnn007 commented 6 days ago

I am PoC'ing the RPC approach (using only SetPolicy) at the moment, to understand structural implications better.

Doing that, it feels like there's a lot of ceremony and boilerplate required to add an AA client to the agent, compared to some host-injected read-only data blob that can be measured into the evidence while kata-agent is left untouched.

Yes. It also makes launching the CDH process troublesome, because CDH can only get its configuration and be launched after kata-agent receives SetInitData. This is what Chengyu and I are facing while debugging. If we had a systemd service like process-user-data that parses the read-only file and places the configurations and policies, and only then AA, CDH and kata-agent are launched, things would get much simpler.

huoqifeng commented 6 days ago

Yes. It also makes launching the CDH process troublesome, because CDH can only get its configuration and be launched after kata-agent receives SetInitData. This is what Chengyu and I are facing while debugging. If we had a systemd service like process-user-data that parses the read-only file and places the configurations and policies, and only then AA, CDH and kata-agent are launched, things would get much simpler.

PeerPod uses a systemd service, but I think it does not have to be a systemd service for kata-cc if systemd is not enabled. It could just be a program/script in the initrd launched by init.

mkulke commented 6 days ago

Yes. It also makes launching the CDH process troublesome, because CDH can only get its configuration and be launched after kata-agent receives SetInitData. This is what Chengyu and I are facing while debugging. If we had a systemd service like process-user-data that parses the read-only file and places the configurations and policies, and only then AA, CDH and kata-agent are launched, things would get much simpler.

Right, sometimes problems in a scheme only become apparent when prototyping an implementation. RPC makes things a bit more complex than I initially assumed. There is also a bootstrapping problem with AA:

• We want to receive an AA config via Agent::SetInitData()
• To bind the InitData to the TEE evidence we need to call either AA::ExtendRuntimeMeasurement() or AA::GetEvidence()
• We need to have an AA running to call those endpoints

Options:

huoqifeng commented 6 days ago

Right, sometimes problems in a scheme only become apparent when prototyping an implementation. RPC makes things a bit more complex than I initially assumed. There is also a bootstrapping problem with AA:

So, using process-user-data to set the initdata sounds like a more straightforward approach and makes the API/implementation easier. Shall we adopt this approach if it can also bind the HOSTDATA for SNP (w/o vTPM)?

fitzthum commented 5 days ago

A few things here.

First, one reason we talked about sending the config via the Kata Agent API is to minimize TCB. There has been a lot of talk about disabling fs passthrough. It's not clear exactly where that will go, but it's something to think about. The RPC option is also more similar to what Dan initially implemented.

I don't really have a preference here but we should think about it carefully. One interesting advantage of decoupling this from the Kata Agent is that it could allow us to use init-data to provision configs for the kata agent. It would probably have implications on how we start up the components in the guest, though.

We have already introduced a check_init_data endpoint to the attestation agent. This should be called by whoever processes the init-data. Note that the kata agent is going to be calling the CDH for things anyway. check_init_data is designed to be run with the AA in a partially configured state (i.e. the KBS URI has not been provided yet but isn't needed). This means that we also have a somewhat weird flow to update the AA config from the init-data.

The attestation agent also has an extend_runtime_measurement endpoint that should be usable in the partially configured state. I guess the caller will have to decide between this and check_init_data depending on what platform they are on, which isn't totally ideal.
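
To picture the caller's side, the branch could be as simple as the Go sketch below; the AAClient interface is hypothetical and only mirrors the two endpoints mentioned above, it is not the real attestation-agent ttrpc interface.

    package initdata

    import "context"

    // AAClient is a hypothetical view of the two attestation-agent endpoints
    // discussed above; it is not the real AA RPC interface.
    type AAClient interface {
        ExtendRuntimeMeasurement(ctx context.Context, digest []byte) error
        CheckInitData(ctx context.Context, digest []byte) error
    }

    // bindInitData picks the TEE-appropriate way of binding the initdata
    // digest to the evidence: extend a runtime measurement where the platform
    // supports it, otherwise rely on a launch-time value such as SNP HOSTDATA.
    func bindInitData(ctx context.Context, aa AAClient, canExtend bool, digest []byte) error {
        if canExtend {
            return aa.ExtendRuntimeMeasurement(ctx, digest)
        }
        return aa.CheckInitData(ctx, digest)
    }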

Xynnn007 commented 5 days ago

Nice point. One option from my side is to move the check_init_data API from RPC to a launch parameter --initdata of the AA. The logic inside the AA would then decide, based on the platform, whether to check the initdata field or to extend a runtime measurement. This implies that a common process-user-data would pre-process the initdata TOML into the different config/policy files, taking it from an ISO device, cloud metadata service, shared fs or similar.
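
A minimal Go sketch of such a common pre-processing step is below, assuming the initdata document layout from the trustee spec and a hypothetical target directory; the real process-user-data and the final file layout may well differ.

    package initdata

    import (
        "os"
        "path/filepath"

        "github.com/BurntSushi/toml"
    )

    // document mirrors the initdata schema: algorithm, version and a map of
    // file name to file body.
    type document struct {
        Algorithm string            `toml:"algorithm"`
        Version   string            `toml:"version"`
        Data      map[string]string `toml:"data"`
    }

    // explode writes every entry of the initdata "data" table into targetDir
    // (e.g. aa.toml, cdh.toml, policy.rego), which is what a common
    // pre-processor could do before AA, CDH and kata-agent start.
    func explode(raw []byte, targetDir string) error {
        var doc document
        if _, err := toml.Decode(string(raw), &doc); err != nil {
            return err
        }
        if err := os.MkdirAll(targetDir, 0o755); err != nil {
            return err
        }
        for name, body := range doc.Data {
            path := filepath.Join(targetDir, name)
            if err := os.WriteFile(path, []byte(body), 0o600); err != nil {
                return err
            }
        }
        return nil
    }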

The key point is the danger of fs passthrough. Are there any details about that discussion? A naive thought: if the dir is read-only and the guest never writes anything to the shared path, what other security flaw could arise?

mkulke commented 5 days ago

The key point is the danger of fs passthrough. Are there any details about that discussion? A naive thought: if the dir is read-only and the guest never writes anything to the shared path, what other security flaw could arise?

I wonder whether consuming init-data from the metadata service, a cloud-config ISO, or the agent::SetInitData endpoint is roughly equivalent: in all scenarios we are processing untrusted data from the host in a pre-attested guest environment.

mkulke commented 5 days ago

check_init_data is designed to be run with the AA in a partially configured state (i.e. the KBS URI has not been provided yet but isn't needed). This means that we also have a somewhat weird flow to update the AA config from the init-data.

I assume the reason for having extend_runtime_measurement and check_init_data in attestation-agent is to have a common interface across TEEs. Albeit, as you pointed out, the decision to call "extend" or "check" is already TEE-specific, so this abstraction is a bit leaky. Maybe we can refine it into a common abstraction.

I understand that's what @Xynnn007's suggestion to use --init-data aims to achieve. Would this be a one-off exec that is called from the agent (e.g. attestation-agent --init-data "$INIT_DATA" that will extend/verify and then exit with an error code)?

If I understood it correctly, AA fulfills 2 tasks:

If we used the aa:attester crate in the agent, we'd still have the hw encapsulation (task 1) without the RPC. Is this an option?

We could avoid a "partially initialized state" of AA, which might just be accidental complexity. There is no state; the agent just needs to call a piece of code that happens to live in AA's codebase.

huoqifeng commented 5 days ago

I think init-data from agent::SetInitData, from the metadata service, or from a cloud-config ISO is untrusted in all cases; we should always check its integrity.

huoqifeng commented 5 days ago

I wrote a rough commit to illustrate the approach in PeerPod: https://github.com/huoqifeng/cloud-api-adaptor/commit/8fa416d78a7244436c13ce6436eba0e11bebbcd4. It's not finished yet, but I think the complexity can be seen from the commit.

fitzthum commented 4 days ago

The key point is the danger of fs passthrough. Are there any details about that discussion? A naive thought: if the dir is read-only and the guest never writes anything to the shared path, what other security flaw could arise?

I have done some experiments here. Currently it's pretty easy to construct an attack where encrypted container images are leaked through the shared fs. The agent doesn't validate where shared directories are mounted and there are a few ways to create bind mounts inside the guest that can map arbitrary locations (such as the directory where the images are unpacked) onto shared directories. We have done some work to set fs_share=none, but currently the host can just re-enable the option without changing the measurement.

Policy restrictions could be used here, but ideally we would be secure by default without requiring users to have any particular policy. There are also some potential bootstrapping issues if we use the policy to restrict mounts but we need a mount to share the policy. I think the ideal situation would be that the guest simply wouldn't be able to mount any shared directories. For instance, we might be able to disable this at the kernel level. Of course, this would probably break some other things.

One compromise might be to prevent the kata agent from mounting any shared directories but to have one readonly mount specified in fstab. Read-only directories are not guaranteed to be safe. For instance, the kata agent might be tricked into bind mounting /etc onto the shared directory. Then the host could overwrite the configuration of the guest. Nonetheless this might be a reasonable compromise.

fitzthum commented 4 days ago

If we used the aa:attester crate in the agent, we'd still have the hw encapsulation (task 1) without the RPC. Is this an option?

This is a really thoughtful suggestion. I think it's probably a good idea. One thing to think about is that we are currently missing a little bit of plumbing to send the init-data to Trustee. When the AA checks the init-data, it should keep track of it so that it can be sent alongside the evidence. If we are using the AA both as a crate and later as a binary, we might need to store this on the fs somewhere.

Also, would this work for peer pods or just bare metal?

mkulke commented 4 days ago

When the AA checks the init-data, it should keep track of it so that it can be sent alongside the evidence.

Oh, do we need to attach the full init-data body to the evidence? I assumed it was only the hash, which would be the HOSTDATA (or measured into a PCR, RTMR, ...)?

fitzthum commented 4 days ago

Oh, do we need to attach the full init-data body to the evidence? I assumed it was only the hash, which would be the HOSTDATA (or measured into a PCR, RTMR, ...)?

Trustee is designed to use either one, but one of the ideas behind init-data is that the KBS policy will be able to read the init-data plaintext and make some choices based on it. Of course, if the init-data gets really big (for instance if there is a really big policy file) then this could be a bit clunky. So we're not totally set on this, but we should at least be able to send the plaintext over.

mkulke commented 4 days ago

but we should at least be able to send the plaintext over.

OK, thanks for the context, I wasn't aware of that. This does indeed imply that state is kept in an AA daemon. There are alternatives, but they might be clunky, I agree (e.g. an init_data_path=/run/... entry in aa.toml, so AA::GetEvidence() can pick it up, hmm).

arronwy commented 3 days ago

The key point is the danger of fs passthrough. Are there any details about that discussion? A naive thought: if the dir is read-only and the guest never writes anything to the shared path, what other security flaw could arise?

I have done some experiments here. Currently it's pretty easy to construct an attack where encrypted container images are leaked through the shared fs. The agent doesn't validate where shared directories are mounted and there are a few ways to create bind mounts inside the guest that can map arbitrary locations (such as the directory where the images are unpacked) onto shared directories. We have done some work to set fs_share=none, but currently the host can just re-enable the option without changing the measurement.

We may need to disable virtio-fs/9p in the guest kernel, which will impact the measurement.

arronwy commented 3 days ago

After syncing with @Xynnn007, we found that since the integrity of the initdata passed from the host side is ensured by a TEE HW register, we can send the initdata to the guest via one of four options: shared fs, shared volume, RPC, or network.

But from a security point of view, as @fitzthum mentioned, shared fs (virtio-9p/fs) is disabled for enhanced security.

The RPC way may need a new service daemon, or reuse kata-agent, which blocks the init config for kata-agent itself.

The network way may need a dedicated metadata service, as current peer pods use.

The shared volume way may be an ideal option and is easy to use: containerd-shim-kata-v2 would create an initdata block device and share it with the guest through virtio-blk. Inside the guest, a systemd service can mount this device to the /mn/init_data directory before kata-agent is launched, for the other service daemons to consume. We could write a separate program to parse the initdata, or depend on kata-agent to do the parsing.
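
As a rough sketch of the host side of this option, the initdata could be written into a small raw image that the hypervisor attaches read-only via virtio-blk. The padding scheme and file handling below are assumptions; in practice the shim would more likely put a small filesystem or a well-known header on the device so the guest can mount or locate the data.

    package blockdev

    import "os"

    const sectorSize = 512

    // writeRawImage stores the initdata document in a raw file padded to a
    // whole number of 512-byte sectors, so it can be attached to the guest as
    // a read-only virtio-blk device and read (or mounted, given a filesystem)
    // there before kata-agent starts.
    func writeRawImage(path string, initdata []byte) error {
        padded := len(initdata)
        if rem := padded % sectorSize; rem != 0 {
            padded += sectorSize - rem
        }
        buf := make([]byte, padded)
        copy(buf, initdata)
        return os.WriteFile(path, buf, 0o600)
    }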

mkulke commented 3 days ago

One thing to consider, which we briefly discussed on the peer pods community call, would be size limits on user-data bodies: AWS, Azure and GCP all have various limits here, varying between 16 KB and 256 KB. I'm not sure whether that's prohibitive. A cert alone is >1 KB, and a policy body could be too large (when autogenerated from a complex k8s manifest).
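
A small Go sketch of a provider-agnostic size check, assuming the initdata is gzip-compressed and base64-encoded before it is put into user-data; the limit value is illustrative and not a statement of any provider's exact quota.

    package userdata

    import (
        "bytes"
        "compress/gzip"
        "encoding/base64"
        "fmt"
    )

    // encodeInitData gzips and base64-encodes the initdata document, which is
    // a common way to shrink large payloads before they hit a provider's
    // user-data limit.
    func encodeInitData(initdata []byte) (string, error) {
        var buf bytes.Buffer
        zw := gzip.NewWriter(&buf)
        if _, err := zw.Write(initdata); err != nil {
            return "", err
        }
        if err := zw.Close(); err != nil {
            return "", err
        }
        return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
    }

    // checkSize rejects payloads that would exceed the provider's user-data
    // quota (the caller supplies the limit; real quotas differ per provider).
    func checkSize(encoded string, limitBytes int) error {
        if len(encoded) > limitBytes {
            return fmt.Errorf("user-data is %d bytes, provider limit is %d", len(encoded), limitBytes)
        }
        return nil
    }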