confidential-containers / operator

Operator to deploy confidential containers runtime
Apache License 2.0

SEV: Export the cert_chain on install #81

Open fitzthum opened 2 years ago

fitzthum commented 2 years ago

For CC with SEV, the shim expects to find the full cert_chain at /opt/sev/cert_chain.cert. This only needs to be populated once per machine. Currently we expect the user to do it, but in the future maybe the operator can check for the cert and install it if it isn't there yet.

To do so, install sevctl and then run:

sudo mkdir /opt/sev
sudo sevctl export --full /opt/sev/cert_chain.cert
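Since the file only needs to be populated once per machine, the check-then-export step could be sketched as below. This is a minimal sketch, not what the operator does today: the function name `ensure_cert_chain` and the `CERT_PATH` and `SEVCTL` overrides are hypothetical, and the only sevctl invocation used is the `export --full` form given above.

```shell
#!/bin/sh
# Sketch: export the SEV cert chain only if it is not already present.
# CERT_PATH and SEVCTL are hypothetical overrides, useful for trying the
# logic on a machine without SEV hardware.
CERT_PATH="${CERT_PATH:-/opt/sev/cert_chain.cert}"
SEVCTL="${SEVCTL:-sevctl}"

ensure_cert_chain() {
    # Nothing to do if a non-empty cert chain already exists.
    if [ -s "$CERT_PATH" ]; then
        echo "cert chain already present at $CERT_PATH"
        return 0
    fi
    # Create the parent directory, then export the full chain into it.
    mkdir -p "$(dirname "$CERT_PATH")" || return 1
    "$SEVCTL" export --full "$CERT_PATH" || return 1
    echo "exported cert chain to $CERT_PATH"
}
```

The function would be sourced and called as root on the node (it needs /dev/sev and write access to /opt/sev). Checking for a non-empty file rather than mere existence means a truncated file left by a failed export gets retried.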

Hopefully we have a nice way to add custom steps for certain runtime classes.

cc: @alex-carter01 @ryansavino

bpradipt commented 2 years ago

@fitzthum can this step be done by kata-deploy.sh (part of the payload image) as part of install workflow ? Since the runtimeClass configs are made available by the payload, it might be more natural to have any related steps part of the payload image itself.

fitzthum commented 2 years ago

> can this step be done by kata-deploy.sh (part of the payload image) as part of install workflow

Hm might be tricky to do it from inside a container, but I will look into this.

bpradipt commented 2 years ago

I tried running sevctl as a container and could export the cert chain. The catch is that the container needs to be privileged to gain access to /dev/sev.

I used the following Dockerfile to build the image:

FROM fedora
RUN dnf install -y sevctl
ENTRYPOINT ["sevctl"]

Today we are running both the pre-install and install containers as privileged containers, and we can include sevctl in the pre-install image. Having said that, we'll need to re-think whether we really need privileged containers for the operator going forward.
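For reference, running an image built from the Dockerfile above so that the exported file lands on the host could look roughly like this. It is a sketch under assumptions: the image tag `sevctl-export:latest`, the function name, and the `DRY_RUN` switch are hypothetical, and the bind mount is one way (not necessarily the only one) to make the file persist on the node.

```shell
#!/bin/sh
# Sketch: run a containerized sevctl so that it writes the cert chain onto the host.
# IMAGE is a hypothetical tag for an image built from the Dockerfile above.
IMAGE="${IMAGE:-sevctl-export:latest}"
HOST_DIR="${HOST_DIR:-/opt/sev}"

export_cert_chain_in_container() {
    # --privileged: lets the container reach /dev/sev, as noted above.
    # -v: bind-mounts the host directory so the exported file persists on the node.
    # The image's ENTRYPOINT is sevctl, so the trailing arguments go to sevctl.
    set -- docker run --rm --privileged \
        -v "$HOST_DIR:$HOST_DIR" \
        "$IMAGE" export --full "$HOST_DIR/cert_chain.cert"
    # Set DRY_RUN=1 to print the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Calling the function with `DRY_RUN=1` prints the docker command without executing it, which is handy for checking the arguments on a machine without SEV.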

surajssd commented 1 year ago

@bpradipt can I start looking into this?

bpradipt commented 1 year ago

@surajssd please feel free. Sorry, I missed the notification.

fitzthum commented 1 year ago

So there was some discussion in the meeting about whether the operator is the right place to do this. Using the preinstall container may sidestep some of the concerns. I think that in general it is a good idea. In some ways provisioning the cert chain seems like a non-CoCo specific step. If a node has already been used to host VMs, it's likely that the cert chain was already provisioned somewhere. That said, there is no standardization here. CoCo expects to find the cert chain in a particular place on the filesystem. A host might not store the cert chain on the node at all (you can request them remotely as well). Since we have fairly specific requirements for setup, it seems reasonable to take care of it for the host.

Now there is one little snag for SNP, which is that with SNP the certificate chain is versioned and there are some cases where you might not want to automatically detect the version from the host. In some cases the administrator might want to override the TCB version of a node. This will make the operator's job a bit more complicated, but tbh we can probably ignore this feature for now and just get the version automatically (this is what sevtool does).

dcmiddle commented 1 year ago

On other TEEs I think it's the case that the host owner provisions and maintains the TEE and its TCB. CoCo should be able to assume that the host is correctly provisioned. In this case if there's not a standard place for the cert_chain, maybe the shim could look in multiple places?
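To make the "look in multiple places" idea concrete, a lookup over a list of candidate paths might be sketched as follows. This is illustrative only: the shim is not a shell script, the function name is hypothetical, and apart from /opt/sev/cert_chain.cert any candidate path passed in would be an assumption, since no standard location exists.

```shell
#!/bin/sh
# Sketch: return the first candidate path that holds a non-empty cert chain.
# The caller supplies the candidate list; no list of standard locations exists.
find_cert_chain() {
    for candidate in "$@"; do
        if [ -s "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # No candidate matched.
    return 1
}
```

For example, `find_cert_chain /opt/sev/cert_chain.cert "$HOME/cert_chain.cert"` would print whichever of the two exists first (the second path is made up for the example). The approach only works if the set of candidates is known up front.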

fidencio commented 1 year ago

Yep, I'm with @dcmiddle on this.

fitzthum commented 1 year ago

I don't think it's unreasonable for us to assume that the machine is already provisioned. This is what we do today, but we can't have the shim look in multiple places for the cert_chain because there are infinite places it could be. There is no standard at all for where it should live and, like I said, it doesn't even need to be on the worker node. It can also be extracted by a verifier. There are definitely some provisioning steps that we shouldn't mess around with, like enablement of confidential computing itself, but I'm not sure if this falls into that category. If we leave it to the user it will basically just be a weird step that they have to do before installing the operator, but probably would not have done otherwise.

surajssd commented 1 year ago

It seems that it is currently not possible to export a certificate for SNP using sevctl. We either need SNP support in the tool or a new tool. This is currently being worked on. cc: @larrydewey

For SEV & SEV-ES we can do that using the tool. Do we still want to continue doing this for SEV & SEV-ES?

wainersm commented 1 year ago

Sorry to jump in so late on this discussion, but I just saw @surajssd's pull request. I'm not sure about the solution adopted and I'd like to propose an alternative.

Okay, my problem with the current solution is that the pre-install container can get overly complex very quickly. We will end up with spaghetti code if it needs to deal with host/TEE-specific setups.

I'm not an expert on k8s operators, but I believe it is possible to have relationships between them: for example, one operator depending on another, or one triggering operations on its dependents. That said, what if we had an "operator-sev" (operator-tdx, etc.) for dealing with any TEE-specific setup (e.g. cert setup on hosts) that is beyond the scope of the generic CoCo operator? This operator could then cooperate with the generic CoCo operator to manage CoCo on the given cluster.

Cc @bpradipt @fitzthum @dcmiddle @fidencio

fidencio commented 1 year ago

I am still not convinced this is the right place for this operation.

We're mixing up basic machine configuration with setting up what we need for confidential containers to work. The former is something we have discussed a few times: the Operator should not be doing it, and should instead leave it to tools coming from the silicon vendor.

fitzthum commented 1 year ago

Ok, I want to make it clear that exporting the cert chain is not part of generic SEV setup. There are a handful of operations that are required to set up any SEV guest, such as installing the correct kernel, fiddling with the BIOS, etc. These are out of the scope of the operator. Exporting the cert chain is not in this category. For one thing, it is not required at all to run or attest a VM. If someone has set up an SEV host, there is absolutely no guarantee that they have exported the cert chain and it is even less likely that they happened to export it to the location that we expect.

The certificate chain is required to validate an attestation. The certificate chain is retrieved from the AMD KDS. To get the certificate chain, which is not secret, anyone can query the KDS with the chip id of a machine. So it is a valid attestation flow to have the certificate chain be pulled by the KBS/verifier and never exist on the worker node. We expect the cert chain to be stored on the host to reduce the number of calls to the KDS.
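As a rough illustration of the remote flow, a verifier-side fetch keyed on the chip id might look like the sketch below. Treat the details as assumptions: the base URL and the /cek/id/ path are what I recall sev-tool using for SEV and should be checked against AMD's KDS documentation, the function names are hypothetical, and this fetches only the chip endorsement key rather than the complete chain.

```shell
#!/bin/sh
# Sketch: query the AMD KDS for a chip's endorsement key certificate.
# KDS_BASE and the /cek/id/ path are assumptions; verify them against AMD's docs.
KDS_BASE="${KDS_BASE:-https://kdsintf.amd.com}"

kds_cek_url() {
    # $1: chip id as a hex string
    [ -n "$1" ] || { echo "usage: kds_cek_url <chip-id-hex>" >&2; return 1; }
    echo "$KDS_BASE/cek/id/$1"
}

fetch_cek() {
    # $1: chip id, $2: output file
    url=$(kds_cek_url "$1") || return 1
    # --fail makes curl return non-zero on HTTP errors instead of saving an error page.
    curl --fail --silent --show-error --output "$2" "$url"
}
```

Because the lookup needs nothing but the chip id, it can run on the KBS/verifier side; caching the result on the host, as we do today, just avoids repeating the request.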

So like I said, we should not think of this as a generic setup step. Today if people want to use CoCo with SEV, they must manually ensure that the cert chain has been exported to a particular location on the host. I think it is reasonable to have the operator do this instead.

I think there are some questions about the best way to implement, but I will ask those on the PR itself.