confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

Make pod vms generic, moving the code to containers #1220

Open surajssd opened 1 year ago

surajssd commented 1 year ago

Problem

Right now there is a trend to move more and more code/binaries to the pod VM. This trend is slowing down developer productivity and rapid development iteration. In the world of containers and Kubernetes this feels like an anti-pattern.

Proposed Solution

I propose that instead of baking the binaries and their dependencies into the pod VM, we create container images for each of these specialized applications. The attestation agent (AA), the KBC, and (eventually) the confidential data hub would be shipped as container images.

We have already started seeing problems with conflicting dependencies, such as the current well-known one where the default AA needs ttrpc while the AA with Azure vTPM support has to be built with gRPC.

Note: The policy mechanism will act as the gatekeeper on what container images can be pulled. Gatekeeping of these images happens using the container image name and an immutable digest (instead of a tag, which is mutable). For development purposes we can of course allow mutable tags.

Deployment Methodologies

Now these containers can be deployed in two ways:

  1. Admission Webhook Injection: Use some kind of admission webhook to inject these sidecars / init-containers into the pod config.
  2. Kata-Agent Injection: Do this injection at the kata-agent level when the OCI config is being provided to runc. This is how infrastructure containers like the pause container are injected.

Admission Webhook Injection

Pros:

Cons:

Kata-Agent Injection

Pros:

Cons:

surajssd commented 1 year ago

cc: @mkulke @bpradipt @stevenhorsman @iaguis @jepio

bpradipt commented 1 year ago

Thanks for putting this together @surajssd

Should we put additional focus on clearly delineating which components must run in the system context vs. which can be deployed (as sidecars) with the pod, as part of this issue itself?

mkulke commented 1 year ago

Thanks for taking the initiative on that, I agree with @bpradipt it makes sense to tie this to the concrete CAA components to probe the implications. Spontaneous thoughts:

Re: Amount of code in Pod VM

Right now there is a trend to move more and more code/binaries to the pod VM

I think that's right, but we're probably doing it wrong/for pragmatic reasons. In principle we don't want to bloat the software surface in the TEE beyond what's necessary.

Re: attestation-agent

Right now the AA process is started by kata-agent and then there is no management of this process.

I'd agree that this is suboptimal, however CAA being a downstream consumer of kata-agent, we probably need to address it in kata-containers.

With kata-agent being PID 1 of a guest VM in a non-CAA scenario, there is no process manager to manage attestation-agent. Arguably, kata-agent spawning attestation-agent as a process is incidental, mandated by the ocicrypt/-rs keyprovider architecture. It could, with some effort, probably be refactored to work in-process or even be invoked as a one-off command per layer.

Affected components

In principle, beyond what we inherit from kata, we really only need one additional process: agent-protocol-forwarder. It is somewhat like the "PID 1" of a pod VM: it facilitates communication with the remote-hypervisor implementation on k8s and proxies calls to the local kata-agent.

Due to this, it's a bad candidate for containerization.

Additional components we require, e.g. for runtime key release, we could aim to cover with containers. However, here too I would caution against deviating from kata, since non-CAA kata has the same issues.

Re: container Injection:

agent-protocol-forwarder already modifies the container spec in-flight by adding a network namespace, it could be used for this purpose, maybe? Not sure if this is a good practice, it would be specific to CAA.

stevenhorsman commented 1 year ago

Right now the AA process is started by kata-agent and then there is no management of this process.

I'd agree that this is suboptimal, however CAA being a downstream consumer of kata-agent, we probably need to address it in kata-containers.

I have raised this in the community and asked if we can switch the attestation-agent and/or the confidential data hub that might replace some of the current attestation-agent functions to be a systemd service rather than spawned by kata-agent. The current blocker for changing this right away is that the SEV/SNP initrd doesn't use systemd, so we'd need that agreement, but with the CDH it feels like the right time to have this discussion as there will be some refactoring anyway.

surajssd commented 1 year ago

In principle we don't want to bloat the software surface in the TEE beyond what's necessary.

We can ship these binaries in scratch-based containers to keep the OS from bloating.

surajssd commented 1 year ago

Trying to run the binaries as is inside the OS mandates we install the dependencies on the OS, regardless of how we invoke them.

We are already facing problems of conflicting dependencies.

mkulke commented 1 year ago

Trying to run the binaries as is inside the OS mandates we install the dependencies on the OS, regardless of how we invoke them.

We are already facing problems of conflicting dependencies.

That's true. It would be helpful for e.g. platform-dependent attester code. Could an attestation-agent container be injected by kata-agent and still be used to decrypt layers?

surajssd commented 1 year ago

In theory, yes.

I think this is phased work. In the first phase we solve the problem of shipping the binaries inside container images. They will still be invoked in the current fashion, and the images are still not generic per se: we hardcode the image name/tag into the code in some way.

In the second phase we move to the approach of making the podvm image truly generic, but then kata-agent or some other component has to do the gatekeeping of which images to download. So I think for this issue let's scope it down to containerization of the components; right now the only component that we ship is the attestation agent.

mkulke commented 1 year ago

To address the developer velocity and packaging problems, would this be a viable approach?

  1. Build (and release) the components (kata-agent, agent-protocol-forwarder, attestation-agent) as containers.
  2. Prefetch a set of canonical default images in the podvm during image build.
  3. Start the processes as containers via systemd (podman, nspawn, ...) units.

To iterate quickly on individual components there could be a workflow like this:

  1. Make the components configurable in the peer-pod-cm config-map (something like `images.agent-protocol-forwarder = ghcr.io/bla/apf:my-test`).
  2. If specified, cloud-api-adaptor passes this information to the instance via fields in the bootstrapping user-data.
  3. The respective systemd unit will pull and run the specified image instead of the default, prefetched local one.

bpradipt commented 1 year ago

Should we start by splitting the payload (binaries) container image into individual images for kata-agent and attestation-agent, or continue to build a single payload container image and include agent-protocol-forwarder in it?

surajssd commented 1 year ago

We continue to do what we do for kata-agent and agent-protocol-forwarder, but it is only AA that we separate out, isn't it?

mkulke commented 1 year ago

we could do the same with kata-agent and agent-protocol-forwarder, I guess?

BryceDFisher commented 1 year ago

I am thinking that the customer could potentially list the AA container as the first init container. The kata-agent could pass the appropriate arguments to the container to fetch the initial image decryption keys used to start the rest of the init containers and workload containers. If needed, the AA container could be started again as a workload container to perform runtime attestation.