confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

RFC: Spec for desired IMDS (Metadata Service) behaviour #1060

Open mkulke opened 1 year ago

mkulke commented 1 year ago

IMDS (Metadata Service) Behaviour in Cloud API Adaptor

relates to #1000 #1048

Context

IMDS

IMDS, or the Metadata Service, is a special endpoint (usually at 169.254.169.254) exposed to VM instances on Azure, AWS, OpenStack and possibly other cloud infrastructure.

In brief, it provides the instance with metadata about itself, which might be required to configure/bootstrap the instance (e.g., via "user-data"/"cloud-init" data that has been specified at launch). It might also host IAM credentials scoped to an instance's profile/role/managed identity.

Ubiquitous cloud SDKs (e.g., boto3, azure-sdk-for-python) internally leverage those endpoints to perform privileged operations (like downloading a CSV file from a blob store). In many cases a user might not be aware of the IMDS facility, while the workload still depends on it.
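To make this concrete, below is a minimal Go sketch of the kind of IMDS call an SDK performs under the hood. It targets Azure's instance-metadata path; the `api-version` value is only illustrative, and other providers expose different paths on the same link-local address.

```go
// Minimal sketch of an IMDS query as a cloud SDK might perform it.
// The api-version and response shape are illustrative; AWS and
// OpenStack expose different paths on the same link-local IP.
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET",
		"http://169.254.169.254/metadata/instance?api-version=2021-02-01", nil)
	if err != nil {
		panic(err)
	}
	// Azure's IMDS rejects requests without this header.
	req.Header.Set("Metadata", "true")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // JSON document describing the instance
}
```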

Network segmentation in Cloud API Adaptor

At the moment the pod is assigned its own network namespace (podns), which is linked to a corresponding k8s node ("peer node") via VXLAN tunneling. This is desired and has several implications: an (unprivileged) pod does not have access to the networking facilities of its hosting pod VM, and pod traffic is routed through the peer node.

Remote Attestation

At pod runtime, Secure Key Release via Attestation-Agent and KBS breaks the implicit assumption that we are able to comprehensively isolate a pod from its hosting pod VM. In remote attestation, collecting attestation evidence usually requires access to host resources such as devices (/dev/sev-guest, /dev/tpm0, ...) or the IMDS endpoint. Currently we provide access to pod VM resources via gRPC in a dedicated Attestation-Agent process spawned in the podns.

Problem Statement

Azure's az-snp-vtpm attester also interacts with an instance's IMDS to retrieve the Azure CVM platform's VCEK. The attester code calls out to the IMDS IP (169.254.169.254) and reaches the peer node's IMDS, which is either not a CVM (the call fails) or even the wrong CVM (wrong certificate).
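For context, the attester's IMDS interaction is roughly of the following shape. This is a Go sketch, not the actual attestation-agent code (which is Rust), and the THIM path and JSON field names are assumptions about Azure's CVM certificate cache:

```go
// Rough sketch of retrieving the VCEK certificate chain from an Azure
// CVM's IMDS. The path and JSON fields are assumptions; on a peer pod
// this request currently reaches the worker node's IMDS instead.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type thimCerts struct {
	VcekCert         string `json:"vcekCert"`
	CertificateChain string `json:"certificateChain"`
}

func fetchVCEK() (*thimCerts, error) {
	req, err := http.NewRequest("GET",
		"http://169.254.169.254/metadata/THIM/amd/certification", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata", "true")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var certs thimCerts
	if err := json.NewDecoder(resp.Body).Decode(&certs); err != nil {
		return nil, err
	}
	return &certs, nil
}

func main() {
	certs, err := fetchVCEK()
	if err != nil {
		panic(err)
	}
	fmt.Println(certs.VcekCert)
}
```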

We currently opted to re-route IMDS traffic in a pod's podns to the hosting pod VM's IMDS. This has the potential to break existing code that relies on pods having access to the metadata service in subtle, hard-to-debug ways. The premise of lifting and shifting workloads into a TEE is not kept.

Potential Solutions

Keep and document

Document this behaviour as a caveat to users, possibly as part of a dedicated section on networking.

Route IMDS traffic through the node:

Undo the current workaround and return to the original behaviour. For Azure CVMs the VCEK can be buffered as a file at podvm startup. The az-snp-vtpm attester code in attestation-agent needs to be adjusted to support reading the VCEK from a file.
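A rough sketch of the attester-side change under this option, assuming the VCEK has been written to a file at podvm startup; the cache path is a hypothetical choice, and the real change would live in the Rust attestation-agent rather than Go:

```go
// Sketch of the attester-side change: prefer a VCEK that was buffered
// on the podvm filesystem at boot, instead of querying IMDS at
// attestation time. The cache path is a hypothetical example.
package main

import (
	"errors"
	"fmt"
	"os"
)

const vcekCachePath = "/run/peerpod/vcek.json"

// loadVCEK returns the buffered certificate data, or an error if the
// boot-time unit did not populate the cache.
func loadVCEK() ([]byte, error) {
	data, err := os.ReadFile(vcekCachePath)
	if errors.Is(err, os.ErrNotExist) {
		return nil, fmt.Errorf("no buffered VCEK at %s: %w", vcekCachePath, err)
	}
	return data, err
}

func main() {
	vcek, err := loadVCEK()
	if err != nil {
		panic(err)
	}
	fmt.Printf("buffered VCEK evidence: %d bytes\n", len(vcek))
}
```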

mkulke commented 1 year ago

cc @katexochen

bpradipt commented 1 year ago

@mkulke I'm wondering how the policy frameworks that are being discussed as part of CoCo impact this. Can policy help with deciding which queries the agent can make when connecting to the pod VM IMDS? For example, only queries for the VCEK and identity would be allowed, and nothing else.

mkulke commented 1 year ago

> @mkulke I'm wondering how the policy frameworks that are being discussed as part of CoCo impact this. Can policy help with deciding which queries the agent can make when connecting to the pod VM IMDS? For example, only queries for the VCEK and identity would be allowed, and nothing else.

Not sure that's a good fit:

I feel like this is something that should be dealt with on the podvm layer in front of kata-agent, i.e. on a similar level as the protocol-forwarder. So in theory what we're doing now is OK, IMO, but it's too unspecific.

We don't need and don't want to provide generic IMDS access in the pod network namespace, but we do require access to a specific endpoint that is only available on Azure SNP CVMs, for attestation reasons. If this weren't proxy-arp, but a userland L7 proxy (http://$something/VCEK) set up as a systemd service on the podvm, that would be close to the target architecture that was sketched out for Pod => KBS access.
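A minimal sketch of what such a userland proxy on the podvm could look like, assuming the VCEK was cached on the filesystem beforehand; the listen address, URL path and cache location are all hypothetical:

```go
// Sketch of a userland L7 proxy on the podvm that exposes only the
// VCEK, instead of forwarding arbitrary IMDS traffic via proxy-arp.
// Listen address, URL path and cache location are hypothetical.
package main

import (
	"log"
	"net/http"
	"os"
)

const vcekCachePath = "/run/peerpod/vcek.json"

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/VCEK", func(w http.ResponseWriter, r *http.Request) {
		data, err := os.ReadFile(vcekCachePath)
		if err != nil {
			http.Error(w, "VCEK not available", http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(data)
	})
	// Any other path is rejected: no generic IMDS access from the podns.
	log.Fatal(http.ListenAndServe("169.254.2.1:80", mux))
}
```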

So, for the short/medium term, I planned to work on a PR to retrieve and buffer the VCEK on the podvm (on the filesystem) and, similarly, on attestation-agent to use this file instead of IMDS. This should not be very invasive and would allow us to remove the proxy-arp rule.

A small issue that got me thinking, though: we are currently looking into TDX, and there IMDS would also be part of the attestation logic, albeit required in an interactive fashion; you cannot fetch something beforehand in a systemd unit, like the VCEK.

Considering this I see the following options:

WDYT?

katexochen commented 1 year ago

Fine to keep it as is for me. We can still go with the second option if we encounter other problems with the current solution, or if the plans regarding an AA solution for it change / take longer than expected.

ariel-adam commented 1 year ago

Gents, is this on track for 0.7.0 (feature freeze 12th of July) or should we move it to 0.8.0? (It's still in the "new" column on our board: https://github.com/orgs/confidential-containers/projects/6/views/17.)

mkulke commented 1 year ago

In the last community meeting an architecture was discussed to provide a tactical/"stopgap" solution to address secret retrieval at runtime in peer pods:

[attached architecture diagram: "Copy of Untitled Diagram.drawio"]

mkulke commented 1 year ago

Related: https://github.com/confidential-containers/confidential-containers/issues/136