Open octaviansima opened 1 year ago
I just encountered the same issue in IBM Cloud. I am investigating it now.
In my case, I built the pod VM image from the wrong kata-containers branch, so the kata-agent included in my pod VM image did not support peer pods. After rebuilding the pod VM image with the Kata Containers CCv0 branch, the problem went away.
@octaviansima How did you prepare your Pod VM image and set AZURE_IMAGE_ID?
https://github.com/confidential-containers/cloud-api-adaptor/tree/main/azure#build-pod-vm-image
The following error typically indicates a kata-agent built without the image-pull code, so it looks like the kata-agent was not built from the CCv0 branch:
code = InvalidArgument desc = grpc.Image service does not exist
Thanks for the help. Using that branch fixed that error, but now I'm running into:
Warning FailedCreatePodSandBox 22m kubelet
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task:
remote hypervisor call failed: rpc error: code = Unknown desc = creating an instance : parsing pod node IP "10.224.0.5": %!w(<nil>): unknown
This repeats indefinitely as K8s attempts to launch another pod until I hit a VM quota limit on Azure
Looks like the code responsible for that error message was changed recently @yoheiueda https://github.com/confidential-containers/cloud-api-adaptor/commit/2af7abc010a6e8305013e16fc8b7c6617b965d64
Looks like I introduced a bug in the Azure provider in #1018. Sorry for the inconvenience.
I raised PR #1151 to fix the problem.
@octaviansima The fix has been merged. Could you please try it again?
Deploying unencrypted images works, thanks! However, I'm running into the following error in the pod logs for an encrypted image (following the same workflow from the quickstart):
Failed to pull image "docker.io/osima/encrypt-testing:encrypted_busybox":
rpc error: code = Unknown desc = failed to pull and unpack image
"docker.io/osima/encrypt-testing:encrypted_busybox": failed to extract layer sha256:feb4513d4fb7052bcff38021fc9ef82fd409f4e016f3dff5c20ff5645cde4c02:
failed to get stream processor for application/vnd.oci.image.layer.v1.tar+gzip+encrypted: exec: "ctd-decoder":
executable file not found in $PATH: unknown
I was also running into an error related to being unable to fetch decryption keys, and now I'm trying to test whether that was solved by setting AA_KBC and KBC_URI before creating the pod VM image with make image. However, this new error seems to be higher up on the stack...
@octaviansima I looked at the image with skopeo inspect docker://docker.io/osima/encrypt-testing:encrypted_busybox. It looks like it has a different sha than the one you posted above. It seems to have the correct CoCo annotations.
For an image to work, it needs to be encrypted with coco_keyprovider, which is accessed by ocicrypt-rs via a gRPC API (here is a snippet from a workflow doing this). Similarly, at image pull, image-rs will call out to attestation-agent via gRPC.
$ skopeo inspect docker://docker.io/osima/encrypt-testing:encrypted_busybox | jq '.LayersData[0].Annotations."org.opencontainers.image.enc.keys.provider.attestation-agent" | @base64d | fromjson'
{
"kid": "kbs:///default/image-kek/17d19f0e-6447-4a96-8e9e-cd22761934b5",
"wrapped_data": "yrBr+zk/iMbdZBK9v82wf61rANPvHANHBSQ3D3pJ0mtXk3lEbQzOAJ5wfXyDztlgHKFBl4wowWrX7VpS6xFG5J3zo2CR0noDB9E+cOm13JeouKaELbvB4xnwQf60gt6EUyfPdqI0rCnMbVON/aPpcuPZzZHIUv9xLhp69zmHswxCHtdWObOMzKuE77DnGasJvmezukSP1OTHlLZ8sMx+nFokF/8bxzCEVgCUL5dzQcb2s0Pgp1c1yg8YF4SB4A68tZ58pYvfdy6t9YomCuxQ4lU=",
"iv": "8g4fUKE7z/mF6YwB",
"wrap_type": "A256GCM"
}
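For anyone without jq at hand, the same check can be sketched in Python. This is a minimal sketch, assuming the output of skopeo inspect has been saved as JSON; the function name decode_key_annotation is made up for illustration, and only the annotation key is taken from the command above.

```python
import base64
import json

# Annotation key taken from the jq command above.
ANNOTATION = "org.opencontainers.image.enc.keys.provider.attestation-agent"


def decode_key_annotation(layer: dict) -> dict:
    """Decode the base64-encoded JSON key metadata of one LayersData entry.

    Hypothetical helper; mirrors jq's `@base64d | fromjson`.
    """
    encoded = layer["Annotations"][ANNOTATION]
    return json.loads(base64.b64decode(encoded))


# Usage (assumes `inspect.json` holds the output of `skopeo inspect`):
#   with open("inspect.json") as f:
#       layer = json.load(f)["LayersData"][0]
#   print(decode_key_annotation(layer)["kid"])
```

A layer without this annotation raises a KeyError, which is a quick hint that the image was not encrypted through the attestation-agent key provider.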
@mkulke I believe I am encrypting with coco_keyprovider; my ocicrypt.conf is as follows:
{
"key-providers": {
"attestation-agent": {
"grpc": "127.0.0.1:50000"
}
}
}
and I do notice this output from the KBS cluster when I call skopeo copy ... --encryption-key provider:attestation-agent:cc_kbc::http://127.0.0.1:8080 ...:
infra-kbs-1 | [2023-07-11T22:14:30Z INFO actix_web::middleware::logger] 172.19.0.2 "POST /kbs/v0/resource/default/image-kek/83a09951-16c9-4ea8-aee7-d56b9be149f6 HTTP/1.1" 200 0 "-" "-" 0.000599
infra-keyprovider-1 | [2023-07-11T22:14:30Z INFO coco_keyprovider::enc_mods] register KEK succeeded.
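One way to double-check that the --encryption-key value and ocicrypt.conf agree is to look up the provider name from the key spec in the config. The sketch below assumes the key spec has the form provider:<name>:<params>, as in the skopeo call above; both function names are hypothetical and this is not how ocicrypt itself resolves providers.

```python
import json


def provider_from_key_spec(key_spec: str) -> tuple:
    """Split 'provider:<name>:<params>' into (name, params).

    Assumed format, based on the --encryption-key value quoted above.
    """
    prefix, name, params = key_spec.split(":", 2)
    if prefix != "provider":
        raise ValueError(f"not a provider key spec: {key_spec!r}")
    return name, params


def provider_endpoint(conf_text: str, key_spec: str) -> str:
    """Return the gRPC address that ocicrypt.conf lists for the key spec's provider."""
    name, _ = provider_from_key_spec(key_spec)
    providers = json.loads(conf_text)["key-providers"]
    if name not in providers:
        raise KeyError(f"provider {name!r} is not listed in ocicrypt.conf")
    return providers[name]["grpc"]
```

With the config and key spec from this thread, the lookup resolves attestation-agent to 127.0.0.1:50000, so the encryption side appears to be wired up consistently.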
I wonder if it has anything to do with the root error in the logs: exec: "ctd-decoder": executable file not found in $PATH: unknown
Hmm, it looks like it's falling back to some default decryption option. I think you do need to set AA_KBC and KBS_URI. The image annotation itself would not know which KBS to talk to (it has kbs:///default set), and without setting the AA_KBC params, attestation-agent might not start.
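To illustrate why the annotation alone cannot locate the KBS, the kid can be split with a generic URI parser: the host part is empty. This is a sketch using Python's standard library; kbs_location is a hypothetical helper, and whether attestation-agent parses the kid the same way is an assumption.

```python
from urllib.parse import urlsplit


def kbs_location(kid: str) -> tuple:
    """Split a key ID into (host, resource path).

    The host is "" when the URI does not name a KBS, as with kbs:///...
    """
    parts = urlsplit(kid)
    return parts.netloc, parts.path.lstrip("/")


host, resource = kbs_location(
    "kbs:///default/image-kek/17d19f0e-6447-4a96-8e9e-cd22761934b5"
)
# host is empty here, so the KBS address has to come from somewhere else,
# for example the AA_KBC params configured when the pod VM image is built.
```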
The respective kata-agent code is here
It's not trivial to test the image decryption functionality. The ttrpc/gRPC endpoint in attestation-agent is an implementation of the ocicrypt plugin to decrypt layers, but you can check whether the attestation-agent process has been started with the correct configuration options.
So I'd recommend starting a peer pod with an unencrypted image and shelling into the pod VM. The attestation-agent process should be running on the pod VM, even for unencrypted images.
When I create a VM from the generated image (I can't figure out how to SSH into a pod VM), the attestation-agent process isn't running at all. Running the ExecStart command found in /etc/systemd/system/attestation-agent.service (/usr/local/bin/attestation-agent --getresource_sock 127.0.0.1:50001) yields:
/usr/local/bin/attestation-agent: error while loading shared libraries: libtdx_attest.so.1: cannot open shared object file: No such file or directory
and when I download those libraries:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: clean previous getresource socket file
Caused by:
socket address scheme is not expected', app/src/main.rs:29:39
I can't figure out how to SSH into a pod VM
In the CAA configmap you specify a public key for the pod VM instances. What you can do is start an unencrypted image (say "nginx:stable", so the pod starts up properly and the VM isn't killed by CAA). Since there is networking between the k8s node and the pod VM, you should be able to use a node as an intermediate jump host (using agent forwarding or something similar) to reach the pod VM.
libtdx_attest.so.1: cannot open shared object file: No such file or directory
This is a bug in the podvm image build process: the package should be installed, otherwise attestation-agent cannot run. Are you using a tagged release (v0.6)? If you add the required package to misc-settings.sh, that might fix your issue.
attestation-agent process isn't running at all.
That's expected. For image decryption, attestation-agent is spawned manually by kata-agent when pulling images. The systemd unit is for secret release at runtime, which is currently WIP and needs manual tweaking to work: attestation-agent is compiled with ttrpc, which allows only domain socket communication.
You can still test it by providing a filename as a socket: /usr/local/bin/attestation-agent --help
Hi, I've been working on getting an instance of the CoCo operator up and running on AKS. I'm now running into an issue trying to run the sample application, where the CAA has some gRPC issue. This is visible in the logs of the cloud-api-adaptor-daemonset pod:
I confirmed that both the webhook and CoCo runtime are installed on the cluster just fine. Any idea what could be going on here?