confidential-containers / guest-components

Confidential Containers Guest Tools and Components
Apache License 2.0

Failed to create pod with kata-qemu-coco-dev: failed to pull manifest error sending request for url. #749

Closed LiuSecone closed 3 days ago

LiuSecone commented 6 days ago

Describe the bug

I followed the quick start to deploy guest-components and trustee, but I ran into a problem running a workload with kata-qemu-coco-dev.

Here is my config file:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: encrypted-image-test-busybox
  name: encrypted-image-test-busybox
  annotations:
    io.containerd.cri.runtime-handler: kata-qemu-coco-dev
spec:
  containers:
  - image: secone/busybox:encrypted
    name: busybox
  dnsPolicy: ClusterFirst
  runtimeClassName: kata-qemu-coco-dev

And I got the error after applying the file:

Error: failed to create containerd task: failed to create shim task: failed to pull manifest error sending request for url (https://index.docker.io/v2/secone/busybox/manifests/encrypted)

When I change the image to plain busybox, I get the same error. When I change the runtime handler and runtimeClassName to kata-qemu and use the busybox image, the workload runs successfully.

Based on the previous test, I suspect I may have missed some configuration for kata-qemu-coco-dev, or there might be an issue with the encrypted image. Please help me, thanks.

What I expected: The pod status is running.

What I got:

$ kubectl get pods
NAME                             READY   STATUS             RESTARTS      AGE
encrypted-image-test-busybox     0/1     CrashLoopBackOff   5 (63s ago)   5m14s
unencrypted-image-test-busybox   1/1     Running            0             12s

Error: failed to create containerd task: failed to create shim task: failed to pull manifest error sending request for url (https://index.docker.io/v2/secone/busybox/manifests/encrypted)

How to reproduce

Use a single-node Kubernetes cluster and follow the quick start.

# operator
export RELEASE_VERSION=v0.10.0
kubectl apply -k "github.com/confidential-containers/operator/config/release?ref=${RELEASE_VERSION}"
kubectl apply -k "github.com/confidential-containers/operator/config/samples/ccruntime/default?ref=${RELEASE_VERSION}"
# trustee
git clone https://github.com/confidential-containers/trustee.git

cd trustee/kbs
openssl genpkey -algorithm ed25519 > ./config/private.key
openssl pkey -in ./config/private.key -pubout -out ./config/public.pub

sudo docker compose up -d
# encrypt the image
# edit ocicrypt.conf
tee > ocicrypt.conf <<EOF
{
    "key-providers": {
        "attestation-agent": {
            "grpc": "127.0.0.1:50000"
        }
    }
}
EOF

OCICRYPT_KEYPROVIDER_CONFIG=ocicrypt.conf skopeo copy --insecure-policy --encryption-key provider:attestation-agent docker://library/busybox oci:busybox:encrypted

skopeo copy oci:busybox:encrypted docker://docker.io/secone/busybox:encrypted
# run workload
kubectl apply -f ~/encrypted-image-test-busybox.yaml

CoCo version information

guest-components v0.10.0 trustee v0.10.0

What TEE are you seeing the problem on

None

Failing command and relevant log output

$ kubectl describe pod encrypted-image-test-busybox
Name:                encrypted-image-test-busybox
Namespace:           default
Priority:            0
Runtime Class Name:  kata-qemu-coco-dev
Service Account:     default
Node:                tdx0vm/10.16.23.144
Start Time:          Fri, 11 Oct 2024 11:46:55 +0000
Labels:              run=encrypted-image-test-busybox
Annotations:         io.containerd.cri.runtime-handler: kata-qemu-coco-dev
Status:              Running
IP:                  10.244.0.163
IPs:
  IP:  10.244.0.163
Containers:
  busybox:
    Container ID:  containerd://89723e942b7e37295ff47df31231b8f5886d6f4ff85fe66fe13d619447c985f8
    Image:         secone/busybox:encrypted
    Image ID:      docker.io/secone/busybox@sha256:b942fee5e3c9d3f4755ee034d81f6e2beb2d53c0b0446ee46f47168e80279419
    Port:          <none>
    Host Port:     <none>
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      StartError
      Message:     failed to create containerd task: failed to create shim task: failed to pull manifest error sending request for url (https://index.docker.io/v2/secone/busybox/manifests/encrypted)

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>: unknown
      Exit Code:    128
      Started:      Thu, 01 Jan 1970 00:00:00 +0000
      Finished:     Fri, 11 Oct 2024 11:53:50 +0000
    Ready:          False
    Restart Count:  6
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rqmsb (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  kube-api-access-rqmsb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              katacontainers.io/kata-runtime=true
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  7m56s                  default-scheduler  Successfully assigned default/encrypted-image-test-busybox to tdx0vm
  Normal   Pulling    7m53s                  kubelet            Pulling image "secone/busybox:encrypted"
  Normal   Pulled     7m48s                  kubelet            Successfully pulled image "secone/busybox:encrypted" in 5.249s (5.249s including waiting). Image size: 2157376 bytes.
  Normal   Created    5m38s (x5 over 7m48s)  kubelet            Created container busybox
  Normal   Pulled     5m38s (x4 over 7m37s)  kubelet            Container image "secone/busybox:encrypted" already present on machine
  Warning  Failed     5m28s (x5 over 7m37s)  kubelet            Error: failed to create containerd task: failed to create shim task: failed to pull manifest error sending request for url (https://index.docker.io/v2/secone/busybox/manifests/encrypted)

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>: unknown
  Warning  BackOff  2m51s (x20 over 7m26s)  kubelet  Back-off restarting failed container busybox in pod encrypted-image-test-busybox_default(32ab4841-f31b-4eeb-8c9b-3b305f37a7fa)

Xynnn007 commented 6 days ago

@LiuSecone Could you check the following:

  1. The network environment can reach Docker Hub.
  2. Delete the cached image on the host and then try again, since the events show: Container image "secone/busybox:encrypted" already present on machine
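The first check can be run from the host by probing the registry endpoint. The following is a minimal sketch, assuming `curl` is installed; the helper names `classify_registry_status` and `probe_docker_hub` are made up for illustration. Docker Hub's `/v2/` endpoint answers unauthenticated requests with HTTP 401, so a 401 (or 200) means the registry was reached, while curl prints `000` when no HTTP response arrived. This only exercises the host's network path, not the network inside the Kata guest.

```shell
# Hypothetical helper: interpret the HTTP status returned by a registry probe.
classify_registry_status() {
  case "$1" in
    200|401) echo "reachable" ;;      # registry answered (401 = auth required, expected)
    000)     echo "unreachable" ;;    # curl got no HTTP response at all
    *)       echo "unexpected:$1" ;;  # e.g. a proxy or captive portal replied
  esac
}

# Hypothetical helper: probe Docker Hub from the host.
# --max-time keeps the check from hanging on a dead route.
probe_docker_hub() {
  local code
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' https://index.docker.io/v2/ || true)
  classify_registry_status "${code:-000}"
}
```

Running `probe_docker_hub` on the node prints one of the three classifications.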

LiuSecone commented 5 days ago

Thanks for replying.

  1. I checked the network, and the host can connect to Docker Hub reliably.
  2. I deleted the image from the host, but the same error persisted. After running the workload, the encrypted image reappeared on the host. It seems like kata-qemu-coco-dev pulls the image twice, as indicated by the logs: Pulling image "secone/busybox:encrypted" and Successfully pulled image "secone/busybox:encrypted" in 5.363s.

Here is the command and output:

# tdxdemo @ tdx0vm in ~ [3:53:47] 
$ sudo ls /run/containerd/           
containerd.sock        io.containerd.grpc.v1.cri       io.containerd.runtime.v2.task  s
containerd.sock.ttrpc  io.containerd.runtime.v1.linux  runc

# tdxdemo @ tdx0vm in ~ [3:53:54] 
$ sudo crictl -r unix:///run/containerd/containerd.sock image ls | grep busybox
docker.io/secone/busybox                       encrypted                                  27a71e19c9562       2.16MB

# tdxdemo @ tdx0vm in ~ [3:54:23] 
$ sudo crictl -r unix:///run/containerd/containerd.sock rmi 27a71e19c9562      
Deleted: docker.io/secone/busybox:encrypted

# tdxdemo @ tdx0vm in ~ [3:54:34] 
$ sudo crictl -r unix:///run/containerd/containerd.sock image ls | grep busybox

# tdxdemo @ tdx0vm in ~ [3:54:36] C:1
$ cat ~/encrypted-image-test-busybox.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: encrypted-image-test-busybox
  name: encrypted-image-test-busybox
  annotations:
    io.containerd.cri.runtime-handler: kata-qemu-coco-dev
spec:
  containers:
  - image: secone/busybox:encrypted
    name: busybox
  dnsPolicy: ClusterFirst
  runtimeClassName: kata-qemu-coco-dev

# tdxdemo @ tdx0vm in ~ [3:54:40] 
$ kubectl apply -f ~/encrypted-image-test-busybox.yaml                         
pod/encrypted-image-test-busybox created

# tdxdemo @ tdx0vm in ~ [3:54:54] 
$ kubectl get pods --watch                            
NAME                           READY   STATUS              RESTARTS   AGE
encrypted-image-test-busybox   0/1     ContainerCreating   0          16s
nginx                          1/1     Running             0          44h
encrypted-image-test-busybox   0/1     RunContainerError   0          19s
^C%                                                                                                                   

# tdxdemo @ tdx0vm in ~ [3:55:17] C:1
$ kubectl describe pod encrypted-image-test-busybox    
Name:                encrypted-image-test-busybox
Namespace:           default
Priority:            0
Runtime Class Name:  kata-qemu-coco-dev
Service Account:     default
Node:                tdx0vm/10.16.23.144
Start Time:          Sat, 12 Oct 2024 03:54:54 +0000
Labels:              run=encrypted-image-test-busybox
Annotations:         io.containerd.cri.runtime-handler: kata-qemu-coco-dev
Status:              Running
IP:                  10.244.0.177
IPs:
  IP:  10.244.0.177
Containers:
  busybox:
    Container ID:  containerd://3941fc8d31d58e2b9135daba24416297b7806801f1e7729c6f79287b6fbf310b
    Image:         secone/busybox:encrypted
    Image ID:      docker.io/secone/busybox@sha256:b942fee5e3c9d3f4755ee034d81f6e2beb2d53c0b0446ee46f47168e80279419
    Port:          <none>
    Host Port:     <none>
    State:         Waiting
      Reason:      RunContainerError
    Last State:    Terminated
      Reason:      StartError
      Message:     failed to create containerd task: failed to create shim task: failed to pull manifest error sending request for url (https://index.docker.io/v2/secone/busybox/manifests/encrypted)

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>: unknown
      Exit Code:    128
      Started:      Thu, 01 Jan 1970 00:00:00 +0000
      Finished:     Sat, 12 Oct 2024 03:55:13 +0000
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-llxpk (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  kube-api-access-llxpk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              katacontainers.io/kata-runtime=true
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  28s               default-scheduler  Successfully assigned default/encrypted-image-test-busybox to tdx0vm
  Normal   Pulling    25s               kubelet            Pulling image "secone/busybox:encrypted"
  Normal   Pulled     19s               kubelet            Successfully pulled image "secone/busybox:encrypted" in 5.363s (5.363s including waiting). Image size: 2157376 bytes.
  Normal   Created    9s (x2 over 19s)  kubelet            Created container busybox
  Warning  Failed     9s                kubelet            Error: failed to create containerd task: failed to create shim task: failed to pull manifest error sending request for url (https://index.docker.io/v2/secone/busybox/manifests/encrypted)

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>: unknown
  Normal  Pulled  9s  kubelet  Container image "secone/busybox:encrypted" already present on machine

# tdxdemo @ tdx0vm in ~ [3:55:22] 
$ sudo crictl -r unix:///run/containerd/containerd.sock image ls | grep busybox
docker.io/secone/busybox                       encrypted                                  27a71e19c9562       2.16MB
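In the output above, the kubelet's pull succeeds while the error is reported from `failed to create shim task`, which suggests the failing manifest request is a second pull made after the host-side one. A small filter like the following can extract those two signals from saved `kubectl describe pod` output. It is a sketch only; the function name `summarize_pull_events` is hypothetical and the match strings are taken from the log text in this thread.

```shell
# Hypothetical helper: read `kubectl describe pod` output on stdin and report
# whether the host-side pull succeeded and whether shim task creation failed
# while pulling the manifest.
summarize_pull_events() {
  local input host_pull shim_fail
  input=$(cat)
  host_pull=no
  shim_fail=no
  if printf '%s\n' "$input" | grep -q 'Successfully pulled image'; then
    host_pull=yes
  fi
  if printf '%s\n' "$input" | grep -q 'failed to create shim task: failed to pull manifest'; then
    shim_fail=yes
  fi
  echo "host_pull=${host_pull} shim_manifest_failure=${shim_fail}"
}
```

For example, `kubectl describe pod encrypted-image-test-busybox | summarize_pull_events` on the output shown here would report both values as `yes`.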

ChengyuZhu6 commented 5 days ago

I think you may need to configure a proxy in the guest. This issue might be caused by a network problem inside the guest.
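One way to pass a proxy to the guest is sketched below. It assumes the Kata agent's `agent.https_proxy` and `agent.no_proxy` kernel parameters and the `io.katacontainers.config.hypervisor.kernel_params` pod annotation, and it assumes that annotation is permitted by the runtime class's `enable_annotations` setting. The function name and the proxy URL are placeholders, and the thread does not say which setting was actually applied.

```shell
# Hypothetical helper: print the pod manifest from this issue with guest proxy
# settings added as kernel parameters via an annotation.
# $1 = proxy URL (placeholder), $2 = comma-separated no_proxy list.
render_pod_with_proxy() {
  local proxy_url="$1" no_proxy_list="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: encrypted-image-test-busybox
  name: encrypted-image-test-busybox
  annotations:
    io.containerd.cri.runtime-handler: kata-qemu-coco-dev
    # Assumption: kernel_params is allowed by enable_annotations for this runtime class.
    io.katacontainers.config.hypervisor.kernel_params: "agent.https_proxy=${proxy_url} agent.no_proxy=${no_proxy_list}"
spec:
  containers:
  - image: secone/busybox:encrypted
    name: busybox
  dnsPolicy: ClusterFirst
  runtimeClassName: kata-qemu-coco-dev
EOF
}
```

A possible invocation is `render_pod_with_proxy http://proxy.example.com:3128 "localhost,127.0.0.1" | kubectl apply -f -`, with the proxy address replaced by the one for your network.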

ChengyuZhu6 commented 5 days ago

Similar closed issue: https://github.com/confidential-containers/confidential-containers/issues/248

LiuSecone commented 5 days ago

Thank you so much! That was really helpful, and now the workload runs successfully. I'd also like to ask: is it possible to configure the parameters in kustomization.yaml?