confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

'No route to host' when pinging a Pod created by the kata-remote runtime on EKS #2140

Status: Closed (closed by gaussye 1 week ago)

gaussye commented 2 weeks ago

Describe the bug

I followed the guide https://github.com/confidential-containers/cloud-api-adaptor/blob/e14ad0fe0c3cc32c8f9634d9da353e35ebd423a0/src/cloud-api-adaptor/aws/README.md to set up the EKS environment for TEE. I have now deployed two nginx Deployments: one that sets runtimeClassName: kata-remote and one that does not. I logged in to the node and pinged the IP of each Pod. The Pod with runtimeClassName: kata-remote fails with 'no route to host', while the other one responds. Is this expected behavior?

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx     
    spec:
      containers:
      - image: nginx@sha256:9700d098d545f9d2ee0660dfb155fe64f4447720a0a763a93f2cf08997227279
        name: nginx
---

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-kata-remote
  name: nginx-kata-remote
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-kata-remote
  template:
    metadata:
      labels:
        app: nginx-kata-remote     
    spec:
      runtimeClassName: kata-remote
      containers:
      - image: nginx@sha256:9700d098d545f9d2ee0660dfb155fe64f4447720a0a763a93f2cf08997227279
        name: nginx-kata-remote
kubectl get pods -o wide
NAME                                 READY   STATUS    RESTARTS   AGE    IP               NODE                          NOMINATED NODE   READINESS GATES
nginx-6b84b47985-xn422               1/1     Running   0          14h    172.16.144.255   ip-10-0-1-231.ec2.internal    <none>           <none>
nginx-kata-remote-6c68c4d454-qn7fb   1/1     Running   0          40s    172.16.144.197   ip-10-0-1-231.ec2.internal    <none>           <none>
root@ip-10-0-1-231# ping 172.16.144.255
PING 172.16.144.255 (172.16.144.255) 56(84) bytes of data.
64 bytes from 172.16.144.255: icmp_seq=1 ttl=64 time=0.056 ms
64 bytes from 172.16.144.255: icmp_seq=2 ttl=64 time=0.049 ms
64 bytes from 172.16.144.255: icmp_seq=3 ttl=64 time=0.041 ms
64 bytes from 172.16.144.255: icmp_seq=4 ttl=64 time=0.049 ms
^C
--- 172.16.144.255 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3081ms
rtt min/avg/max/mdev = 0.041/0.048/0.056/0.005 ms
root@ip-10-0-1-231:/var/snap/amazon-ssm-agent/7993# ping 172.16.144.197
PING 172.16.144.197 (172.16.144.197) 56(84) bytes of data.
From 10.0.1.231 icmp_seq=1 Destination Host Unreachable
From 10.0.1.231 icmp_seq=2 Destination Host Unreachable
From 10.0.1.231 icmp_seq=3 Destination Host Unreachable
From 10.0.1.231 icmp_seq=4 Destination Host Unreachable
From 10.0.1.231 icmp_seq=5 Destination Host Unreachable
From 10.0.1.231 icmp_seq=6 Destination Host Unreachable

How to reproduce

Follow the guide https://github.com/confidential-containers/cloud-api-adaptor/blob/

CoCo version information

quay.io/confidential-containers/operator:v0.10.0

What TEE are you seeing the problem on

None

Failing command and relevant log output

No response

qzheng527 commented 2 weeks ago

@gaussye I also plan to try CoCo remote mode in AWS. May I know which AWS region and instance type you used? Thanks.

gaussye commented 2 weeks ago

@qzheng527 us-east-1

bpradipt commented 2 weeks ago

@gaussye I tried this on my setup and couldn't reproduce the issue:

kubectl get pods -o wide
NAME                                                               READY   STATUS      RESTARTS   AGE     IP               NODE                                           NOMINATED NODE   READINESS GATES
network-debug-app-584ddf7956-59kk8                                 1/1     Running     0          17m     172.16.63.12     ip-192-168-44-147.us-east-2.compute.internal   <none>           <none>
root@ip-192-168-44-147:/# ping 172.16.63.12
PING 172.16.63.12 (172.16.63.12) 56(84) bytes of data.
64 bytes from 172.16.63.12: icmp_seq=1 ttl=64 time=0.348 ms
64 bytes from 172.16.63.12: icmp_seq=2 ttl=64 time=0.265 ms
64 bytes from 172.16.63.12: icmp_seq=3 ttl=64 time=0.328 ms
64 bytes from 172.16.63.12: icmp_seq=4 ttl=64 time=0.265 ms

Did you set VXLAN_PORT to 9000 in src/cloud-api-adaptor/install/overlays/aws/kustomization.yaml and enable the VXLAN port in the security group?

I realise that we don't explicitly mention this in the "Deploy CAA" section: https://github.com/confidential-containers/cloud-api-adaptor/blob/main/src/cloud-api-adaptor/aws/README.md#deploy-caa
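
For readers hitting the same symptom, here is a minimal sketch of what the VXLAN_PORT setting might look like in the AWS overlay's kustomization.yaml. The ConfigMap name (peer-pods-cm), the namespace, and the CLOUD_PROVIDER literal are assumptions about the overlay's layout and are not confirmed in this thread; only VXLAN_PORT and the value 9000 come from the discussion. Check the file in your checkout before copying anything.

```yaml
# Excerpt sketch of src/cloud-api-adaptor/install/overlays/aws/kustomization.yaml
# Only the VXLAN_PORT line is taken from this thread; everything else is assumed.
configMapGenerator:
- name: peer-pods-cm                          # assumed ConfigMap name
  namespace: confidential-containers-system   # assumed namespace
  literals:
  - CLOUD_PROVIDER="aws"                      # assumed existing literal; keep the file's other literals as they are
  - VXLAN_PORT="9000"                         # port suggested in this thread
```

VXLAN traffic is carried over UDP, so the security group rules between the worker node and the peer-pod VM would also need to allow UDP on the same port for the tunnel to come up.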

gaussye commented 1 week ago

@bpradipt After setting VXLAN_PORT=9000 in the ConfigMap, it works now. Thanks.

bpradipt commented 1 week ago

Documentation updated via https://github.com/confidential-containers/cloud-api-adaptor/pull/2148