confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

test/e2e: track the libvirt tests failing with cri-o #2100

Open wainersm opened 1 month ago

wainersm commented 1 month ago

Describe the bug

As an outcome of https://github.com/confidential-containers/cloud-api-adaptor/pull/2068 (issue #1981), the following tests are failing:

TestLibvirtCreatePeerPodContainerWithExternalIPAccess
TestLibvirtPodToServiceCommunication
TestLibvirtPodsMTLSCommunication

TestLibvirtPodToServiceCommunication and TestLibvirtPodsMTLSCommunication fail with the same symptom: a service cannot be reached by name from within the container. Both are client/server style test cases.

TestLibvirtCreatePeerPodContainerWithExternalIPAccess fails because the container is not able to resolve the www.google.com address.

Worth noting:

  1. @littlejawa was able to pass TestLibvirtPodToServiceCommunication on OCP 4.17 with OSC 1.17. This might indicate a problem with the Kubernetes setup in our libvirt e2e tests, which relies on kcli.
  2. TestLibvirtPodToServiceCommunication, TestLibvirtPodsMTLSCommunication and TestLibvirtCreatePeerPodContainerWithExternalIPAccess fail on CI (running on GitHub Actions) even with containerd; however, they pass when run on a developer's workstation. With CRI-O, by contrast, they fail in both scenarios.

How to reproduce

N/A

CoCo version information

N/A

What TEE are you seeing the problem on

None

Failing command and relevant log output

No response

littlejawa commented 1 month ago

Adding some more information on TestLibvirtPodToServiceCommunication:

I was able to reproduce the issue using kcli to run a cluster with cri-o, with runc as the runtime. As you mentioned, the test passes on OCP, and it also passes on a K8S cluster using containerd as the engine. So this is something that kcli misses when setting up cri-o. I've raised the question with the maintainer of kcli and with people on the cri-o side to understand what's wrong in the cluster setup. I will continue digging.