confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

Release v0.8.0 #1548

Closed: wainersm closed this issue 11 months ago

wainersm commented 1 year ago

Issue tracker for the CAA release as part of CoCo v0.8.0

v0.8.0-alpha.1

wainersm commented 12 months ago

Kata Containers for CoCo 0.8.0 RC1 - commit 424de1cbfa4e1da9ecf9a56b1d1e1a11a4f339cd

wainersm commented 12 months ago

guest-components for CoCo 0.8.0 RC1 - commit 615a46ff16ee8670014946d14e44e85cede82f01 (the same commit pinned in kata's 424de1cbfa4e1da9ecf9a56b1d1e1a11a4f339cd)

stevenhorsman commented 12 months ago

I've raised PR https://github.com/confidential-containers/cloud-api-adaptor/pull/1559

To cover this section:

Update the csi-wrapper and peerpod-ctrl go modules to use the tagged version of cloud-api-adapter, by running:

go get github.com/confidential-containers/cloud-api-adaptor@v<version>-alpha.1
go mod tidy

in their directories and removing the local replace references if we needed to add them earlier.

of the release doc
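
For reference, a minimal sketch of that full sequence for one of the modules (peerpod-ctrl shown; csi-wrapper is the same). The directory names, and the assumption that a local replace directive was added earlier, are mine, so adjust as needed:

cd peerpod-ctrl                  # or volumes/csi-wrapper; paths are an assumption
go mod edit -dropreplace=github.com/confidential-containers/cloud-api-adaptor   # only needed if a local replace was added earlier
go get github.com/confidential-containers/cloud-api-adaptor@v0.8.0-alpha.1
go mod tidy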

stevenhorsman commented 12 months ago

I've created pre-release https://github.com/confidential-containers/cloud-api-adaptor/releases/tag/v0.8.0-alpha.1 now, which should trigger the podvm build process to test that.

stevenhorsman commented 12 months ago

As reported on Slack, I've manually tested the RC podvm and CAA code for IBM Cloud with amd64 and s390x clusters/peer pods

mkulke commented 12 months ago

I tested the Azure podvm & CAA images built on the v0.8.0-alpha.1 rev. Looks good; remote attestation also seems to work.

stevenhorsman commented 12 months ago

I've run the libvirt e2e tests too:

time="2023-11-06T08:28:06-08:00" level=info msg="Install the cloud-api-adaptor"
Wait for the cc-operator-daemon-install DaemonSet be available
Wait for the pod cc-operator-daemon-install-bbzcq be ready
Wait for the cloud-api-adaptor-daemonset DaemonSet be available
Wait for the pod cloud-api-adaptor-daemonset-dgqt9 be ready
Wait for the kata-remote runtimeclass be created
=== RUN   TestLibvirtCreateSimplePod
=== RUN   TestLibvirtCreateSimplePod/SimplePeerPod_test
    assessment_runner_test.go:202: Expected Pod State: Running
    assessment_runner_test.go:203: Current Pod State: Running
=== RUN   TestLibvirtCreateSimplePod/SimplePeerPod_test/PodVM_is_created
    assessment_helpers_test.go:159: Pulled with nydus-snapshotter driver:2023/11/06 16:35:13 [adaptor/proxy]         mount_point:/run/kata-containers/77b955c79d3b1d8338e08a03b2257f75da8f87d59f064d518eb629135b19eebf/rootfs source:docker.io/library/nginx:latest fstype:overlay driver:image_guest_pull
time="2023-11-06T08:35:36-08:00" level=info msg="Deleting pod nginx..."
time="2023-11-06T08:35:51-08:00" level=info msg="Pod nginx has been successfully deleted within 60s"
--- PASS: TestLibvirtCreateSimplePod (162.18s)
    --- PASS: TestLibvirtCreateSimplePod/SimplePeerPod_test (162.16s)
        --- PASS: TestLibvirtCreateSimplePod/SimplePeerPod_test/PodVM_is_created (1.52s)
=== RUN   TestLibvirtCreatePodWithConfigMap
=== RUN   TestLibvirtCreatePodWithConfigMap/ConfigMapPeerPod_test
    assessment_runner_test.go:202: Expected Pod State: Running
    assessment_runner_test.go:203: Current Pod State: Running
=== RUN   TestLibvirtCreatePodWithConfigMap/ConfigMapPeerPod_test/Configmap_is_created_and_contains_data
time="2023-11-06T08:37:22-08:00" level=info msg="Data Inside Configmap: Hello, world"
time="2023-11-06T08:37:22-08:00" level=info msg="Deleting Configmap... nginx-configmap"
time="2023-11-06T08:37:22-08:00" level=info msg="Deleting pod nginx-configmap-pod..."
time="2023-11-06T08:37:32-08:00" level=info msg="Pod nginx-configmap-pod has been successfully deleted within 60s"
--- PASS: TestLibvirtCreatePodWithConfigMap (101.40s)
    --- PASS: TestLibvirtCreatePodWithConfigMap/ConfigMapPeerPod_test (101.39s)
        --- PASS: TestLibvirtCreatePodWithConfigMap/ConfigMapPeerPod_test/Configmap_is_created_and_contains_data (5.71s)
=== RUN   TestLibvirtCreatePodWithSecret
=== RUN   TestLibvirtCreatePodWithSecret/SecretPeerPod_test
    assessment_runner_test.go:202: Expected Pod State: Running
    assessment_runner_test.go:203: Current Pod State: Running
=== RUN   TestLibvirtCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data
time="2023-11-06T08:38:58-08:00" level=info msg="Username from secret inside pod: admin"
time="2023-11-06T08:39:03-08:00" level=info msg="Password from secret inside pod: password"
time="2023-11-06T08:39:03-08:00" level=info msg="Deleting Secret... nginx-secret"
time="2023-11-06T08:39:03-08:00" level=info msg="Deleting pod nginx-secret-pod..."
time="2023-11-06T08:39:13-08:00" level=info msg="Pod nginx-secret-pod has been successfully deleted within 60s"
--- PASS: TestLibvirtCreatePodWithSecret (101.04s)
    --- PASS: TestLibvirtCreatePodWithSecret/SecretPeerPod_test (101.03s)
        --- PASS: TestLibvirtCreatePodWithSecret/SecretPeerPod_test/Secret_has_been_created_and_contains_data (10.60s)
=== RUN   TestLibvirtCreatePeerPodContainerWithExternalIPAccess
=== RUN   TestLibvirtCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test
    assessment_runner_test.go:202: Expected Pod State: Running
    assessment_runner_test.go:203: Current Pod State: Running
=== RUN   TestLibvirtCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test/Peer_Pod_Container_Connected_to_External_IP
time="2023-11-06T08:40:29-08:00" level=info msg="Output of ping command in busybox : PING www.google.com (142.251.36.36): 56 data bytes\n64 bytes from 142.251.36.36: seq=0 ttl=48 time=15.478 ms\n\n--- www.google.com ping statistics ---\n1 packets transmitted, 1 packets received, 0% packet loss\nround-trip min/avg/max = 15.478/15.478/15.478 ms\n"
time="2023-11-06T08:40:29-08:00" level=info msg="Deleting pod busybox-pod..."
time="2023-11-06T08:40:34-08:00" level=info msg="Pod busybox-pod has been successfully deleted within 60s"
--- PASS: TestLibvirtCreatePeerPodContainerWithExternalIPAccess (80.55s)
    --- PASS: TestLibvirtCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test (80.54s)
        --- PASS: TestLibvirtCreatePeerPodContainerWithExternalIPAccess/IPAccessPeerPod_test/Peer_Pod_Container_Connected_to_External_IP (5.38s)
=== RUN   TestLibvirtCreatePeerPodWithJob
=== RUN   TestLibvirtCreatePeerPodWithJob/JobPeerPod_test
=== RUN   TestLibvirtCreatePeerPodWithJob/JobPeerPod_test/Job_has_been_created
    assessment_helpers_test.go:239: WARNING: job-pi-78ph2 - StartError
    assessment_helpers_test.go:264: SUCCESS: job-pi-hg9hs - Completed - LOG: 3.14156
    assessment_runner_test.go:226: Expected Completed status on first attempt
time="2023-11-06T08:42:49-08:00" level=info msg="Deleting Job... job-pi"
time="2023-11-06T08:42:49-08:00" level=info msg="Deleting pods created by job... job-pi-78ph2"
time="2023-11-06T08:42:49-08:00" level=info msg="Deleting pods created by job... job-pi-hg9hs"
--- PASS: TestLibvirtCreatePeerPodWithJob (135.49s)
    --- PASS: TestLibvirtCreatePeerPodWithJob/JobPeerPod_test (135.49s)
        --- SKIP: TestLibvirtCreatePeerPodWithJob/JobPeerPod_test/Job_has_been_created (0.15s)
=== RUN   TestLibvirtCreatePeerPodAndCheckUserLogs
    common_suite_test.go:154: Skipping Test until issue kata-containers/kata-containers#5732 is Fixed
--- SKIP: TestLibvirtCreatePeerPodAndCheckUserLogs (0.00s)
=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs
=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test
=== RUN   TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test/Peer_pod_with_work_directory_has_been_created
    assessment_runner_test.go:260: Log output of peer pod:/other
time="2023-11-06T08:44:04-08:00" level=info msg="Deleting pod workdirpod..."
time="2023-11-06T08:44:09-08:00" level=info msg="Pod workdirpod has been successfully deleted within 60s"
--- PASS: TestLibvirtCreatePeerPodAndCheckWorkDirLogs (80.17s)
    --- PASS: TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test (80.17s)
        --- PASS: TestLibvirtCreatePeerPodAndCheckWorkDirLogs/WorkDirPeerPod_test/Peer_pod_with_work_directory_has_been_created (5.04s)
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test/Peer_pod_with_environmental_variables_has_been_created
    assessment_runner_test.go:260: Log output of peer pod:KUBERNETES_SERVICE_PORT=443
        KUBERNETES_PORT=tcp://10.96.0.1:443
        HOSTNAME=env-variable-in-image
        SHLVL=1
        HOME=/root
        KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        KUBERNETES_PORT_443_TCP_PORT=443
        KUBERNETES_PORT_443_TCP_PROTO=tcp
        KUBERNETES_SERVICE_PORT_HTTPS=443
        KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
        ISPRODUCTION=false
        KUBERNETES_SERVICE_HOST=10.96.0.1
        PWD=/
time="2023-11-06T08:45:24-08:00" level=info msg="Deleting pod env-variable-in-image..."
time="2023-11-06T08:45:29-08:00" level=info msg="Pod env-variable-in-image has been successfully deleted within 60s"
--- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly (80.16s)
    --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test (80.16s)
        --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageOnly/EnvVariablePeerPodWithImageOnly_test/Peer_pod_with_environmental_variables_has_been_created (5.03s)
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test/Peer_pod_with_environmental_variables_has_been_created
    assessment_runner_test.go:260: Log output of peer pod:KUBERNETES_SERVICE_PORT=443
        KUBERNETES_PORT=tcp://10.96.0.1:443
        HOSTNAME=env-variable-in-config
        HOME=/root
        PKG_RELEASE=1~bookworm
        NGINX_VERSION=1.25.3
        KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        NJS_VERSION=0.8.2
        KUBERNETES_PORT_443_TCP_PORT=443
        KUBERNETES_PORT_443_TCP_PROTO=tcp
        KUBERNETES_SERVICE_PORT_HTTPS=443
        KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
        ISPRODUCTION=true
        KUBERNETES_SERVICE_HOST=10.96.0.1
        PWD=/
time="2023-11-06T08:46:55-08:00" level=info msg="Deleting pod env-variable-in-config..."
time="2023-11-06T08:47:00-08:00" level=info msg="Pod env-variable-in-config has been successfully deleted within 60s"
--- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly (90.34s)
    --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test (90.34s)
        --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithDeploymentOnly/EnvVariablePeerPodWithDeploymentOnly_test/Peer_pod_with_environmental_variables_has_been_created (5.17s)
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test
=== RUN   TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test/Peer_pod_with_environmental_variables_has_been_created
    assessment_runner_test.go:260: Log output of peer pod:KUBERNETES_PORT=tcp://10.96.0.1:443
        KUBERNETES_SERVICE_PORT=443
        HOSTNAME=env-variable-in-both
        SHLVL=1
        HOME=/root
        KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
        KUBERNETES_PORT_443_TCP_PORT=443
        KUBERNETES_PORT_443_TCP_PROTO=tcp
        KUBERNETES_SERVICE_PORT_HTTPS=443
        KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
        ISPRODUCTION=true
        KUBERNETES_SERVICE_HOST=10.96.0.1
        PWD=/
time="2023-11-06T08:48:15-08:00" level=info msg="Deleting pod env-variable-in-both..."
time="2023-11-06T08:48:20-08:00" level=info msg="Pod env-variable-in-both has been successfully deleted within 60s"
--- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment (80.25s)
    --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test (80.25s)
        --- PASS: TestLibvirtCreatePeerPodAndCheckEnvVariableLogsWithImageAndDeployment/EnvVariablePeerPodWithBoth_test/Peer_pod_with_environmental_variables_has_been_created (5.09s)
PASS
ok      github.com/confidential-containers/cloud-api-adaptor/test/e2e   1879.673s

@wainersm @bpradipt - do you know of any other tests we need to do before we can say that peer pods 0.8.0-alpha.1 testing is complete?

mkulke commented 12 months ago

Hmm, when running the e2e test suite, unlike in my manual test, I'm seeing snapshotter errors:

Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull
  and unpack image "docker.io/library/nginx:latest": failed to prepare
  extraction snapshot "extract-218425411-EbOv
  sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a":
  missing CRI reference annotation for snaposhot 8: unknown

Note the typo "snaposhot"; that's from nydus, apparently.
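
For anyone else debugging this, a rough checklist of what to look at on the worker node; the config locations are assumptions, and the containerd-version angle is only what the thread later converges on:

containerd --version                                  # the thread points at containerd 1.6 vs 1.7
grep -A 3 proxy_plugins /etc/containerd/config.toml   # nydus-snapshotter should be registered as a proxy snapshotter
crictl info | grep -i snapshotter                     # check which snapshotter the CRI runtime handlers are configured with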

wainersm commented 11 months ago

@stevenhorsman managing to run the alpha.1 release with AWS

wainersm commented 11 months ago

@stevenhorsman managing to run the alpha.1 release with AWS

The simple pod failed on AWS (AKS) and remaining tests didn't run because it hit the timeout:

=== RUN   TestAwsCreateSimplePod
=== RUN   TestAwsCreateSimplePod/SimplePeerPod_test
    assessment_runner_test.go:190: timed out waiting for the condition
--- FAIL: TestAwsCreateSimplePod (900.53s)
    --- FAIL: TestAwsCreateSimplePod/SimplePeerPod_test (900.53s)

I will get more information.

FYI @bpradipt

stevenhorsman commented 11 months ago

@stevenhorsman managing to run the alpha.1 release with AWS

The simple pod failed on AWS (AKS) and remaining tests didn't run because it hit the timeout:

=== RUN   TestAwsCreateSimplePod
=== RUN   TestAwsCreateSimplePod/SimplePeerPod_test
    assessment_runner_test.go:190: timed out waiting for the condition
--- FAIL: TestAwsCreateSimplePod (900.53s)
    --- FAIL: TestAwsCreateSimplePod/SimplePeerPod_test (900.53s)

@wainersm - Hey Wainer, does your cluster have multiple worker nodes by any chance? I've just realised that my nydus verification test just runs on the first CAA ds it finds, so it wouldn't be reliable on a multi-node cluster. I'm looking to fix it soon, but I can back out the test failure in the short term if it's causing issues?
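
Until the test is fixed, a hedged manual check could loop over every CAA pod rather than just the first one; the namespace and label below are assumptions based on a default install:

for p in $(kubectl -n confidential-containers-system get pods -l app=cloud-api-adaptor -o name); do
  echo "== $p =="
  # "driver:image_guest_pull" indicates a nydus-snapshotter pull; "calling PullImage" is the old path
  kubectl -n confidential-containers-system logs "$p" | grep -E 'driver:image_guest_pull|calling PullImage' | tail -n 5
done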

wainersm commented 11 months ago

@stevenhorsman managing to run the alpha.1 release with AWS

The simple pod failed on AWS (AKS) and remaining tests didn't run because it hit the timeout:

=== RUN   TestAwsCreateSimplePod
=== RUN   TestAwsCreateSimplePod/SimplePeerPod_test
    assessment_runner_test.go:190: timed out waiting for the condition
--- FAIL: TestAwsCreateSimplePod (900.53s)
    --- FAIL: TestAwsCreateSimplePod/SimplePeerPod_test (900.53s)

@wainersm - Hey Wainer, does your cluster have multiple worker nodes by any chance? I've just realised that my nydus verification test just runs on the first CAA ds it finds, so it wouldn't be reliable on a multi-node cluster. I'm looking to fix it soon, but I can back out the test failure in the short term if it's causing issues?

Hi @stevenhorsman, it is a single-node cluster but good to know tests aren't reliable on a multi-node cluster!

The problem is actually that the region where the framework deployed the cluster does not support confidential VMs, which are now the default since commit e4059a5223bf4a955391b7f87178af9b11809dc2. I will disable CVM and see if it passes the simple tests.
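
As a sanity check before re-running, something like the following should list the SEV-SNP (confidential VM) capable instance types available in the region; the filter name is taken from the AWS CLI docs, so treat the exact invocation as an assumption:

aws ec2 describe-instance-types \
  --region "$AWS_REGION" \
  --filters Name=processor-info.supported-features,Values=amd-sev-snp \
  --query 'InstanceTypes[].InstanceType' --output text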

stevenhorsman commented 11 months ago

it is a single-node cluster but good to know tests aren't reliable on a multi-node cluster!

I've created https://github.com/confidential-containers/cloud-api-adaptor/pull/1562 which I hope fixes the issue

wainersm commented 11 months ago

@bpradipt @stevenhorsman the status of my tests for AWS is:

  • Now the "simple pod" fails to start with a nydus related error:
  Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-101183212-RW23 sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 3: unknown
  Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-967947483-KBeo sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 4: unknown
  Warning  Failed     47m                    kubelet

My cluster was gone overnight and I am working to re-install it to obtain more information.

mkulke commented 11 months ago
  • Now the "simple pod" fails to start with a nydus related error:
  Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-101183212-RW23 sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 3: unknown
  Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-967947483-KBeo sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 4: unknown
  Warning  Failed     47m                    kubelet         

Oh, that's relieving to see 😅

but are we talking about AKS or EKS (on AWS)? AKS clusters should bundle containerd 1.7, I think?

wainersm commented 11 months ago
  • Now the "simple pod" fails to start with a nydus related error:
  Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-101183212-RW23 sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 3: unknown
  Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-967947483-KBeo sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 4: unknown
  Warning  Failed     47m                    kubelet         

Oh, that's relieving to see 😅

but are we talking about AKS or EKS (on AWS)? AKS clusters should bundle containerd 1.7, I think?

Sorry, I meant AWS EKS :)

wainersm commented 11 months ago
  • Now the "simple pod" fails to start with a nydus related error:
  Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-101183212-RW23 sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 3: unknown
  Warning  Failed     48m                    kubelet            Failed to pull image "nginx": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginx:latest": failed to prepare extraction snapshot "extract-967947483-KBeo sha256:ec983b16636050e69677eb81537e955ab927757c23aaf73971ecf5f71fcc262a": missing CRI reference annotation for snaposhot 4: unknown
  Warning  Failed     47m                    kubelet         

Oh, that's relieving to see 😅 but are we talking about AKS or EKS (on AWS)? AKS clusters should bundle containerd 1.7, I think?

Sorry, I meant AWS EKS :)

Yesterday I tried AWS EKS again on two versions of Kubernetes (1.26 and 1.28); I got the same missing CRI reference annotation for snaposhot error in both cases.

Then I deployed a kubeadm cluster via kcli on AWS with CentOS Stream 8 workers. This time I wasn't even able to get the podvm properly running (it fails on the "initializing" checks). This is the first time I've tried that installation method, so it might be that I made some mistake.

@mkulke any luck with AKS?

mkulke commented 11 months ago

@wainersm: not yet, I'm trying to establish a working baseline with CLOUD_PROVIDER=libvirt on a single-node cluster. kcli also gave me grief, so I aborted that approach and I'm trying to deploy libvirt on a single-node cluster created with kubeadm. It will start VMs, but the podvms do not seem to come up properly.

Do you think it makes sense to reference the missing CRI reference annotation for snaposhot error in the tracking issue for the remote-snapshotter on CC, now that we have this somewhat confirmed?

wainersm commented 11 months ago

@wainersm: not yet, I'm trying to establish a working baseline with CLOUD_PROVIDER=libvirt on a single-node cluster. kcli also gave me grief, so I aborted that approach and I'm trying to deploy libvirt on a single-node cluster created with kubeadm. It will start VMs, but the podvms do not seem to come up properly.

Do you think it makes sense to reference the missing CRI reference annotation for snaposhot error in the tracking issue for the remote-snapshotter on CC, now that we have this somewhat confirmed?

Good idea, let me add an entry for that to the peer pods issue.

wainersm commented 11 months ago

@wainersm: not yet, I'm trying to establish a working baseline with CLOUD_PROVIDER=libvirt on a single-node cluster. kcli also gave me grief, so I aborted that approach and I'm trying to deploy libvirt on a single-node cluster created with kubeadm. It will start VMs, but the podvms do not seem to come up properly. Do you think it makes sense to reference the missing CRI reference annotation for snaposhot error in the tracking issue for the remote-snapshotter on CC, now that we have this somewhat confirmed?

Good idea, let me add an entry for that to the peer pods issue.

Done :)

mkulke commented 11 months ago

Marked AKS as passing. I also manually tested image decryption with snapshotter pulling; it works as intended.

stevenhorsman commented 11 months ago

I've created https://github.com/confidential-containers/cloud-api-adaptor/pull/1570 to bump the versions to the 0.8 release

stevenhorsman commented 11 months ago

After the release I re-tested everything from scratch locally. For IBM Cloud on s390x every test passed. For libvirt on kcli on x86 they all passed except the nydus pull test:

=== RUN   TestLibvirtCreateSimplePodWithNydusAnnotation/SimplePeerPod_test/PodVM_is_created
    assessment_helpers_test.go:162: Called PullImage explicitly, not using nydus-snapshotter :2023/11/14 15:08:25 [adaptor/proxy] CreateContainer: calling PullImage for "docker.io/library/alpine:latest" before CreateContainer (cid: "697963694d8aed2171cd3ea83759e0893c9f752b63c0e4047ffba26fc0c8e97d")
    assessment_runner_test.go:370: Expected to pull with nydus, but that didn't happen
time="2023-11-14T07:08:34-08:00" level=info msg="Deleting pod alpine..."
time="2023-11-14T07:08:44-08:00" level=info msg="Pod alpine has been successfully deleted within 60s"
--- FAIL: TestLibvirtCreateSimplePodWithNydusAnnotation (117.37s)
    --- FAIL: TestLibvirtCreateSimplePodWithNydusAnnotation/SimplePeerPod_test (117.36s)
        --- FAIL: TestLibvirtCreateSimplePodWithNydusAnnotation/SimplePeerPod_test/PodVM_is_created (1.57s)

I've manually checked and we aren't using the nydus pull with libvirt, but given that we've called that feature experimental, I'm not sure if that is a problem at this point?

wainersm commented 11 months ago

After the release I re-tested everything from scratch locally. For IBM Cloud on s390x every test passed. For libvirt on kcli on x86 they all passed except the nydus pull test:

=== RUN   TestLibvirtCreateSimplePodWithNydusAnnotation/SimplePeerPod_test/PodVM_is_created
    assessment_helpers_test.go:162: Called PullImage explicitly, not using nydus-snapshotter :2023/11/14 15:08:25 [adaptor/proxy] CreateContainer: calling PullImage for "docker.io/library/alpine:latest" before CreateContainer (cid: "697963694d8aed2171cd3ea83759e0893c9f752b63c0e4047ffba26fc0c8e97d")
    assessment_runner_test.go:370: Expected to pull with nydus, but that didn't happen
time="2023-11-14T07:08:34-08:00" level=info msg="Deleting pod alpine..."
time="2023-11-14T07:08:44-08:00" level=info msg="Pod alpine has been successfully deleted within 60s"
--- FAIL: TestLibvirtCreateSimplePodWithNydusAnnotation (117.37s)
    --- FAIL: TestLibvirtCreateSimplePodWithNydusAnnotation/SimplePeerPod_test (117.36s)
        --- FAIL: TestLibvirtCreateSimplePodWithNydusAnnotation/SimplePeerPod_test/PodVM_is_created (1.57s)

I've manually checked and we aren't using the nydus pull with libvirt, but given that we've called that feature experimental, I'm not sure if that is a problem at this point?

Hi Steve, currently kcli deploys Ubuntu 20.04 nodes with containerd 1.6. As we are not setting INSTALL_OFFICIAL_CONTAINERD=true, it is not replaced with containerd 1.7, so nydus-snapshotter doesn't work. Let me send a PR to update the scripts to use Ubuntu 22.04... hopefully it will work out of the box.

stevenhorsman commented 11 months ago

Let me send a PR to update the scripts to use Ubuntu 22.04... hopefully it will work out of the box.

@wainersm - FYI - I'm testing https://github.com/stevenhorsman/cloud-api-adaptor/tree/containerd-22.04-switch locally at the moment, but if you have already created this PR then let me know and I can switch to it

wainersm commented 11 months ago

Let me send a PR to update the scripts to use Ubuntu 22.04... hopefully it will work out of the box.

@wainersm - FYI - I'm testing https://github.com/stevenhorsman/cloud-api-adaptor/tree/containerd-22.04-switch locally at the moment, but if you have already created this PR then let me know and I can switch to it

I didn't create the PR because yesterday I couldn't set up the cluster with Ubuntu 22.04 due to this bug: https://github.com/karmab/kcli/issues/615. I still don't know how to test the fix, i.e. whether kcli has nightly builds or not. Do you know? Can you create the cluster with the kcli version you have installed?

Ah, let's use your PR ;)

stevenhorsman commented 11 months ago

Can you create the cluster with the kcli version you have installed?

Hmm, so I can create the cluster, but the test still failed, so I need to look into it more. Unfortunately I didn't disable teardown, and I'm having trouble creating the kcli cluster now, so I've had to spin up a new environment.

stevenhorsman commented 11 months ago

Can you create the cluster with the kcli version you have installed?

So to start with, the Ubuntu 22.04 cluster seems correct:

ubuntu@peer-pods-worker-0:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

but it still has containerd 1.6:

ubuntu@peer-pods-worker-0:~$ containerd --version
containerd containerd.io 1.6.24 61f9fd88f79f081d64d6fa3bb1a0dc71ec870523

stevenhorsman commented 11 months ago

I'm guessing that my libvirt testing on RC1 was before the operator change that stopped containerd from always being installed, and that's why it worked then.

mkulke commented 11 months ago

I'm guessing that my libvirt testing on RC1 was before the operator change that stopped containerd from always being installed, and that's why it worked then.

Yup, containerd is 1.6 on Ubuntu 22.04. We need to set that env var.
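
For the record, a minimal sketch of the fix being discussed, assuming INSTALL_OFFICIAL_CONTAINERD is picked up by the install/provisioning scripts (where exactly it has to be set is an assumption):

export INSTALL_OFFICIAL_CONTAINERD=true   # have the scripts install the official containerd 1.7.x instead of the bundled 1.6.x
# then re-check on the worker node:
containerd --version                      # expect a 1.7.x build so nydus-snapshotter can be used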