confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

Attempt to update kata-runtime to point to the `main` version #1596

Closed stevenhorsman closed 4 months ago

stevenhorsman commented 10 months ago

As part of the merge to main effort, we have https://github.com/kata-containers/kata-containers/pull/7046, which is adding the remote hypervisor feature to the kata runtime. Once this is merged, we should test whether the CAA can re-vendor on it and see what issues there are. We also know that, as part of these changes, we want to remove the gogoprotobuf workaround https://github.com/confidential-containers/cloud-api-adaptor/blob/eb1b368f84825bb83b9033f07228e04cdef3ceb1/go.mod#L162-L164 and align with kata, where we had to do the same thing.
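
For reference, the shape of that re-vendor would roughly be the following (a sketch only: the branch/ref and the exact replace directive being dropped are assumptions until the kata PR lands):

    # Rough sketch, not the final change: bump the vendored runtime, drop the assumed
    # gogo/protobuf replace workaround, then re-vendor.
    go get github.com/kata-containers/kata-containers/src/runtime@main
    go mod edit -dropreplace=github.com/gogo/protobuf
    go mod tidy
    go mod vendor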

stevenhorsman commented 10 months ago

After a lot of compilation issues, I've got it all compiling now and have built a CAA OCI image from it, but when testing it doesn't work:

Events:
  Type     Reason                  Age                 From               Message
  ----     ------                  ----                ----               -------
  Normal   Scheduled               35m                 default-scheduler  Successfully assigned default/alpine to sh-libvirt-s390x-e2e-22-04-test-4
  Warning  FailedCreatePodSandBox  7s (x160 over 35m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: remote hypervisor call failed: ttrpc: closed: unknown

so I've clearly broken something during the change
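
The `ttrpc: closed` is coming back from the shim's call over the remote hypervisor socket, so the next place I'm looking is the CAA side of that connection (the daemonset name, namespace, and socket path below assume a standard peer-pods install):

    # Assumed daemonset name/namespace and socket path from a default peer-pods install.
    kubectl logs -n confidential-containers-system daemonset/cloud-api-adaptor-daemonset --tail=50
    # On the worker: the socket the shim dials for remote hypervisor calls.
    ls -l /run/peerpod/hypervisor.sock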

stevenhorsman commented 10 months ago

Beraldo has recommended re-generating the hypervisor protos with ttrpc rather than grpc, so I've created a branch that has that change in kata-runtime: https://github.com/kata-containers/kata-containers/compare/main...stevenhorsman:hypervisor-ttrpc?expand=1 I'm now reworking my changes to undo those related to the ttrpc -> grpc switch.
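
For anyone wanting to reproduce the proto regeneration, a minimal sketch with the upstream containerd ttrpc plugin would look something like this (kata's own proto tooling and file layout may well differ, so treat the paths and flags as assumptions):

    # Sketch only: regenerate ttrpc (not grpc) service stubs with the upstream
    # containerd plugin; kata's actual proto generation targets may differ.
    go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
    go install github.com/containerd/ttrpc/cmd/protoc-gen-go-ttrpc@latest
    protoc --go_out=. --go-ttrpc_out=. src/runtime/protocols/hypervisor/hypervisor.proto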

stevenhorsman commented 10 months ago

I've updated my branch to use my fork of the kata runtime with the ttrpc changes, and I think I get further now, as when I try to start a peer pod the error is:

Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Normal   Scheduled               7m20s                  default-scheduler  Successfully assigned default/nginx-secret-pod to peer-pods-worker-0
  Warning  FailedCreatePodSandBox  114s (x26 over 7m20s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd container: create prepare snapshot dir: failed to create temp dir: stat /var/lib/containerd-nydus/snapshots: no such file or directory: unknown
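
For context, the `/var/lib/containerd-nydus/snapshots` path in that error belongs to nydus-snapshotter. Registering it as a containerd proxy plugin, which the operator's pre-install daemonset normally handles, looks roughly like this (the socket path is the upstream default and is an assumption here):

    # Sketch only: register nydus-snapshotter as a containerd proxy plugin.
    cat <<'EOF' | sudo tee -a /etc/containerd/config.toml
    [proxy_plugins]
      [proxy_plugins.nydus]
        type = "snapshot"
        address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
    EOF
    sudo systemctl restart containerd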

stevenhorsman commented 10 months ago

We've been tracking the status on this in a slack thread. A rough summary is:

It also shows that nydus options aren't being set. This is for at least two reasons:

I think we are now blocked until these steps are done before we can go much further, but hopefully my kata runtime PR https://github.com/kata-containers/kata-containers/pull/8520/commits can be merged in the meantime.

stevenhorsman commented 10 months ago

We've managed to get some of the steps required for this upstreamed now:

Remaining issues that will need resolving before we can be unblocked here (and there might be more after):

stevenhorsman commented 6 months ago

Just an update on this - now that the agent supports image pull on the guest, I've done a bunch of PoC work on this to push us further in the right direction. Where we are now is that nydus_snapshotter isn't putting the correct annotation into the storage driver for us to pull on the guest. Fabiano is also seeing this with the local hypervisor, so hopefully we can work it out between us...

stevenhorsman commented 5 months ago

The problem we were hitting is noted in Issue 4 here: https://github.com/kata-containers/kata-containers/issues/8407#issuecomment-2049144827

I ran the following script, kindly provided by Fabiano, on the worker:

    test_images_to_remove=(
        "docker.io/rancher/mirrored-pause"
        "registry.k8s.io/pause"
        "quay.io/sjenning/nginx"
        "quay.io/prometheus/busybox"
        "quay.io/confidential-containers/test-images"
    )

    ctr_args=""
    if [ "${KUBERNETES}" = "k3s" ]; then
        ctr_args="--address  /run/k3s/containerd/containerd.sock "
    fi
    ctr_args+="--namespace k8s.io"
    ctr_command="sudo -E ctr ${ctr_args}"
    for related_image in "${test_images_to_remove[@]}"; do
        # We need to delete related image
        image_list=($(${ctr_command} i ls -q |grep "$related_image" |awk '{print $1}'))
        if [ "${#image_list[@]}" -gt 0 ]; then
            for image in "${image_list[@]}"; do
                ${ctr_command} i remove "$image"
            done
        fi
        # We need to delete related content of image
        IFS="/" read -ra parts <<< "$related_image"
        repository="${parts[0]}"
        image_name="${parts[1]}"
        formatted_image="${parts[0]}=${parts[-1]}"
        image_contents=($(${ctr_command} content ls | grep "${formatted_image}" | awk '{print $1}'))
        if [ "${#image_contents[@]}" -gt 0 ]; then
            for content in "${image_contents[@]}"; do
                ${ctr_command} content rm "$content"
            done
        fi
    done

and after that the image pull on the guest worked and the container is up and running:

2024/04/11 09:50:38 [adaptor/proxy]     storages:
2024/04/11 09:50:38 [adaptor/proxy]         mount_point:/run/kata-containers/7ce95eeeb93faef7640d7640c0d46ddd5128b4c81904478fc80ab945fedc4b56/rootfs source:docker.io/library/nginx:latest fstype:overlay driver:image_guest_pull
# kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          76s

so we just need a way to do this more easily on the worker node...
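
In the meantime, a quick sanity check after running the script, before retrying a pod (the image names match the list above; the manifest name is just an example):

    # Should print nothing once the cached copies of the test images are gone.
    sudo ctr --namespace k8s.io images ls -q | grep -E 'pause|nginx|busybox|test-images'
    # Then recreate the peer pod and watch it come up via guest pull.
    kubectl delete pod nginx --ignore-not-found
    kubectl apply -f nginx-pod.yaml   # hypothetical manifest name
    kubectl get pods -w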

stevenhorsman commented 5 months ago

Ok, I'm going to try to describe how to reliably and reproducibly set up a dev environment for testing this. I'm using libvirt with a kcli cluster and a pretty chunky 16 vCPU / 32G RAM VM, but I'm not sure it strictly needs to be that large. I will also do some steps, like pushing the podvm image, that are just so people following along can use my images and save some time:
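
The cluster itself comes from kcli; roughly something like this (the parameter names and sizes here are assumptions rather than the exact command I ran):

    # Sketch only: create a small kcli cluster for peer pods testing; adjust sizes for your host.
    kcli create kube generic -P ctlplanes=1 -P workers=1 -P numcpus=4 -P memory=8192 peer-pods
    export KUBECONFIG=$HOME/.kcli/clusters/peer-pods/auth/kubeconfig
    kubectl get nodes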

    test_images_to_remove=(
        "docker.io/rancher/mirrored-pause"
        "registry.k8s.io/pause"
        "quay.io/sjenning/nginx"
        "quay.io/prometheus/busybox"
        "quay.io/confidential-containers/test-images"
    )

    ctr_args=""
    if [ "${KUBERNETES}" = "k3s" ]; then
            ctr_args="--address      /run/k3s/containerd/containerd.sock "
    fi
    ctr_args+="--namespace k8s.io"
    ctr_command="sudo -E ctr ${ctr_args}"
    for related_image in "${test_images_to_remove[@]}"; do
            # We need to delete related image
            image_list=($(${ctr_command} i ls -q |grep "$related_image" |awk '{print $1}'))
            if [ "${#image_list[@]}" -gt 0 ]; then
                    for image in "${image_list[@]}"; do
                            ${ctr_command} i remove "$image"
                    done
            fi
            # We need to delete related content of image
            IFS="/" read -ra parts <<< "$related_image";
            repository="${parts[0]}";
            image_name="${parts[1]}";
            formatted_image="${parts[0]}=${parts[-1]}"
            image_contents=($(${ctr_command} content ls | grep "${formatted_image}" | awk '{print $1}'))
            if [ "${#image_contents[@]}" -gt 0 ]; then
                    for content in "${image_contents[@]}"; do
                            ${ctr_command} content rm "$content"
                    done
            fi
    done
- Test it by creating a peer pod:

    echo '
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: nginx
      name: nginx
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            io.containerd.cri.runtime-handler: kata-remote
        spec:
          runtimeClassName: kata-remote
          containers:
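
Once that's applied, a couple of checks that the pod really went via the remote runtime class:

    kubectl get runtimeclass kata-remote
    kubectl get pods -l app=nginx -o wide
    kubectl get pod -l app=nginx -o jsonpath='{.items[0].spec.runtimeClassName}{"\n"}'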

stevenhorsman commented 5 months ago

After chatting to Pradipta, we've realised that we need to change and remove all of the CAA install/kustomize's installation of the caa-pod, now that the peerpodconfig-ctrl is deploying it. There are a lot of references to it, so I'm trying to go through, unpick them, and provide alternatives...

stevenhorsman commented 5 months ago

After a bunch of updates to resolve the double CAA daemonset, the latest instructions are a bit simpler:

I'm going to try to describe how to reliably and reproducibly set up a dev environment for testing this. I'm using libvirt with a kcli cluster and a pretty chunky 16 vCPU / 32G RAM VM, but I'm not sure it strictly needs to be that large. I will also do some steps, like pushing the podvm image, that are just so people following along can use my images and save some time:

    test_images_to_remove=(
        "docker.io/rancher/mirrored-pause"
        "registry.k8s.io/pause"
        "quay.io/sjenning/nginx"
        "quay.io/prometheus/busybox"
        "quay.io/confidential-containers/test-images"
    )

    ctr_args=""
    if [ "${KUBERNETES}" = "k3s" ]; then
        ctr_args="--address /run/k3s/containerd/containerd.sock "
    fi
    ctr_args+="--namespace k8s.io"
    ctr_command="sudo -E ctr ${ctr_args}"
    for related_image in "${test_images_to_remove[@]}"; do
        # We need to delete the related images
        image_list=($(${ctr_command} i ls -q | grep "$related_image" | awk '{print $1}'))
        if [ "${#image_list[@]}" -gt 0 ]; then
            for image in "${image_list[@]}"; do
                ${ctr_command} i remove "$image"
            done
        fi
        # We need to delete the related content of the image
        IFS="/" read -ra parts <<< "$related_image"
        repository="${parts[0]}"
        image_name="${parts[1]}"
        formatted_image="${parts[0]}=${parts[-1]}"
        image_contents=($(${ctr_command} content ls | grep "${formatted_image}" | awk '{print $1}'))
        if [ "${#image_contents[@]}" -gt 0 ]; then
            for content in "${image_contents[@]}"; do
                ${ctr_command} content rm "$content"
            done
        fi
    done

exit

- Test it by creating a peer pod:

    echo '
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      annotations:
        io.containerd.cri.runtime-handler: kata-remote
      labels:
        app: nginx
    spec:
      runtimeClassName: kata-remote
      containers:

beraldoleal commented 5 months ago

Hi @stevenhorsman I can confirm this is working with a containerd cluster. I was able to reproduce it.

Unfortunately not true with cri-o. :(

$ kubectl logs pod/cc-operator-pre-install-daemon-nmmhs -n confidential-containers-system
INSTALL_COCO_CONTAINERD: false
INSTALL_OFFICIAL_CONTAINERD: false
INSTALL_VFIO_GPU_CONTAINERD: false
INSTALL_NYDUS_SNAPSHOTTER: true
ERROR: cri-o is not yet supported 

It looks like the operator does not fully support cri-o yet.

stevenhorsman commented 5 months ago

A small update here - I've created new images with the 3.4 release of kata-containers:

- quay.io/stevenhorsman/cloud-api-adaptor:dev-040e9bef8fdb4e2ed94cf68ae04e89076f7b3249
- quay.io/stevenhorsman/podvm-generic-ubuntu-amd64:9def86b85afa60d06146608c1f27c74a0568534540f0cc2d60c08cb046cfa0db
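
To try them out, the CAA overlay can be pointed at the new image before deploying; roughly (the overlay path and image name key are from memory of the repo layout, so double-check them against your checkout):

    # Assumed overlay path and image name key; adjust to the layout in your checkout.
    cd install/overlays/libvirt
    kustomize edit set image cloud-api-adaptor=quay.io/stevenhorsman/cloud-api-adaptor:dev-040e9bef8fdb4e2ed94cf68ae04e89076f7b3249
    kubectl apply -k .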

stevenhorsman commented 5 months ago

This is going okay and I have some e2e passes locally, but I can't get them on the e2e CI, as pull_request_target only uses the workflow from the main branch, so we first have to get the workflow code that should work merged into main. I've spun up https://github.com/confidential-containers/cloud-api-adaptor/pull/1828 to do this.

I also can't fully test it on my fork, as it runs on an Azure runner that I can't access.

stevenhorsman commented 4 months ago

The workflow PR https://github.com/confidential-containers/cloud-api-adaptor/pull/1828 has been merged, so hopefully if I rebase the kata-runtime-bump branch on that we might be able to get some e2e tests running in the PR.