confidential-containers / operator

Operator to deploy confidential containers runtime
Apache License 2.0

install custom resource on k3s fails #258

Open kempy007 opened 1 year ago

kempy007 commented 1 year ago

Describe the bug A clear and concise description of what the bug is.

To Reproduce Steps to reproduce the behavior:

  1. Follow this guide.
  2. The 'Create Custom Resource (CR)' step fails; after applying the CRD and retrying, the step succeeds (see errors below).
  3. The cc-operator-pre-install-daemon pod enters a crash loop.
  4. See the errors below.

    kubectl apply -k github.com/confidential-containers/operator/config/samples/ccruntime/default

    Error from server (BadRequest): error when creating "github.com/confidential-containers/operator/config/samples/ccruntime/default": CcRuntime in version "v1beta1" cannot be handled as a CcRuntime: strict decoding error: unknown field "spec.config.debug", unknown field "spec.config.defaultRuntimeClassName", unknown field "spec.config.runtimeClasses"

    kubectl apply -k github.com/confidential-containers/operator/config/crd

    customresourcedefinition.apiextensions.k8s.io/ccruntimes.confidentialcontainers.org configured

    kubectl apply -k github.com/confidential-containers/operator/config/samples/ccruntime/default

    ccruntime.confidentialcontainers.org/ccruntime-sample created

    POD LOG: cc-operator-pre-install-daemon-xxxxx

    Copying containerd-for-cc artifacts onto host
    nsenter: can't execute 'systemctl': No such file or directory
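
The strict decoding error above names each field that the installed CRD schema rejects, and the retry succeeded once the CRD had been re-applied. As a minimal sketch (the helper name `unknown_fields` is hypothetical, and `grep -o` is assumed to be available), the rejected field names can be pulled out of such a message to compare them against the CRD:

```shell
# Hypothetical helper: read an API server error message on stdin and print
# one rejected field path per line.
unknown_fields() {
    # grep -o prints each match of: unknown field "<name>"
    # sed then strips the wrapper and keeps only <name>.
    grep -o 'unknown field "[^"]*"' | sed 's/^unknown field "\(.*\)"$/\1/'
}
```

For example, saving the error text to a file and running `unknown_fields < error.txt` lists the field paths, one per line.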

Describe the results you expected A clear and concise description of what you expected to happen.

Describe the results you received: A clear and concise description of what happened.

Additional context Add any other context about the problem here.

fidencio commented 1 year ago

We do not support anything other than "vanilla" Kubernetes.

We may want to support k3s, rke2, and k0s in the near future, but right now this is not being considered, as there are other fights to pick.

If you want to contribute to this feature, please let me know, as I've done a lot in that area in Kata Containers, and I can point you in the right direction here.

kempy007 commented 1 year ago

I'm constrained to bare-metal Ubuntu!

What would you suggest for quickly setting up a "vanilla" Kubernetes dev cluster that is compatible with your project?

I would like to avoid kubespray; I was thinking of using microk8s, minikube, or just kubeadm. I was considering rke2, but I see here that it is based on k3s.

Can you clarify whether the issue with these lightweight Kubernetes distributions is that they use a statically embedded containerd binary that cannot be changed once running, which breaks your setup routine for the runtime hooks?

fidencio commented 1 year ago

https://github.com/confidential-containers/operator/blob/main/tests/e2e/ansible/install_kubeadm.yml, kubeadm is what we're using as part of our tests.

So, for everything we've done till v0.7.0, we're relying on a forked version of containerd, and replacing that would be a pain for no clear benefit with k3s / rke2 / k0s, as the code would just be thrown away at some point.

For v0.8.0, we're dropping the containerd dependency, but we're relying on an external snapshotter (nydus), and I still haven't worked out the best way to integrate it with the pre-install payload.

My personal plan is to address this as part of v0.10.0, as v0.9.0 will be focused on getting rid of the "CCv0" branch used from Kata Containers.

Once this snapshotter dependency gets solved, then we can easily start relying (and also testing) against rke2, k3s, and k0s (as we already support those as part of Kata Containers).

@kempy007, does the answer make sense? I know the pain, and I'm sorry we didn't get to improving things on that front yet.

kempy007 commented 1 year ago

Thank you, I understand, and hopefully this will help anyone else who runs into this.

I will add your suggestions to https://github.com/Cypherpunk-Labs/akash-confidential-containers-operator under a demo folder. Thanks.

One last thing, and then this can be closed: can you guess in which quarter v0.10.0 will land?

fidencio commented 12 months ago

can you guess which quarter v0.10.0 will land?

I'd expect 24Q1.

zvonkok commented 12 months ago

@kempy007 To run k3s with the current version of the CoCo operator, you need to tell k3s to use the external containerd runtime:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--container-runtime-endpoint /run/containerd/containerd.sock" sh -s -
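
The command above points k3s at a containerd that runs outside k3s, so that containerd has to be listening on the socket before the install. A minimal pre-flight sketch, assuming the socket path from the command above (the helper names `containerd_socket_ready` and `k3s_install_exec` are hypothetical, not part of k3s or the operator):

```shell
# Hypothetical helper: succeed only if the given path is a Unix domain socket.
# Defaults to the socket path used in the install command above.
containerd_socket_ready() {
    socket="${1:-/run/containerd/containerd.sock}"
    # -S is true only when the path exists and is a socket.
    [ -S "$socket" ]
}

# Hypothetical helper: build the INSTALL_K3S_EXEC value for a given socket.
k3s_install_exec() {
    socket="${1:-/run/containerd/containerd.sock}"
    printf '%s %s' '--container-runtime-endpoint' "$socket"
}
```

It could be used as `containerd_socket_ready && curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="$(k3s_install_exec)" sh -s -`, so the install is skipped when no external containerd socket is present.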

I am running CoCo and the GPU operator successfully with k3s, and I haven't seen the issue with the CR:

❯ k get pod -A                               
NAMESPACE                        NAME                                                              READY   STATUS      RESTARTS      AGE
kube-system                      helm-install-traefik-crd-9tdkp                                    0/1     Completed   0             6d6h
kube-system                      helm-install-traefik-zlpvn                                        0/1     Completed   1             6d6h
confidential-containers-system   cc-operator-daemon-install-884qt                                  1/1     Running     2 (31h ago)   6d5h
kube-system                      svclb-traefik-90ed600b-lmgz9                                      2/2     Running     4 (31h ago)   6d6h
kube-system                      coredns-59b4f5bbd5-jqt8b                                          1/1     Running     2 (31h ago)   6d6h
gpu-operator                     gpu-operator-5bcd65556-b6pv7                                      1/1     Running     2 (31h ago)   6d5h
kube-system                      traefik-64f55bb67d-t8rp5                                          1/1     Running     2 (31h ago)   6d6h
confidential-containers-system   cc-operator-pre-install-daemon-8d2db                              1/1     Running     4 (31h ago)   6d5h
kube-system                      metrics-server-648b5df564-68788                                   1/1     Running     4 (31h ago)   6d6h
gpu-operator                     gpu-operator-1694777422-node-feature-discovery-master-54d42ntr9   1/1     Running     4 (31h ago)   6d5h
confidential-containers-system   cc-operator-controller-manager-8647c9b577-vktbl                   2/2     Running     6 (31h ago)   6d6h
kube-system                      local-path-provisioner-76d776f6f9-xlz4j                           1/1     Running     5 (31h ago)   6d6h
gpu-operator                     gpu-operator-1694777422-node-feature-discovery-worker-lwj5b       1/1     Running     5 (31h ago)   6d5h
gpu-operator                     nvidia-vfio-manager-gqlcf                                         1/1     Running     2 (31h ago)   6d5h
gpu-operator                     nvidia-sandbox-validator-gmllr                                    1/1     Running     0             31h
gpu-operator                     nvidia-sandbox-device-plugin-daemonset-qnp4z                      1/1     Running     0             31h
gpu-operator                     nvidia-kata-manager-n2pt8                                         1/1     Running     2 (31h ago)   6d5h

Villain88 commented 9 months ago

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--container-runtime-endpoint /run/containerd/containerd.sock" sh -s -

Hello. I am experiencing similar problems when trying to deploy CoCo.

Initial data:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--container-runtime-endpoint /run/containerd/containerd.sock" sh -s -

The Kubernetes pods hang in Pending status.

If I use the built-in containerd, I have another problem. After executing the following commands:

kubectl label nodes hostname node-role.kubernetes.io/worker=
kubectl apply -k github.com/confidential-containers/operator/config/release?ref=v0.8.0
kubectl get pods -n confidential-containers-system --watch
kubectl apply -k github.com/confidential-containers/operator/config/samples/ccruntime/default?ref=v0.8.0

Only the cc-operator-controller-manager container is created. The situation is similar with rke2 and kind.
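
A small sketch for reporting which operator pods are missing, assuming the three pod name prefixes that appear in the working pod listing earlier in this thread (the helper name `missing_cc_pods` is hypothetical):

```shell
# Hypothetical helper: read pod names on stdin (one per line) and print each
# expected operator pod prefix that does not appear in the input.
missing_cc_pods() {
    pods="$(cat)"
    for expected in \
        cc-operator-controller-manager \
        cc-operator-daemon-install \
        cc-operator-pre-install-daemon
    do
        # -F matches a fixed string, -q suppresses output; matching anywhere
        # in the line also accepts names of the form "pod/<prefix>-<hash>".
        if ! printf '%s\n' "$pods" | grep -qF -- "$expected"; then
            printf '%s\n' "$expected"
        fi
    done
}
```

It could be fed with `kubectl get pods -n confidential-containers-system -o name | missing_cc_pods`; empty output means all three prefixes were found.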

What am I doing wrong and where to look for the problem?

I also tried CentOS 7, but the problem is the same: only 1 confidential-containers-system container is created instead of 3.

Deploying k8s can be tricky for tests.

Please describe the limitations in more detail, e.g. which operating systems are supported.