abiosoft / colima

Container runtimes on macOS (and Linux) with minimal setup
MIT License

Istio CNI plugin not working on Colima 0.4.6 with docker runtime #448

Open cotej opened 2 years ago

cotej commented 2 years ago

Description

When using colima to run a Kubernetes cluster with Istio + its CNI plugin installed, pods which inject the Istio sidecar do not work.

I believe this may be related to https://github.com/abiosoft/colima/issues/385 or perhaps it is even the same issue. In any event, maybe this report will serve as a way to reproduce in a concise fashion.

Version

Colima Version: colima version v0.4.6 git commit: 10377f3a20c2b0f7196ad5944264b69f048a3d40

Lima Version: limactl version 0.11.3

Qemu Version: qemu-img version 7.1.0 Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

Operating System

Reproduction Steps

  1. Start colima with the docker runtime and kubernetes enabled

    => colima start --runtime docker --kubernetes --kubernetes-version "v1.22.15+k3s1" --cpu 4 --memory 16 --disk 40
    INFO[0000] starting colima
    INFO[0000] runtime: docker+k3s
    INFO[0000] preparing network ...                         context=vm
    INFO[0000] creating and starting ...                     context=vm
    INFO[0030] provisioning ...                              context=docker
    INFO[0030] starting ...                                  context=docker
    INFO[0036] provisioning ...                              context=kubernetes
    INFO[0036] downloading and installing ...                context=kubernetes
    INFO[0046] loading oci images ...                        context=kubernetes
    INFO[0054] starting ...                                  context=kubernetes
    INFO[0058] updating config ...                           context=kubernetes
    INFO[0059] Switched to context "colima".                 context=kubernetes
    INFO[0059] done
  2. Install Istio into the k8s cluster, pointing to the appropriate CNI directories used by k3s

    
    => istioctl version
    no running Istio pods in "istio-system"
    1.13.7

    => istioctl install --set 'components.cni.enabled=true' --set 'values.cni.cniBinDir=/var/lib/rancher/k3s/data/current/bin' --set 'values.cni.cniConfDir=/var/lib/rancher/k3s/agent/etc/cni/net.d'
    This will install the Istio 1.13.7 default profile with ["Istio core" "Istiod" "CNI" "Ingress gateways"] components into the cluster. Proceed? (y/N) y
    ✔ Istio core installed
    ✔ Istiod installed
    ✔ Ingress gateways installed
    ✔ CNI installed
    ✔ Installation complete
    Making this installation the default for injection and validation.

    Thank you for installing Istio 1.13. Please take a few minutes to tell us about your install/upgrade experience! https://forms.gle/pzWZpAvMVBecaQ9h9


Note that the `istio-cni-node` pod appears to run correctly at this point and its logs don't indicate any problems, however...

  3. Label the default namespace and deploy an arbitrary pod for which Istio will inject its proxy sidecar.

    => kubectl label namespace default "istio-injection=enabled"
    namespace/default labeled

    => kubectl create deployment caddy-app --image caddy
    deployment.apps/caddy-app created


Due to the use of Istio's CNI plugin, the pod has an `istio-validation` initContainer, and it's this container that encounters the problem, which appears as:

    => kubectl logs -lapp=caddy-app -c istio-validation
    2022-10-17T20:26:46.727897Z info in new validator: 172.17.0.9
    2022-10-17T20:26:46.728051Z info Listening on 127.0.0.1:15001
    2022-10-17T20:26:46.728439Z info Listening on 127.0.0.1:15006
    2022-10-17T20:26:46.729492Z error Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
    2022-10-17T20:26:47.730360Z error Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
    2022-10-17T20:26:48.732943Z error Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
    2022-10-17T20:26:49.736389Z error Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
    2022-10-17T20:26:50.740989Z error Error connecting to 127.0.0.6:15002: dial tcp 127.0.0.1:0->127.0.0.6:15002: connect: connection refused
    2022-10-17T20:26:51.728686Z error validation timeout



### Expected behaviour

The `istio-validation` initContainer should succeed and then allow the other containers in the pod to start.

### Additional context

I have also tried this same scenario using other (newer) k8s versions as well as the containerd runtime instead, but I encountered this issue in all cases.

The issue I linked above suggested that this was working on colima 0.3.x - I haven't tried that yet so I'll give it a shot and report back with my findings.

cotej commented 2 years ago

Just tested downgrading to colima 0.3.4 and confirmed that everything works using the same steps above and the istio-validation container succeeds.

abiosoft commented 2 years ago

Figured out the issue: because the containerd being used is the default one (i.e. not the one bundled with k3s), the paths need to change accordingly.

This works for me.

istioctl install --set 'components.cni.enabled=true' --set 'values.cni.cniBinDir=/usr/libexec/cni/' --set 'values.cni.cniConfDir=/etc/cni/net.d'
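
The two sets of paths tried so far in this thread differ only in where the CNI binaries and config live. A minimal sketch that prints the matching `istioctl install` flags for each layout follows. The function name `istio_cni_flags` and the layout labels `k3s-bundled` and `system` are hypothetical, made up for illustration; only the paths themselves come from the commands quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the istioctl --set flags for a given CNI layout.
# The layout names are illustrative labels; the paths are the ones quoted in this thread.
istio_cni_flags() {
  case "$1" in
    k3s-bundled)
      bin_dir=/var/lib/rancher/k3s/data/current/bin
      conf_dir=/var/lib/rancher/k3s/agent/etc/cni/net.d
      ;;
    system)
      bin_dir=/usr/libexec/cni/
      conf_dir=/etc/cni/net.d
      ;;
    *)
      echo "unknown layout: $1" >&2
      return 1
      ;;
  esac
  # One flag per line so the output is easy to read or to word-split.
  printf '%s\n' \
    "--set components.cni.enabled=true" \
    "--set values.cni.cniBinDir=$bin_dir" \
    "--set values.cni.cniConfDir=$conf_dir"
}

# Print the flags for the non-bundled layout.
istio_cni_flags system
```

Because none of the paths contain spaces, the output could be passed on unquoted, as in `istioctl install $(istio_cni_flags system)`.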

cotej commented 2 years ago

Thanks @abiosoft, I appreciate the quick response.

When I try with your suggested paths, I find that the istio-cni-node pod never becomes ready.

=> colima start --runtime docker --kubernetes --kubernetes-version "v1.22.15+k3s1" --cpu 4 --memory 16 --disk 40
INFO[0000] starting colima
INFO[0000] runtime: docker+k3s
INFO[0000] preparing network ...                         context=vm
INFO[0000] creating and starting ...                     context=vm
INFO[0031] provisioning ...                              context=docker
INFO[0031] starting ...                                  context=docker
INFO[0036] provisioning ...                              context=kubernetes
INFO[0036] downloading and installing ...                context=kubernetes
INFO[0046] loading oci images ...                        context=kubernetes
INFO[0054] starting ...                                  context=kubernetes
INFO[0059] updating config ...                           context=kubernetes
INFO[0059] Switched to context "colima".                 context=kubernetes
INFO[0060] done

On a fresh VM I can see that the CNI bin path you've used does indeed exist; however, the CNI conf dir does not.

=> colima ssh -- sudo ls -l /usr/libexec/cni/
total 45692
-rwxr-xr-x    1 root     root       2564600 Sep  8 08:24 bandwidth
-rwxr-xr-x    1 root     root       2869400 Sep  8 08:24 bridge
-rwxr-xr-x    1 root     root       7075320 Sep  8 08:24 dhcp
-rwxr-xr-x    1 root     root       2959160 Sep  8 08:24 firewall
lrwxrwxrwx    1 root     root            13 Oct 18 15:02 flannel -> flannel-amd64
-rwxr-xr-x    1 root     root       2114552 Sep  8 08:24 flannel-amd64
(...truncated output...)

=> colima ssh -- sudo ls -l /etc/cni/net.d/
ls: /etc/cni/net.d/: No such file or directory
FATA[0000] exit status 1

The lack of CNI conf is what causes a problem for Istio, since it wants to detect an existing CNI conf that it can use to add its plugin into the chain.
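
A pre-flight check along these lines could catch the missing config before running the installer. This is only a sketch: `has_cni_conf` is a hypothetical helper, and it assumes that a usable CNI config is any `.conf`, `.conflist`, or `.json` file in the directory (the extensions container runtimes commonly look for). Istio's own detection may be stricter.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR holds at least one CNI config file.
# Assumption: a config is any *.conf, *.conflist or *.json file.
has_cni_conf() {
  dir=$1
  [ -d "$dir" ] || return 1
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    # An unmatched glob stays literal, so test that the file really exists.
    [ -f "$f" ] && return 0
  done
  return 1
}

# Example: check a directory on the machine where this script runs.
if has_cni_conf /etc/cni/net.d; then
  echo "CNI config found"
else
  echo "no CNI config found"
fi
```

To check the Colima VM rather than the host, the same test would have to run inside it, for example in a shell opened with `colima ssh`.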

=> istioctl install \
  --set 'components.cni.enabled=true' \
  --set 'values.cni.cniBinDir=/usr/libexec/cni/' \
  --set 'values.cni.cniConfDir=/etc/cni/net.d'
This will install the Istio 1.13.7 default profile with ["Istio core" "Istiod" "CNI" "Ingress gateways"] components into the cluster. Proceed? (y/N) y
✔ Istio core installed
✔ Istiod installed
✔ Ingress gateways installed
✘ CNI encountered an error: failed to wait for resource: resources not ready after 5m0s: timed out waiting for the condition

- Pruning removed resources
Error: failed to install manifests: errors occurred during operation

=> kubectl get pod -n istio-system
NAME                                    READY   STATUS    RESTARTS   AGE
istiod-56ff668594-gf2xx                 1/1     Running   0          10m
svclb-istio-ingressgateway-7v957        3/3     Running   0          10m
istio-cni-node-v6rvp                    0/1     Running   0          10m
istio-ingressgateway-6859849654-fpcc7   1/1     Running   0          10m

=> kubectl logs -lk8s-app=istio-cni-node -n istio-system
  "log_level": "info",
  "log_uds_address": "/var/run/istio-cni/log.sock",
  "kubernetes": {
      "kubeconfig": "/etc/cni/net.d/ZZZ-istio-cni-kubeconfig",
      "cni_bin_dir": "/usr/libexec/cni/",
      "exclude_namespaces": [ "istio-system", "kube-system" ]
  }
}
2022-10-18T15:05:58.227368Z warn    install Istio CNI is configured as chained plugin, but cannot find existing CNI network config: no networks found in /host/etc/cni/net.d
2022-10-18T15:05:58.227458Z info    install Waiting for CNI network config file to be written in /host/etc/cni/net.d...

abiosoft commented 2 years ago

@cotej can you try with the containerd runtime?

Yeah, the directory is not getting created for the Docker runtime.

cotej commented 2 years ago

Yeah, it seems to work correctly with containerd runtime. :+1: I'll try to explore if this is a valid workaround for my project.

Is it reasonable to keep this issue open to address the docker runtime?

cotej commented 2 years ago

Out of curiosity, is the path for CNI bin (/usr/libexec/cni) something that's controlled by colima? Just thinking that it may work a little nicer out of the box if it aligned with Istio's default (/opt/cni/bin). Not sure how reasonable that is, but just a thought.

abiosoft commented 2 years ago

> Yeah, it seems to work correctly with containerd runtime. 👍 I'll try to explore if this is a valid workaround for my project.

The directory can actually be created for the Docker runtime as well; it was assumed to be irrelevant for Docker and was therefore skipped. I will try it and, if that fixes the problem, push a fix.

> Is it reasonable to keep this issue open to address the docker runtime?

Yes

> Out of curiosity, is the path for CNI bin (/usr/libexec/cni) something that's controlled by colima? Just thinking that it may work a little nicer out of the box if it aligned with Istio's default (/opt/cni/bin). Not sure how reasonable that is, but just a thought.

Yeah, partly. It would need to be changed in the underlying ISO image. However, I don't think that's needed; a symlink should suffice.
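
The symlink idea could look roughly like the following. This is an untested sketch, not something confirmed in this thread: `link_cni_bin` is a hypothetical helper, it would have to run as root inside the VM, and whether the link survives a VM restart is not known.

```shell
#!/bin/sh
# Hypothetical helper: make LINK point at TARGET unless LINK already exists.
link_cni_bin() {
  target=$1
  link=$2
  # Leave any existing file, directory, or symlink untouched.
  if [ -e "$link" ] || [ -L "$link" ]; then
    echo "$link already exists, leaving it alone" >&2
    return 0
  fi
  # Create the parent directory first, then the link itself.
  mkdir -p "$(dirname "$link")" && ln -s "$target" "$link"
}
```

Inside the VM this would be called as `link_cni_bin /usr/libexec/cni /opt/cni/bin`, so that Istio's default CNI bin directory resolves to the one that already exists there.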