abiosoft / colima

Container runtimes on macOS (and Linux) with minimal setup
MIT License
18.33k stars 372 forks

Kubernetes CNI chained plugin configuration ignored on v0.4.x #385

Open Cerebus opened 2 years ago

Cerebus commented 2 years ago

A chained CNI plugin seems to be ignored on v0.4.x. The CNI config gets set properly in /var/lib/rancher/k3s/agent/etc/cni/net.d/ and the plugin executable gets installed properly in /var/lib/rancher/k3s/data/current/bin/, but the plugin is never called. No CNI-related logs are emitted, and Pods come up with only the default flannel configuration.

The CNI in question works like multus; it delegates to the default plugin (flannel) first and does other things later in the chain without interfering with it.
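For context, that "delegate first, act later" behavior is just the CNI spec's plugin-chain semantics: on ADD, the runtime invokes each plugin in the conflist in order, feeding the previous plugin's result forward as prevResult. A minimal sketch of that loop (illustrative only; the plugin names and config here are assumptions, not the actual k3s-generated config):

```python
import json

# A toy conflist: flannel first, a chained plugin second (names are examples).
conflist = json.loads("""
{
  "cniVersion": "0.3.1",
  "name": "cbr0",
  "plugins": [
    {"type": "flannel", "delegate": {"isDefaultGateway": true}},
    {"type": "meshnet"}
  ]
}
""")

def add(conflist):
    """Simulate the runtime's ADD over a plugin chain; returns invocation order."""
    prev_result = None
    calls = []
    for plugin in conflist["plugins"]:
        # The real runtime execs the binary named by "type" with CNI_COMMAND=ADD,
        # injecting the previous plugin's output as "prevResult" in its stdin config.
        stdin_conf = {**plugin,
                      "name": conflist["name"],
                      "cniVersion": conflist["cniVersion"]}
        if prev_result is not None:
            stdin_conf["prevResult"] = prev_result
        calls.append(stdin_conf["type"])
        prev_result = {"handled_by": stdin_conf["type"]}  # stand-in for real output
    return calls

print(add(conflist))  # flannel runs first, then the chained plugin
```

The bug report boils down to: on v0.4.x with the docker driver, only the first (flannel) step of this loop ever happens.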

This works in v0.3.x. Using docker driver in both cases.

I'm going to guess that this has something to do with the addition of embedded networking in v0.4.0. I'm open to workarounds; colima is a smoother experience in my environment than minikube, but I do a lot of CNI-related work, so it would be nice to have it working again.

abiosoft commented 2 years ago

Yeah. I believe this is mainly due to the use of https://github.com/Mirantis/cri-dockerd to cater for the deprecation of docker support in k3s.

Do you mind providing steps to simulate your scenario? I will give it a go and see if there is a possible fix.

Cerebus commented 2 years ago

Unfortunately I don't have a public repo I can point you at, or anything remotely ready-to-run.

The bandwidth plugin should trigger this behavior, I think, but will require more manual setup.
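For reference, wiring the bandwidth plugin into a chain means appending it after the default plugin in the conflist. A sketch of what that might look like (the conflist name and flannel delegate settings here are assumptions; k3s generates its own default config):

```json
{
  "cniVersion": "0.3.1",
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": { "isDefaultGateway": true }
    },
    {
      "type": "bandwidth",
      "capabilities": { "bandwidth": true }
    }
  ]
}
```

Pods then opt in via the standard traffic-shaping annotations, e.g.:

```yaml
metadata:
  annotations:
    kubernetes.io/ingress-bandwidth: 1M
    kubernetes.io/egress-bandwidth: 1M
```

If the chain were being honored, such a pod would come up with rate limits applied; on the affected setup, the annotation has no effect.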

A multus or meshnet deployment should have this issue, but both will have to be tweaked. E.g., meshnet's DaemonSet has to point volumes[].hostPath at the correct CNI directories (the DaemonSet will install the config chain and the meshnet plugin binary):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: meshnet
  labels:
    k8s-app: meshnet
spec:
  selector:
    matchLabels:
      name: meshnet
  template:
    metadata:
      labels:
        name: meshnet
    spec:
      hostNetwork: true
      hostPID: true
      hostIPC: true
      serviceAccountName: meshnet
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
        - operator: Exists
          effect: NoSchedule
      containers:
        - name: meshnet
          securityContext:
            privileged: true
          image: networkop/meshnet:latest
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          volumeMounts:
            - name: cni-cfg
              mountPath: /etc/cni/net.d
            - name: cni-bin
              mountPath: /opt/cni/bin
            - name: var-run-netns
              mountPath: /var/run/netns
              mountPropagation: Bidirectional
      terminationGracePeriodSeconds: 30
      volumes:
        - name: cni-bin
          hostPath:
            path: /var/lib/rancher/k3s/data/current/bin
        - name: cni-cfg
          hostPath:
            path: /var/lib/rancher/k3s/agent/etc/cni/net.d
        - name: var-run-netns
          hostPath:
            path: /var/run/netns

But this will naturally need all the rest of the meshnet deployment (CRDs, Namespace, ServiceAccount, ClusterRole, etc.), plus the demonstration itself (i.e., a couple of nodes and a Topology to connect them).

abiosoft commented 1 year ago

@Cerebus changing the path should work.

volumes:
  - name: cni-bin
    hostPath:
      path: /usr/libexec/cni
  - name: cni-cfg
    hostPath:
      path: /etc/cni/net.d

Cerebus commented 1 year ago

Those paths:

@Cerebus changing the path should work.

volumes:
  - name: cni-bin
    hostPath:
      path: /usr/libexec/cni
  - name: cni-cfg
    hostPath:
      path: /etc/cni/net.d

Nope. First, /etc/cni/net.d doesn't exist in a k3s deployment; the config lives under /var/lib/rancher. Second, the binaries in /usr/libexec/cni are ignored by k3s; it installs its own in /var/lib/rancher as above.

ETA: with the docker runtime. Works with the containerd runtime, but I need dockerd as well.

abiosoft commented 1 year ago

ETA: with the docker runtime. Works with the containerd runtime, but I need dockerd as well.

This is mainly what I'm trying to confirm.

The CNI setup is ignored for the docker runtime; that's most likely the cause.

Are you available to assist with testing? I can push out a quick fix for this.

Cerebus commented 1 year ago

In a slow loop, yes. :)