cncf / cnf-testbed

ARCHIVED: 🧪🛏️Cloud-native Network Function (CNF) Testbed --> See LFN Cloud Native Telecom Initiative https://wiki.lfnetworking.org/pages/viewpage.action?pageId=113213592

Issue with Kubespray v2.14.0 (likely related to Multus configuration) #366

Open michaelspedersen opened 4 years ago

michaelspedersen commented 4 years ago

Link to issue with details: https://github.com/intel/multus-cni/issues/561

CNF Testbed (Kubespray) runs without errors, but inspecting the cluster afterwards shows the following:

$ kubectl get all --all-namespaces 
NAMESPACE     NAME                                              READY   STATUS              RESTARTS   AGE 
kube-system   pod/calico-kube-controllers-b5c94f8f8-2j5fd       1/1     Running             0          43m
kube-system   pod/calico-node-b4kmp                             1/1     Running             0          45m
kube-system   pod/calico-node-cm4x4                             1/1     Running             0          45m
kube-system   pod/coredns-dff8fc7d-xps6q                        0/1     ContainerCreating   0          42m
kube-system   pod/dns-autoscaler-6fcd794dd8-88mb4               0/1     ContainerCreating   0          42m
kube-system   pod/kube-apiserver-node0                          1/1     Running             0          49m
kube-system   pod/kube-controller-manager-node0                 1/1     Running             0          49m
kube-system   pod/kube-multus-ds-amd64-b6dvr                    1/1     Running             0          44m
kube-system   pod/kube-multus-ds-amd64-pdpdn                    1/1     Running             0          44m
kube-system   pod/kube-proxy-57vbc                              1/1     Running             0          49m
kube-system   pod/kube-proxy-x89hc                              1/1     Running             0          46m 
kube-system   pod/kube-scheduler-node0                          1/1     Running             0          49m
kube-system   pod/kubernetes-dashboard-667c4c65f8-8474b         0/1     ContainerCreating   0          42m
kube-system   pod/kubernetes-metrics-scraper-54fbb4d595-df2ns   0/1     ContainerCreating   0          42m
kube-system   pod/nginx-proxy-node1                             1/1     Running             0          45m
kube-system   pod/nodelocaldns-859qq                            1/1     Running             0          42m
kube-system   pod/nodelocaldns-gmhdm                            1/1     Running             0          42m
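
To see why the stuck pods never leave ContainerCreating, describe one of them; the Events section should surface the failing CNI call reported in the linked Multus issue (pod name taken from the listing above):

$ kubectl -n kube-system describe pod coredns-dff8fc7d-xps6q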

The issue seems to be that Kubespray deploys Multus with --cni-version=0.4.0, which should be supported, but apparently isn't in this setup.
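
With --multus-conf-file=auto, Multus writes its own delegating configuration on each node, and checking it confirms which CNI version ended up in use. A minimal sketch of what that file typically looks like, assuming the default 00-multus.conf name and a Calico delegate as in this deployment:

$ cat /etc/cni/net.d/00-multus.conf
{
  "cniVersion": "0.4.0",
  "name": "multus-cni-network",
  "type": "multus",
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
  "delegates": [ { "type": "calico", ... } ]
}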

Workaround:

With the cluster deployed, update the Multus daemonset:

$ kubectl edit ds kube-multus-ds-amd64 -n kube-system

Update the arguments to use --cni-version=0.3.1 instead of --cni-version=0.4.0 as shown below:

      - args:
        - --cni-conf-dir=/host/etc/cni/net.d
        - --cni-bin-dir=/host/opt/cni/bin
        - --multus-conf-file=auto
        - --multus-kubeconfig-file-host=/etc/cni/net.d/multus.d/multus.kubeconfig
        - --cni-version=0.3.1

Save the file and wait for the update to propagate. After that, all of the above pods should end up in the "Running" state.
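
As a non-interactive alternative, the same change can be applied with a JSON patch (a sketch, assuming --cni-version is the fifth entry in the container args, as in the daemonset shown above):

$ kubectl -n kube-system patch ds kube-multus-ds-amd64 --type=json \
    -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/args/4", "value": "--cni-version=0.3.1"}]'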

electrocucaracha commented 4 years ago

Hey @michaelspedersen, I did a quick test using a similar setup. This All-in-One cluster uses Kubespray v2.14.0 and Kubernetes v1.18.8.

    Command:
      /entrypoint.sh
    Args:
      --cni-conf-dir=/host/etc/cni/net.d
      --cni-bin-dir=/host/opt/cni/bin
      --multus-conf-file=auto
      --multus-kubeconfig-file-host=/etc/cni/net.d/multus.d/multus.kubeconfig
      --cni-version=0.4.0
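
The running arguments can be read back directly from the daemonset (a one-liner sketch):

$ kubectl -n kube-system get ds kube-multus-ds-amd64 -o jsonpath='{.spec.template.spec.containers[0].args}'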

This is the k8s-cluster.yml file:

---
system_namespace: kube-system
kube_log_dir: "/var/log/kubernetes"
kube_api_anonymous_auth: true
kube_api_pwd: "secret"
kube_users:
  kube:
    pass: "{{ kube_api_pwd }}"
    role: admin
    groups:
      - system:masters
kube_basic_auth: false
kube_token_auth: false
kube_network_plugin: flannel
kubeconfig_localhost: true
kube_version: v1.18.8
kube_proxy_mode: iptables
download_run_once: true
local_release_dir: "/tmp/releases"
helm_enabled: false
local_volumes_enabled: true
local_volume_provisioner_enabled: true
download_localhost: true
kube_network_plugin_multus: true
kubectl_localhost: false
etcd_deployment_type: docker
kubelet_deployment_type: docker
container_manager: docker
kubelet_custom_flags:
  - "--cpu-manager-policy=static"  # Allows containers in Guaranteed pods with integer CPU requests access to exclusive CPUs on the node.
kubelet_flexvolumes_plugins_dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
dashboard_skip_login: true
cert_manager_enabled: true
ingress_nginx_enabled: true

$ kubectl get pods -A
NAMESPACE       NAME                                          READY   STATUS    RESTARTS   AGE
cert-manager    cert-manager-578cd6d964-wwk2f                 1/1     Running   0          71m
cert-manager    cert-manager-cainjector-5ffff9dd7c-26mwk      1/1     Running   0          71m
cert-manager    cert-manager-webhook-556b9d7dfd-g2nwd         1/1     Running   0          71m
ingress-nginx   ingress-nginx-controller-9gs5k                1/1     Running   0          71m
kube-system     coredns-dff8fc7d-qb928                        1/1     Running   0          70m
kube-system     coredns-dff8fc7d-s75cb                        0/1     Pending   0          70m
kube-system     dns-autoscaler-66498f5c5f-6rrkn               1/1     Running   0          70m
kube-system     kube-apiserver-aio                            1/1     Running   0          72m
kube-system     kube-controller-manager-aio                   1/1     Running   0          72m
kube-system     kube-flannel-mbqfs                            1/1     Running   0          71m
kube-system     kube-multus-ds-amd64-bmt99                    1/1     Running   0          71m
kube-system     kube-proxy-mnfdw                              1/1     Running   0          71m
kube-system     kube-scheduler-aio                            1/1     Running   0          72m
kube-system     kubernetes-dashboard-5697dbd455-7fb2c         1/1     Running   0          70m
kube-system     kubernetes-metrics-scraper-54fbb4d595-6jfk7   1/1     Running   0          70m
kube-system     local-volume-provisioner-p7cvs                1/1     Running   0          70m
kube-system     nodelocaldns-54s5c                            1/1     Running   0          70m

But my Multus test worked:

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multus-deployment
  labels:
    app: multus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multus
  template:
    metadata:
      labels:
        app: multus
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
          { "name": "bridge-conf", "interfaceRequest": "eth1" },
          { "name": "bridge-conf", "interfaceRequest": "eth2" }
        ]'
    spec:
      containers:
      - name: multus-deployment
        image: "busybox"
        command: ["top"]
        stdin: true
        tty: true
EOF
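
For reference, the bridge-conf attachment used in the annotation above would be a NetworkAttachmentDefinition along these lines (a sketch; the bridge name, cniVersion, and the 10.10.0.0/16 subnet are assumptions inferred from the interface addresses shown below):

cat <<EOF | kubectl apply -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: bridge-conf
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "bridge",
    "bridge": "mybr0",
    "ipam": {
      "type": "host-local",
      "subnet": "10.10.0.0/16"
    }
  }'
EOF
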
$ kubectl exec -ti multus-deployment-858f78c94d-r5sdl -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if20: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue 
    link/ether 12:d5:47:0a:e2:e5 brd ff:ff:ff:ff:ff:ff
    inet 10.233.64.12/24 brd 10.233.64.255 scope global eth0
       valid_lft forever preferred_lft forever
5: eth1@if21: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue 
    link/ether 0a:ce:a5:59:2c:3f brd ff:ff:ff:ff:ff:ff
    inet 10.10.0.4/16 brd 10.10.255.255 scope global eth1
       valid_lft forever preferred_lft forever
7: eth2@if22: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue 
    link/ether 0e:f0:3c:88:aa:4d brd ff:ff:ff:ff:ff:ff
    inet 10.10.0.5/16 brd 10.10.255.255 scope global eth2
       valid_lft forever preferred_lft forever
michaelspedersen commented 4 years ago

Hmm, that's interesting @electrocucaracha (and thanks for also checking up on this). Did you check the status of coredns-dff8fc7d-s75cb? It looks like it is still Pending in the above snapshot.

electrocucaracha commented 4 years ago

Yeah, I'm not sure why that happens, but I have seen that behavior in All-in-One setups.
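
In a single-node (All-in-One) setup the second coredns replica usually stays Pending because the replicas are scheduled with pod anti-affinity across nodes, so there is no second node to place it on; the scheduler events should confirm this (command sketch, using the pod name from the listing above):

$ kubectl -n kube-system describe pod coredns-dff8fc7d-s75cb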

michaelspedersen commented 4 years ago

Workaround being applied through https://github.com/crosscloudci/k8s-infra/pull/20