CentaurusInfra / mizar

Mizar – Experimental, High Scale and High Performance Cloud Network https://mizar.readthedocs.io
GNU General Public License v2.0

make Mizar work for containerd #654

Open sonyafenge opened 2 years ago

sonyafenge commented 2 years ago

What would you like to be added: make Mizar work for containerd

Why is this needed: We are working on VM support in Arktos kube-up.sh. After we changed the container runtime from Docker to containerd, Mizar started reporting an error. The error information is below.

Error from Mizar log:

I0611 01:35:26.382326 108946 mizarcni.go:56] CNI_ADD: >>>> args: '&{ContainerID:937f13a880c0e5f7762ce680cf98748e265acbbaff98e22c189f6ecdf9a80cc4 Netns:/var/run/netns/cni-df8228d4-c2f2-93a9-be2a-7683705524f7 IfName:eth0 Args:K8S_POD_NAME=kube-dns-autoscaler-748b78969c-qclzx;K8S_POD_INFRA_CONTAINER_ID=937f13a880c0e5f7762ce680cf98748e265acbbaff98e22c189f6ecdf9a80cc4;K8S_POD_UID=50f0cb3a-104c-4ce8-b787-48015b3e1479;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system Path:/opt/cni/bin StdinData:[123 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 110 97 109 101 34 58 34 109 105 122 97 114 99 110 105 34 44 34 116 121 112 101 34 58 34 109 105 122 97 114 99 110 105 34 125]}'
I0611 01:35:31.391695 108946 mizarcni.go:60] CNI_ADD: Tracelog: 'CNI_ADD: Args: '{"Command":"ADD","ContainerID":"937f13a880c0e5f7762ce680cf98748e265acbbaff98e22c189f6ecdf9a80cc4","NetNS":"/var/run/netns/cni-df8228d4-c2f2-93a9-be2a-7683705524f7","IfName":"eth0","CniPath":"/opt/cni/bin","K8sPodNamespace":"kube-system","K8sPodName":"kube-dns-autoscaler-748b78969c-qclzx","K8sPodTenant":"","CniVersion":"0.3.1","NetworkName":"mizarcni","Plugin":"mizarcni"}' '
E0611 01:35:31.391703 108946 mizarcni.go:63] CNI_ADD: Error: 'rpc error: code = Unknown desc = Exception calling application: ConsumeInterfaces: Interface not found for pod 'kube-dns-autoscaler-748b78969c-qclzx-kube-system-''

Error from kubelet log:

Jun 11 02:00:38 demo-june2022-rp-1-minion-group-pz9h kubelet[67509]: E0611 02:00:38.409887 67509 kuberuntime_manager.go:1024] createPodSandbox for pod "kube-dns-autoscaler-748b78969c-qclzx_kube-system_system(50f0cb3a-104c-4ce8-b787-48015b3e1479)" failed: rpc error: code = Unknown desc = failed to setup network for sandbox "e35bc0a7500802285bd0b12d96a74746e5e2a024dad637a43a71517ee8f00585": plugin type="mizarcni" name="mizarcni" failed (add): netplugin failed but error parsing its diagnostic message "{\n \"dns\": {}\n}{\n \"code\": 999,\n \"msg\": \"rpc error: code = Unknown desc = Exception calling application: ConsumeInterfaces: Interface not found for pod 'kube-dns-autoscaler-748b78969c-qclzx-kube-system-'\"\n}": invalid character '{' after top-level value
Jun 11 02:00:38 demo-june2022-rp-1-minion-group-pz9h kubelet[67509]: E0611 02:00:38.409895 67509 kuberuntime_manager.go:1024] createPodSandbox for pod "test-cbf96d54d-vjzgh_default_elephant(9e173b5b-c206-46da-ba4a-ac1f9f4b01f1)" failed: rpc error: code = Unknown desc = failed to setup network for sandbox "82050868a2c7839b5ffe65b4686a87895a6a5d5aabcc5d6eb10895196792a476": plugin type="mizarcni" name="mizarcni" failed (add): netplugin failed but error parsing its diagnostic message "{\n \"dns\": {}\n}{\n \"code\": 999,\n \"msg\": \"rpc error: code = Unknown desc = Exception calling application: ConsumeInterfaces: Interface not found for pod 'test-cbf96d54d-vjzgh-default-'\"\n}": invalid character '{' after top-level value

Attached logs: mizarcni.log, mizar-operator.log
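The kubelet message "invalid character '{' after top-level value" means the plugin's diagnostic output contains two JSON objects back to back. The first is a result with `"dns": {}` and the second is the error object with `"code": 999`. A caller that expects a single JSON document rejects that output. The Go program below reproduces the parse error with `encoding/json` and shows that a streaming `json.Decoder` can read both objects. `parseSingle` and `parseStream` are hypothetical names used for illustration only. They do not describe how libcni or mizarcni are implemented.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// pluginOutput is the diagnostic message quoted in the kubelet log entry for
// pod test-cbf96d54d-vjzgh: a result object followed immediately by an error object.
const pluginOutput = "{\n \"dns\": {}\n}{\n \"code\": 999,\n \"msg\": \"rpc error: code = Unknown desc = Exception calling application: ConsumeInterfaces: Interface not found for pod 'test-cbf96d54d-vjzgh-default-'\"\n}"

// parseSingle (hypothetical) treats the whole output as exactly one JSON document.
func parseSingle(s string) error {
	var v map[string]any
	return json.Unmarshal([]byte(s), &v)
}

// parseStream (hypothetical) decodes each concatenated JSON object in turn
// until the input is exhausted.
func parseStream(s string) ([]map[string]any, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	var objs []map[string]any
	for {
		var v map[string]any
		err := dec.Decode(&v)
		if errors.Is(err, io.EOF) {
			return objs, nil
		}
		if err != nil {
			return nil, err
		}
		objs = append(objs, v)
	}
}

func main() {
	// Fails exactly like the kubelet log: the second '{' follows a complete value.
	fmt.Println(parseSingle(pluginOutput))

	objs, err := parseStream(pluginOutput)
	if err != nil {
		panic(err)
	}
	fmt.Println("objects:", len(objs))
	fmt.Println("error code:", objs[1]["code"])
}
```

The first line printed is `invalid character '{' after top-level value`, which matches the kubelet log. The streaming decode then finds two objects, and the second one carries the `999` error code.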

sonyafenge commented 2 years ago

Repro steps:

  1. Clone https://github.com/CentaurusInfra/arktos.git locally
  2. Update "KUBE_CONTAINER_RUNTIME" to "containerd" in ./cluster/gce/config-default.sh
  3. make clean
  4. make quick-release
  5. Run the following commands to start an Arktos cluster:
    export NUM_NODES=2 RUN_PREFIX=[your run prefix] SCALEOUT_CLUSTER=true SCALEOUT_TP_COUNT=1 SCALEOUT_RP_COUNT=1 NETWORK_PROVIDER=mizar
    export MASTER_DISK_SIZE=500GB MASTER_ROOT_DISK_SIZE=500GB KUBE_GCE_ZONE=us-west2-b MASTER_SIZE=n1-highmem-32 NODE_SIZE=n1-highmem-16 NODE_DISK_SIZE=500GB GOPATH=$HOME/go KUBE_GCE_ENABLE_IP_ALIASES=true KUBE_GCE_PRIVATE_CLUSTER=true CREATE_CUSTOM_NETWORK=true KUBE_GCE_INSTANCE_PREFIX=${RUN_PREFIX} KUBE_GCE_NETWORK=${RUN_PREFIX} ENABLE_KCM_LEADER_ELECT=false ENABLE_SCHEDULER_LEADER_ELECT=false KUBE_ENABLE_APISERVER_INSECURE_PORT=true KUBE_ENABLE_PROMETHEUS_DEBUG=true KUBE_ENABLE_PPROF_DEBUG=true TEST_CLUSTER_LOG_LEVEL=--v=2 HOLLOW_KUBELET_TEST_LOG_LEVEL=--v=2 GCE_REGION=us-west2-b
    ./cluster/kube-up.sh
  6. Check that the Mizar components are running:
    sonyali@sonyaperf2:~/go/src/k8s.io/arktos$ ./cluster/kubectl.sh --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/cluster/kubeconfig.tp-1 get pods
    NAME                              HASHKEY               READY   STATUS    RESTARTS   AGE
    mizar-daemon-5c668ccc99-2fl8f     43175805228038350     1/1     Running   0          136m
    mizar-daemon-krnrt                7671207193321063543   1/1     Running   0          136m
    mizar-daemon-w78ls                2369352083330089202   1/1     Running   0          136m
    mizar-daemon-xphkl                7517945021093708646   1/1     Running   0          136m
    mizar-operator-6c5d4b5c96-68snv   5371898372571081877   1/1     Running   0          136m
  7. Create a new tenant, elephant, and deploy any pods, then check the pod status. The pods are stuck in "ContainerCreating" status:
    sonyali@sonyaperf2:~/go/src/k8s.io/arktos$ ./cluster/kubectl.sh --kubeconfig=/home/sonyali/go/src/k8s.io/arktos/cluster/kubeconfig.tp-1 get pods --tenant elephant
    NAME                   HASHKEY               READY   STATUS              RESTARTS   AGE
    test-cbf96d54d-26hqv   8671255153654746800   0/1     ContainerCreating   0          119s
    test-cbf96d54d-lwbjg   3869940810048378113   0/1     ContainerCreating   0          119s
    test-cbf96d54d-nl989   5026845982308391078   0/1     ContainerCreating   0          119s