cybozu-go / coil

CNI plugin for Kubernetes designed for scalability and extensibility
Apache License 2.0

CNI issue in kind-created cluster #264

Closed: linchuan4028 closed this issue 6 months ago

linchuan4028 commented 7 months ago

I set up the Coil NAT gateway on a kind-created cluster by following the commands in the README:

$ cd v2
$ make certs
$ make image

$ cd e2e
$ make start
$ make install-coil

However, I ran into some CNI issues. One pod is Pending, and the CoreDNS pods are stuck in ContainerCreating.

NAMESPACE            NAME                                         READY   STATUS              RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
kube-system          coil-controller-866b8fd666-5b5l4             0/1     Pending             0          14m   <none>       <none>               <none>           <none>
kube-system          coil-controller-866b8fd666-f5zmw             1/1     Running             0          14m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          coil-router-6w6sp                            1/1     Running             0          14m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          coil-router-fp7rb                            1/1     Running             0          14m   172.18.0.6   coil-worker2         <none>           <none>
kube-system          coil-router-xkvzl                            1/1     Running             0          14m   172.18.0.7   coil-worker          <none>           <none>
kube-system          coild-d7fh7                                  1/1     Running             0          14m   172.18.0.7   coil-worker          <none>           <none>
kube-system          coild-j66hh                                  1/1     Running             0          14m   172.18.0.6   coil-worker2         <none>           <none>
kube-system          coild-nmsnv                                  1/1     Running             0          14m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          coredns-bd6b6df9f-fmgd6                      0/1     ContainerCreating   0          17m   <none>       coil-control-plane   <none>           <none>
kube-system          coredns-bd6b6df9f-qqd47                      0/1     ContainerCreating   0          17m   <none>       coil-control-plane   <none>           <none>
kube-system          etcd-coil-control-plane                      1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          kube-apiserver-coil-control-plane            1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          kube-controller-manager-coil-control-plane   1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          kube-proxy-4rdh7                             1/1     Running             0          16m   172.18.0.6   coil-worker2         <none>           <none>
kube-system          kube-proxy-j4b48                             1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          kube-proxy-zwnnd                             1/1     Running             0          17m   172.18.0.7   coil-worker          <none>           <none>
kube-system          kube-scheduler-coil-control-plane            1/1     Running             0          17m   172.18.0.5   coil-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-6fd4f85bbc-vbv58      0/1     ContainerCreating   0          17m   <none>       coil-control-plane   <none>           <none>

When I describe one of the stuck pods:

$ kubectl describe pods coredns-bd6b6df9f-fmgd6 -n kube-system

it shows:

 failed (add): failed to allocate address; aborting new block request: context deadline exceeded
  Warning  FailedCreatePodSandBox  50s    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "6516770d68bb935f8746c5a0ef17636d028ecd932ba0dbe2fd686a63e19ff935": plugin type="coil" failed (add): failed to allocate address; aborting new block request: context deadline exceeded
  Warning  FailedCreatePodSandBox  20s    kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3d0c988e6c39ff1183964d6adc34105998919be5ec61af6c1214526781fc6d52": plugin type="coil" failed (add): failed to allocate address; aborting new block request: context deadline exceeded

Could you share some advice on how to troubleshoot this issue?

yokaze commented 7 months ago

Hi @linchuan4028, the pods that are running properly use the host network, and the others do not. So we can see that Coil is having trouble assigning a unique pod IP to them.
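One way to confirm this observation from the listing above is to look for pods that have been assigned to a node but still have no IP. The following is a minimal sketch, not part of Coil or its tooling: `pods_without_ip` is a hypothetical helper, the sample rows are copied from the listing in this thread, and the parser assumes the plain column layout shown there (a RESTARTS value such as `1 (5m ago)` would break the whitespace split).

```python
# Sample rows copied from the `kubectl get pods -A -o wide` listing in this thread.
LISTING = """\
NAMESPACE            NAME                                         READY   STATUS              RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
kube-system          coil-controller-866b8fd666-5b5l4             0/1     Pending             0          14m   <none>       <none>               <none>           <none>
kube-system          coil-controller-866b8fd666-f5zmw             1/1     Running             0          14m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          coild-nmsnv                                  1/1     Running             0          14m   172.18.0.5   coil-control-plane   <none>           <none>
kube-system          coredns-bd6b6df9f-fmgd6                      0/1     ContainerCreating   0          17m   <none>       coil-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-6fd4f85bbc-vbv58      0/1     ContainerCreating   0          17m   <none>       coil-control-plane   <none>           <none>
"""


def pods_without_ip(listing: str) -> list[str]:
    """Return namespace/name for pods that have a node but no IP address.

    Hypothetical helper for illustration only. Pods with no node are
    skipped: they have not been scheduled yet, so no CNI call was made.
    """
    rows = listing.strip().splitlines()[1:]  # drop the header row
    stuck = []
    for row in rows:
        fields = row.split()
        namespace, name = fields[0], fields[1]
        ip, node = fields[6], fields[7]
        if ip == "<none>" and node != "<none>":
            stuck.append(f"{namespace}/{name}")
    return stuck


if __name__ == "__main__":
    for pod in pods_without_ip(LISTING):
        print(pod)
```

On these sample rows the helper reports the CoreDNS pod and the local-path-provisioner pod. The Pending coil-controller replica is skipped because it has no node yet, which points to scheduling rather than IP allocation.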

To solve the problem, tell Coil which CIDR to use for pods. This step is described in the last line of the Quick start: https://github.com/cybozu-go/coil#quick-start

Although the cluster looks broken right after install-coil, it is OK to proceed and run:

$ ../bin/kubectl apply -f manifests/default_pool.yaml

Could you please try it?
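The "aborting new block request" error above suggests why the pool matters: Coil hands out pod addresses in blocks carved from an address pool, so with no pool defined there is nothing to carve a block from. The sketch below illustrates the arithmetic only. The subnet `10.224.0.0/16` and the block size of 5 bits are arbitrary example values, not necessarily those in `manifests/default_pool.yaml`, and `carve_blocks` and `total_blocks` are hypothetical names, not Coil APIs.

```python
import ipaddress
from itertools import islice


def carve_blocks(subnet: str, block_size_bits: int, count: int) -> list[str]:
    """Return the first `count` blocks of 2**block_size_bits addresses in `subnet`.

    Illustrative only: this mimics the idea of splitting a pool subnet
    into fixed-size address blocks, not Coil's actual allocator.
    """
    net = ipaddress.ip_network(subnet)
    new_prefix = net.max_prefixlen - block_size_bits
    if new_prefix < net.prefixlen:
        raise ValueError("block is larger than the subnet")
    return [str(block) for block in islice(net.subnets(new_prefix=new_prefix), count)]


def total_blocks(subnet: str, block_size_bits: int) -> int:
    """Number of whole blocks of 2**block_size_bits addresses in `subnet`."""
    net = ipaddress.ip_network(subnet)
    return net.num_addresses >> block_size_bits


if __name__ == "__main__":
    # Example values only; substitute the CIDR your pool actually uses.
    print(carve_blocks("10.224.0.0/16", 5, 3))
    print(total_blocks("10.224.0.0/16", 5))
```

With these example values each block is a /27 holding 32 addresses, so a /16 pool yields 2048 blocks.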

linchuan4028 commented 7 months ago

It works, thanks. An additional question: I don't know whether the egress gateway can handle that, since we set hostNetwork: true on the nat-client pod.

terassyi commented 7 months ago

> I don't know whether the egress gateway can handle that, since we set hostNetwork: true on the nat-client pod.

The egress gateway cannot be used for pods that set hostNetwork: true.

https://github.com/cybozu-go/coil/blob/v2.5.1/v2/controllers/pod_watcher.go#L84-L85
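The linked lines are not reproduced here. As a rough illustration of the rule stated above, the sketch below filters out host-network pods, which share the node's network namespace and are never passed to the CNI plugin. It assumes pod objects shaped like the output of `kubectl get pod -o json`; `egress_nat_applicable` is a hypothetical name, not a function from Coil.

```python
def egress_nat_applicable(pod: dict) -> bool:
    """Return False for pods that run in the node's network namespace.

    Hypothetical helper: `pod` is assumed to be a dict shaped like the
    JSON from `kubectl get pod -o json`. A missing hostNetwork field is
    treated as False, which matches the Kubernetes default.
    """
    return not pod.get("spec", {}).get("hostNetwork", False)


if __name__ == "__main__":
    nat_client = {"metadata": {"name": "nat-client"}, "spec": {"hostNetwork": True}}
    regular = {"metadata": {"name": "app"}, "spec": {}}
    print(egress_nat_applicable(nat_client))
    print(egress_nat_applicable(regular))
```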

linchuan4028 commented 7 months ago

Thanks. I have read your blog post at https://blog.kintone.io/entry/coilv2#Problems-solved-by-Coil, which describes how Coil works with other CNI-related software such as Calico and Cilium:

Coil is designed to be easily integrated with other software such as [BIRD](https://bird.network.cz/), [MetalLB](https://metallb.universe.tf/), [Calico](https://www.projectcalico.org/), or [Cilium](https://cilium.io/) to implement Kubernetes features like LoadBalancer or NetworkPolicy

We currently use the flannel CNI in production, and I'm very interested in the egress gateway feature. Is there a way to keep flannel as the CNI in our cluster and still adopt Coil's egress gateway feature?

terassyi commented 7 months ago

Coil's egress feature cannot be used with flannel, because it depends on Coil's own IPAM.