cybozu-go / coil

CNI plugin for Kubernetes designed for scalability and extensibility
Apache License 2.0

Two AddressBlocks are created when coil-controller is temporarily down #271

Open masa213f opened 5 months ago

masa213f commented 5 months ago

Describe the bug

Coil creates two AddressBlocks when coil-controller is temporarily down. One of the two AddressBlocks may leak when the Pod that uses them is deleted.

Environments

To Reproduce

  1. Set up a kind cluster

    $ cd ~/go/src/github.com/cybozu-go/coil/v2/e2e
    $ make start install-coil
    $ kubectl apply -f manifests/default_pool.yaml
  2. Create AddressPool and Namespace

    
    $ kubectl apply -f - << EOF
    apiVersion: coil.cybozu.com/v2
    kind: AddressPool
    metadata:
      name: test-pool
    spec:
      blockSizeBits: 0
      subnets:
      - ipv4: 10.0.0.0/30
    EOF

    $ kubectl apply -f - << EOF
    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        coil.cybozu.com/pool: test-pool
      name: test-ns
    EOF


  3. Stop coil-controllers

    $ kubectl patch deployment -n kube-system coil-controller -p '{"spec":{"replicas":0}}'
    $ kubectl get pod -n kube-system -l app.kubernetes.io/component=coil-controller
  4. Create a Pod

    $ kubectl apply -f - << EOF
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod
      namespace: test-ns
    spec:
      containers:
      - name: ubuntu
        image: ghcr.io/cybozu/ubuntu:22.04
        command: ["pause"]
    EOF
  5. Wait 1 minute

  6. Start the coil-controllers

    $ kubectl patch deployment -n kube-system coil-controller -p '{"spec":{"replicas":2}}'
    $ kubectl get pod -n kube-system -l app.kubernetes.io/component=coil-controller

After these steps, Coil sometimes creates two AddressBlocks for the test Pod.

    $ kubectl get pod -n test-ns -o wide
    NAME       READY   STATUS    RESTARTS   AGE     IP         NODE           NOMINATED NODE   READINESS GATES
    test-pod   1/1     Running   0          2m56s   10.0.0.1   coil-worker3   <none>           <none>

    $ kubectl get addresspool,addressblock
    NAME                                    BLOCKSIZE BITS
    addresspool.coil.cybozu.com/default     0
    addresspool.coil.cybozu.com/test-pool   0

    NAME                                       NODE                 POOL        IPV4            IPV6
    addressblock.coil.cybozu.com/default-0     coil-control-plane   default     10.244.0.0/32
    addressblock.coil.cybozu.com/default-1     coil-control-plane   default     10.244.0.1/32
    addressblock.coil.cybozu.com/default-2     coil-control-plane   default     10.244.0.2/32
    addressblock.coil.cybozu.com/test-pool-0   coil-worker3         test-pool   10.0.0.0/32    ★ Two address blocks exist.
    addressblock.coil.cybozu.com/test-pool-1   coil-worker3         test-pool   10.0.0.1/32    ★

After this, when the test Pod is deleted, one AddressBlock remains.

    $ kubectl delete pod -n test-ns test-pod
    pod "test-pod" deleted

    $ kubectl get pod -n test-ns -o wide
    No resources found in test-ns namespace.

    $ kubectl get addresspool,addressblock
    NAME                                    BLOCKSIZE BITS
    addresspool.coil.cybozu.com/default     0
    addresspool.coil.cybozu.com/test-pool   0

    NAME                                       NODE                 POOL        IPV4            IPV6
    addressblock.coil.cybozu.com/default-0     coil-control-plane   default     10.244.0.0/32
    addressblock.coil.cybozu.com/default-1     coil-control-plane   default     10.244.0.1/32
    addressblock.coil.cybozu.com/default-2     coil-control-plane   default     10.244.0.2/32
    addressblock.coil.cybozu.com/test-pool-0   coil-worker3         test-pool   10.0.0.0/32    ★ This doesn't go away.

ymmt2005 commented 5 months ago

I believe coild on the assigned node will eventually collect unused AddressBlocks. https://github.com/cybozu-go/coil/blob/main/docs/design.md#addressblock

> At startup, coild also checks each AddressBlock for the Node, and if no Pod is using the addresses in the block, it deletes the AddressBlock.

Please reopen if I'm wrong.

masa213f commented 5 months ago

@ymmt2005 In an actual cluster, an unused AddressBlock was left behind for nearly half a year...

ymmt2005 commented 5 months ago

If that is serious enough, I'd like to suggest calling the GC logic periodically, not only at process startup.

masa213f commented 5 months ago

> I'd like to suggest calling the GC logic periodically,

That sounds good.

But this is not a serious problem, except when deleting AddressPools. I don't think it is worth changing.

ymmt2005 commented 5 months ago

A workaround is to manually delete the coild Pod on the affected node. That will trigger a GC.

Deleting coild is a safe operation. It does not interrupt networking.