cybozu-go / coil

CNI plugin for Kubernetes designed for scalability and extensibility
Apache License 2.0

addressblocks are not freed when scheduled on master nodes #202

Closed tflabs-nl closed 2 years ago

tflabs-nl commented 2 years ago

Describe the bug
I had a coredns pod that was repeatedly scheduled on a master node, where it kept failing due to a CNI version incompatibility. Each restart resulted in a new AddressBlock reservation. The blocks were not released after each failed attempt, and the full /16 is now used up, leaving pods stuck in the creating phase.

Environments

To Reproduce
(screenshot attached)

Expected behavior
The block is cleared on pod finalization, even on master nodes.

tflabs-nl commented 2 years ago

This could possibly be an extension of issue #176.

tflabs-nl commented 2 years ago

Also, manual deletion of an AddressBlock fails, and no errors appear in the coil-controller or coild pods.

tflabs-nl commented 2 years ago

Is there a way I can force-delete (free) some blocks by hand? All (re)scheduled pods fail at this point. For scale: the total number of pods running on this cluster is 109.
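
For reference, the reservations should be visible through coil's AddressBlock resources. A minimal sketch of inspecting them, assuming coil v2's cluster-scoped AddressBlock CRD and its coil.cybozu.com/node label (the node name is a placeholder):

# list every reserved block in the cluster
kubectl get addressblocks.coil.cybozu.com

# list only the blocks reserved for a single node
kubectl get addressblocks.coil.cybozu.com -l coil.cybozu.com/node=master-1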

ysksuzuki commented 2 years ago

Is there a way I can force delete(/free) some blocks by hand?

Try restarting the coild pod running on the master node with kubectl delete pod. Coild frees unused blocks when it starts.
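
A minimal sketch of that restart, assuming coild runs as a DaemonSet in the kube-system namespace as in coil's example manifests (the pod name is a placeholder):

# find the coild pod running on the affected master node
kubectl -n kube-system get pods -o wide | grep coild

# delete it; the DaemonSet recreates it, and the new coild frees unused blocks on startup
kubectl -n kube-system delete pod coild-xxxxx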

Can you reproduce this issue with Kind or something? If you can, please tell me how to do that.
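
For example, a reproduction could start from a Kind cluster with the default CNI disabled so that coil can be installed in its place. A minimal sketch of such a cluster config (the node layout and podSubnet are assumptions, not values taken from this issue):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  # disable kindnet so that coil can be installed as the CNI plugin
  disableDefaultCNI: true
  # example pod subnet, not the reporter's actual range
  podSubnet: 10.244.0.0/16
nodes:
  - role: control-plane
  - role: worker

Such a cluster would be created with kind create cluster --config kind.yaml before installing coil.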

tflabs-nl commented 2 years ago

(screenshot attached)
Can confirm that this clears the unused blocks.

tflabs-nl commented 2 years ago

I think all that's needed to reproduce this is the default IP address pool as mentioned in the documentation (a sketch of such a pool follows the manifest below), plus this coredns YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  labels:
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        egress.coil.cybozu.com/webserver-internet: nat
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
    spec:
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: k8s.gcr.io/coredns/coredns:v1.8.6
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
      dnsPolicy: None
      dnsConfig:
        nameservers:
          - 1.1.1.1
          - 8.8.8.8
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
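
The default pool is just the one described in the documentation; a sketch of such an AddressPool, with placeholder values rather than my actual subnet:

apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: default
spec:
  # each block holds 2^5 = 32 addresses; placeholder value
  blockSizeBits: 5
  subnets:
    - ipv4: 10.100.0.0/16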

I would have to try to replicate this behavior in Kind; I've never used it before.

ysksuzuki commented 2 years ago

Thanks, will look into that.

yamatcha commented 2 years ago

We couldn't reproduce this issue. Could you provide me with more details on how you encountered this?

tloader11 commented 2 years ago

I will try to reproduce this issue using Kind myself this weekend. It all came down to CoreDNS repeatedly being scheduled on a master node, where it failed due to an invalid CNI version. This resulted in a crash loop that created lots of address blocks.

ysksuzuki commented 2 years ago

Feel free to reopen this issue if you still have a problem.