DGolubets / k8s-managed-node-pool

Managed node pool for K8s
MIT License

Pods didn't trigger scale-up #3

Open · jetdream opened this issue 3 months ago

jetdream commented 3 months ago

This operator is what I need! Thank you for your work!

I'm trying to test it based on your example, only slightly modified for my test cluster. The node pool is created successfully, but the pod is not scheduled. Could you please check what's wrong?

Error: 0/3 nodes are available: 1 node(s) had untolerated taint {dgolubets.github.io/managed-node-pool: eee41d49-82a4-4b0b-8488-7e11d8081e77}, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling
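For reference, the labels and taints on the nodes can be inspected with kubectl like this (the node name in the second command is just a placeholder; take the real one from the first command):

# list nodes and their labels to find the node created for the pool
kubectl get nodes --show-labels

# show the taints on that node
kubectl describe node <pool-node-name> | grep -i taints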

My manifests:

apiVersion: dgolubets.github.io/v1alpha1
kind: ManagedNodePool
metadata:
  name: managed-pool-1
  namespace: managed-node-pool
spec:
  name: managed-pool-test-1
  size: s-1vcpu-2gb
  count: 1
  min_count: 1
  max_count: 4
  labels:
    label1: "value1"
    label2: "value2"
  tags:
    - tag1
    - tag2
  idle_timeout: "15s"

---
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: managed-node-pool
  labels:
    app: test
spec:
  containers:
    - name: ubuntu
      image: ubuntu:22.04
      resources:
        requests:
          cpu: "100m"
          memory: "100Mi"
        limits:
          cpu: "100m"
          memory: "100Mi"
      command: ["/bin/bash", "-c", "--"]
      args: ["while true; do sleep 30; done;"]
  nodeSelector:
    dgolubets.github.io/managed-node-pool: managed-pool-1.managed-node-pool

Here is the deployed pod resource:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"test"},"name":"test","namespace":"managed-node-pool"},"spec":{"containers":[{"args":["while
      true; do sleep 30;
      done;"],"command":["/bin/bash","-c","--"],"image":"ubuntu:22.04","name":"ubuntu","resources":{"limits":{"cpu":"100m","memory":"100Mi"},"requests":{"cpu":"100m","memory":"100Mi"}}}],"nodeSelector":{"dgolubets.github.io/managed-node-pool":"managed-pool-1.managed-node-pool"}}}
  creationTimestamp: '2024-08-29T00:54:40Z'
  labels:
    app: test
    dgolubets.github.io/managed-node-pool: managed-pool-1.managed-node-pool
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            f:dgolubets.github.io/managed-node-pool: {}
        f:spec:
          f:tolerations: {}
      manager: dgolubets.github.io/managed-node-pool
      operation: Apply
      time: '2024-08-29T00:54:50Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:labels:
            .: {}
            f:app: {}
        f:spec:
          f:containers:
            k:{"name":"ubuntu"}:
              .: {}
              f:args: {}
              f:command: {}
              f:image: {}
              f:imagePullPolicy: {}
              f:name: {}
              f:resources:
                .: {}
                f:limits:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
                f:requests:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
              f:terminationMessagePath: {}
              f:terminationMessagePolicy: {}
          f:dnsPolicy: {}
          f:enableServiceLinks: {}
          f:nodeSelector: {}
          f:restartPolicy: {}
          f:schedulerName: {}
          f:securityContext: {}
          f:terminationGracePeriodSeconds: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: '2024-08-29T00:54:40Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            .: {}
            k:{"type":"PodScheduled"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
      manager: kube-scheduler
      operation: Update
      subresource: status
      time: '2024-08-29T01:05:03Z'
  name: test
  namespace: managed-node-pool
  resourceVersion: '18711472'
  uid: cde0d39c-b49a-4be2-8419-fab1ccff8f8c
spec:
  containers:
    - args:
        - while true; do sleep 30; done;
      command:
        - /bin/bash
        - '-c'
        - '--'
      image: ubuntu:22.04
      imagePullPolicy: IfNotPresent
      name: ubuntu
      resources:
        limits:
          cpu: 100m
          memory: 100Mi
        requests:
          cpu: 100m
          memory: 100Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-6cvq7
          readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
    - name: do_registry_1
  nodeSelector:
    dgolubets.github.io/managed-node-pool: managed-pool-1.managed-node-pool
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
    - effect: NoSchedule
      key: dgolubets.github.io/managed-node-pool
      operator: Equal
      value: e2721bf3-8294-4ee9-9f48-7aeb33949bb6
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
  volumes:
    - name: kube-api-access-6cvq7
      projected:
        defaultMode: 420
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              items:
                - key: ca.crt
                  path: ca.crt
              name: kube-root-ca.crt
          - downwardAPI:
              items:
                - fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
                  path: namespace
status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2024-08-29T00:54:40Z'
      message: >-
        0/3 nodes are available: 1 node(s) had untolerated taint
        {dgolubets.github.io/managed-node-pool:
        eee41d49-82a4-4b0b-8488-7e11d8081e77}, 2 node(s) didn't match Pod's node
        affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is
        not helpful for scheduling.
      reason: Unschedulable
      status: 'False'
      type: PodScheduled
  phase: Pending
  qosClass: Guaranteed
DGolubets commented 3 months ago

Hi,

For some reason you got a taint/toleration mismatch between the node and the pod. The node got the taint eee41d49-82a4-4b0b-8488-7e11d8081e77, but the pod was assigned e2721bf3-8294-4ee9-9f48-7aeb33949bb6. These UUIDs should be the same, and they should equal the UID of the managed pool resource.
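If you want to verify this yourself, something like the commands below should show the mismatch (using managednodepool as the resource name is an assumption about how the CRD is registered; adjust it if kubectl doesn't recognize it):

# UID of the managed pool resource (this is what the node taint value should be)
kubectl get managednodepool managed-pool-1 -n managed-node-pool -o jsonpath='{.metadata.uid}'

# toleration value that was applied to the pod
kubectl get pod test -n managed-node-pool -o jsonpath='{.spec.tolerations[?(@.key=="dgolubets.github.io/managed-node-pool")].value}'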

Could you provide the logs from the operator pod? That will help me understand what happened.
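Something like this should work, assuming the operator runs in the managed-node-pool namespace; the deployment name below is a guess, so use whatever name it was installed under:

# fetch recent operator logs
kubectl logs -n managed-node-pool deploy/k8s-managed-node-pool --tail=500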