Karpenter cannot launch g5 instances in cn-north-1

Description

Observed Behavior: In the cn-north-1 region, Karpenter cannot launch a g5 instance. Even though the configuration is correct as far as I can tell, the following error message is displayed:

incompatible with provisioner \"coder-china-dev1-cn-coder-mepy-dev\", daemonset overhead={\"cpu\":\"331m\",\"memory\":\"275Mi\",\"pods\":\"6\"}, did not tolerate coder-mepy-dev=true:NoSchedule; incompatible with provisioner \"coder-algo-train\", daemonset overhead={\"cpu\":\"331m\",\"memory\":\"275Mi\",\"pods\":\"6\"}, no instance type satisfied resources {\"cpu\":\"3331m\",\"memory\":\"11539Mi\",\"nvidia.com/gpu\":\"1\",\"pods\":\"7\"} and requirements karpenter.k8s.aws/instance-family In [g5], karpenter.k8s.aws/instance-gpu-count In [1], karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/provisioner-name In [coder-algo-train], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux], noderole In [coder-algo-train], topology.kubernetes.io/zone In [cn-north-1a] (no instance type met all requirements)

Expected Behavior: In the cn-north-1 region, Karpenter should be able to launch a g5 instance.

Reproduction Steps (Please include YAML):

This is my Karpenter configuration:

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: coder-china-dev1-cn-coder-algo-train
spec:
  amiFamily: Bottlerocket
  blockDeviceMappings:
  - deviceName: /dev/xvdb
    ebs:
      deleteOnTermination: true
      encrypted: true
      volumeSize: 50Gi
      volumeType: gp3
  securityGroupSelector:
    aws-ids: sg-xxxxxx
  subnetSelector:
    Name: dev1-xxxxx

---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: coder-algo-train
spec:
  kubeletConfiguration:
    maxPods: 30
  labels:
    noderole: coder-algo-train
  limits:
    resources:
      cpu: "150"
  providerRef:
    name: coder-china-dev1-cn-coder-algo-train
  requirements:
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values:
    - g5
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - cn-north-1a
  - key: karpenter.k8s.aws/instance-gpu-count
    operator: In
    values:
    - "1"
  - key: kubernetes.io/os
    operator: In
    values:
    - linux
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
    - spot
  taints:
  - effect: NoSchedule
    key: coder-algo-train
    value: "true"
  ttlSecondsAfterEmpty: 30
  ttlSecondsUntilExpired: 315360000

This is my pod config:

apiVersion: v1
kind: Pod
metadata:
  name: algo-test-7fdbf696f4-rvv5d
  namespace: china-dev1-cn
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: coder-admin-algo-test-7fdbf696f4
    uid: 1f09e543-c694-461e-a5ea-eb178f4071e9
  resourceVersion: "53629327"
  uid: 652f06e5-1607-41cb-bda2-e3adf3685dd2
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - coder-workspace
        topologyKey: kubernetes.io/hostname
  automountServiceAccountToken: true
  containers:
  - command:
    - sh
    - -c
    image: tf2.7_general2:v24.42.0.rc.3
    imagePullPolicy: Always
    name: dev
    resources:
      limits:
        cpu: "3"
        memory: 11Gi
        nvidia.com/gpu: "1"
      requests:
        cpu: "3"
        memory: 11Gi
        nvidia.com/gpu: "1"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeSelector:
    noderole: coder-algo-train
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: sa-coder-algo-train
  serviceAccountName: sa-coder-algo-train
  shareProcessNamespace: false
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: coder-algo-train
    operator: Equal
    value: "true"
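
As a sanity check, the resource totals in the error message above can be reproduced from the pod's requests plus the reported daemonset overhead, which suggests the requests themselves are being read as intended. The following is a minimal sketch only: the helper names are hypothetical and not part of Karpenter or Kubernetes, it handles only the quantity suffixes that appear in this issue, and it assumes the pod itself counts as one unit of the `pods` resource.

```python
# Hypothetical helpers for illustration; not part of Karpenter or Kubernetes.
# Only the quantity formats that appear in this issue are handled.

def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "3" or "331m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_to_mib(quantity: str) -> int:
    """Convert a memory quantity with a Gi or Mi suffix to MiB."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


# Requests from the pod spec; "pods": "1" is an assumption (the pod itself).
pod_requests = {"cpu": "3", "memory": "11Gi", "nvidia.com/gpu": "1", "pods": "1"}

# Daemonset overhead exactly as reported in the error message.
daemonset_overhead = {"cpu": "331m", "memory": "275Mi", "pods": "6"}


def total_requests(pod: dict, overhead: dict) -> dict:
    """Sum pod requests and daemonset overhead, formatted like the error message."""
    cpu = cpu_to_millicores(pod["cpu"]) + cpu_to_millicores(overhead["cpu"])
    memory = memory_to_mib(pod["memory"]) + memory_to_mib(overhead["memory"])
    pods = int(pod["pods"]) + int(overhead["pods"])
    return {
        "cpu": f"{cpu}m",
        "memory": f"{memory}Mi",
        "nvidia.com/gpu": pod["nvidia.com/gpu"],
        "pods": str(pods),
    }


total = total_requests(pod_requests, daemonset_overhead)
print(total)
```

The computed values (3331m CPU, 11539Mi memory, 1 GPU, 7 pods) agree with the `resources` listed in the error, so the mismatch seems more likely to lie in the instance type requirements than in the resource requests.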

Versions: v0.32.10

Chart Version: v0.32.10
Kubernetes Version (kubectl version): v1.30

aws / karpenter-provider-aws

karpenter cannot start g5 type instance in cn-north-1 #7272
