NVIDIA / k8s-dra-driver

Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes
Apache License 2.0

Pod stuck in ContainerCreating phase #35

Open asm582 opened 9 months ago

asm582 commented 9 months ago

Hello,

I launched three jobs: two with profile 2g.20gb and one with profile 1g.10gb. The last job is stuck in the ContainerCreating phase:

NAMESPACE            NAME                                                           READY   STATUS              RESTARTS   AGE
gpu-test1            job1-6n8x2                                                     1/1     Running             0          7m28s
gpu-test1            job2-wc67x                                                     1/1     Running             0          7m10s
gpu-test1            job31-wkwpn                                                    0/1     ContainerCreating   0          3m8s
[root@nvd-srv-02 ~]# kubectl describe nas/k8s-dra-driver-cluster-worker -n nvidia-dra-driver
Name:         k8s-dra-driver-cluster-worker
Namespace:    nvidia-dra-driver
Labels:       <none>
Annotations:  <none>
API Version:  nas.gpu.resource.nvidia.com/v1alpha1
Kind:         NodeAllocationState
Metadata:
  Creation Timestamp:  2023-12-04T14:29:09Z
  Generation:          80
  Owner References:
    API Version:     v1
    Kind:            Node
    Name:            k8s-dra-driver-cluster-worker
    UID:             ddb095d1-a608-4f70-a7b2-bc55ad81ed4c
  Resource Version:  56820
  UID:               863efe97-f965-4f42-9816-88e5fc3bb860
Spec:
  Allocatable Devices:
    Gpu:
      Architecture:             Ampere
      Brand:                    Nvidia
      Cuda Compute Capability:  8.0
      Index:                    0
      Memory Bytes:             85899345920
      Mig Enabled:              true
      Product Name:             NVIDIA A100 80GB PCIe
      Uuid:                     GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
    Gpu:
      Architecture:             Ampere
      Brand:                    Nvidia
      Cuda Compute Capability:  8.0
      Index:                    1
      Memory Bytes:             85899345920
      Mig Enabled:              false
      Product Name:             NVIDIA A100 80GB PCIe
      Uuid:                     GPU-713eebac-08df-c534-6c98-8d5055ca97a9
    Mig:
      Parent Product Name:  NVIDIA A100 80GB PCIe
      Placements:
        Size:   1
        Start:  0
        Size:   1
        Start:  1
        Size:   1
        Start:  2
        Size:   1
        Start:  3
        Size:   1
        Start:  4
        Size:   1
        Start:  5
        Size:   1
        Start:  6
      Profile:  1g.10gb+me
    Mig:
      Parent Product Name:  NVIDIA A100 80GB PCIe
      Placements:
        Size:   2
        Start:  0
        Size:   2
        Start:  2
        Size:   2
        Start:  4
        Size:   2
        Start:  6
      Profile:  1g.20gb
    Mig:
      Parent Product Name:  NVIDIA A100 80GB PCIe
      Placements:
        Size:   1
        Start:  0
        Size:   1
        Start:  1
        Size:   1
        Start:  2
        Size:   1
        Start:  3
        Size:   1
        Start:  4
        Size:   1
        Start:  5
        Size:   1
        Start:  6
      Profile:  1g.10gb
    Mig:
      Parent Product Name:  NVIDIA A100 80GB PCIe
      Placements:
        Size:   2
        Start:  0
        Size:   2
        Start:  2
        Size:   2
        Start:  4
      Profile:  2g.20gb
    Mig:
      Parent Product Name:  NVIDIA A100 80GB PCIe
      Placements:
        Size:   4
        Start:  0
        Size:   4
        Start:  4
      Profile:  3g.40gb
    Mig:
      Parent Product Name:  NVIDIA A100 80GB PCIe
      Placements:
        Size:   4
        Start:  0
      Profile:  4g.40gb
    Mig:
      Parent Product Name:  NVIDIA A100 80GB PCIe
      Placements:
        Size:   8
        Start:  0
      Profile:  7g.80gb
  Allocated Claims:
    05012d63-f2eb-4a73-be38-b3938d9aa891:
      Claim Info:
        Name:       job2-wc67x-mig2g
        Namespace:  gpu-test1
        UID:        05012d63-f2eb-4a73-be38-b3938d9aa891
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   2
            Start:  2
          Profile:  2g.20gb
    148dc683-2f43-4b2f-a11e-2d49477cf6d6:
      Claim Info:
        Name:       job31-wkwpn-mig1g
        Namespace:  gpu-test1
        UID:        148dc683-2f43-4b2f-a11e-2d49477cf6d6
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   2
            Start:  4
          Profile:  1g.20gb
    73c4108d-e016-4e79-b77e-52ca25f050ec:
      Claim Info:
        Name:       job1-6n8x2-mig2g
        Namespace:  gpu-test1
        UID:        73c4108d-e016-4e79-b77e-52ca25f050ec
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   2
            Start:  0
          Profile:  2g.20gb
  Prepared Claims:
    05012d63-f2eb-4a73-be38-b3938d9aa891:
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   2
            Start:  2
          Profile:  2g.20gb
          Uuid:     MIG-896dd9e6-84f9-57e8-9426-f298047b914c
    73c4108d-e016-4e79-b77e-52ca25f050ec:
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   2
            Start:  0
          Profile:  2g.20gb
          Uuid:     MIG-f4477f87-90ef-5540-b4ab-78c9286ea812
Status:             Ready
Events:             <none>
[root@nvd-srv-02 ~]# kubectl describe pod  job31-wkwpn -n gpu-test1
Name:             job31-wkwpn
Namespace:        gpu-test1
Priority:         0
Service Account:  default
Node:             k8s-dra-driver-cluster-worker/172.18.0.2
Start Time:       Mon, 04 Dec 2023 20:24:11 -0500
Labels:           batch.kubernetes.io/controller-uid=f7c3deda-d6d4-4853-b120-5c1ecd084153
                  batch.kubernetes.io/job-name=job31
                  controller-uid=f7c3deda-d6d4-4853-b120-5c1ecd084153
                  job-name=job31
Annotations:      <none>
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    Job/job31
Containers:
  ctr:
    Container ID:  
    Image:         ubuntu:22.04
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      bash
      -c
    Args:
      nvidia-smi -L; sleep infinity
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nz9b6 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-nz9b6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  5m3s   default-scheduler  0/2 nodes are available: waiting for dynamic resource controller to create the resourceclaim "job31-wkwpn-mig1g". no new claims to deallocate, preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod..
  Warning  FailedScheduling  5m1s   default-scheduler  running Reserve plugin "DynamicResources": waiting for resource driver to allocate resource
  Normal   Scheduled         4m58s  default-scheduler  Successfully assigned gpu-test1/job31-wkwpn to k8s-dra-driver-cluster-worker

Are these profiles not supported together on a single A100 80GB GPU?
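
As a quick sanity check, the three placements listed under Allocated Claims in the NodeAllocationState output can be tested for overlap. This is only a minimal sketch: the values are copied by hand from that output, the total of 8 slots is an assumption read from the 7g.80gb placement (Start 0, Size 8), and `find_conflicts` is a hypothetical helper, not part of the driver.

```python
# Placements copied from "Allocated Claims" in the NodeAllocationState output.
# Each tuple is (claim name, profile, start, size), using the slot indices
# reported in the Placement fields.
allocated = [
    ("job1-6n8x2-mig2g", "2g.20gb", 0, 2),
    ("job2-wc67x-mig2g", "2g.20gb", 2, 2),
    ("job31-wkwpn-mig1g", "1g.20gb", 4, 2),
]

# Assumption: 8 slots in total, read from the 7g.80gb placement (Start 0, Size 8).
TOTAL_SLOTS = 8


def find_conflicts(claims, total_slots):
    """Return a list of problems; an empty list means the placements are
    pairwise disjoint and all fall inside the available slot range."""
    problems = []
    owner = {}  # slot index -> name of the claim that occupies it
    for name, profile, start, size in claims:
        for slot in range(start, start + size):
            if slot >= total_slots:
                problems.append(f"{name} ({profile}) uses slot {slot}, out of range")
            elif slot in owner:
                problems.append(f"{name} ({profile}) overlaps {owner[slot]} at slot {slot}")
            else:
                owner[slot] = name
    return problems


conflicts = find_conflicts(allocated, TOTAL_SLOTS)
print(conflicts or "no overlapping placements")
```

With the values above the placements occupy slots 0-1, 2-3, and 4-5, so nothing overlaps. The check says nothing about whether the driver can actually create the device.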

klueska commented 9 months ago

That should work. It’s being allocated and scheduled to the node, but then failing in the prepare stage on the node. What do the logs of the DRA plugin show?
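
The NodeAllocationState output posted above shows the gap: three claims appear under Allocated Claims but only two under Prepared Claims. A minimal sketch of that comparison, with the UIDs copied by hand from that output and `unprepared` as a hypothetical helper name:

```python
# Claim UIDs and names copied from the NodeAllocationState output in this thread.
allocated_claims = {
    "05012d63-f2eb-4a73-be38-b3938d9aa891": "job2-wc67x-mig2g",
    "148dc683-2f43-4b2f-a11e-2d49477cf6d6": "job31-wkwpn-mig1g",
    "73c4108d-e016-4e79-b77e-52ca25f050ec": "job1-6n8x2-mig2g",
}
prepared_claims = {
    "05012d63-f2eb-4a73-be38-b3938d9aa891",
    "73c4108d-e016-4e79-b77e-52ca25f050ec",
}


def unprepared(allocated, prepared):
    """Return the sorted names of claims that are allocated but have no
    matching entry among the prepared claims."""
    return sorted(name for uid, name in allocated.items() if uid not in prepared)


print(unprepared(allocated_claims, prepared_claims))  # prints ['job31-wkwpn-mig1g']
```

The only claim without a prepared entry is the one belonging to the stuck pod, which is consistent with a failure during the prepare step on the node.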

asm582 commented 9 months ago

Here are the logs:

[root@nvd-srv-02 ~]# kubectl logs nvidia-k8s-dra-driver-controller-6d6b45756-gswbf -n nvidia-dra-driver | grep "job31"
I1205 01:22:39.934102       1 controller.go:241] "resource controller: new object" type="ResourceClaim" content="{\"metadata\":{\"name\":\"job31-l7g9x-mig2g\",\"namespace\":\"gpu-test1\",\"uid\":\"b7156f3e-7cff-4849-a04f-6f840f02198d\",\"resourceVersion\":\"56653\",\"creationTimestamp\":\"2023-12-05T01:22:39Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-l7g9x\",\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\",\"controller\":true,\"blockOwnerDeletion\":true}],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:39Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}}}]},\"spec\":{\"resourceClassName\":\"gpu.nvidia.com\",\"parametersRef\":{\"apiGroup\":\"gpu.resource.nvidia.com\",\"kind\":\"MigDeviceClaimParameters\",\"name\":\"mig-1g.20gb\"},\"allocationMode\":\"WaitForFirstConsumer\"},\"status\":{}}"
I1205 01:22:39.934127       1 controller.go:260] "resource controller: Adding new work item" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:39.934151       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:39.934164       1 controller.go:476] "resource controller: ResourceClaim waiting for first consumer" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:39.934172       1 controller.go:336] "resource controller: completed" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:41.232095       1 controller.go:241] "resource controller: new object" type="PodSchedulingContext" content="{\"metadata\":{\"name\":\"job31-l7g9x\",\"namespace\":\"gpu-test1\",\"uid\":\"dcc27d2c-6bf8-49c0-9708-7f453f21a207\",\"resourceVersion\":\"56660\",\"creationTimestamp\":\"2023-12-05T01:22:41Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-l7g9x\",\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\",\"controller\":true}],\"managedFields\":[{\"manager\":\"kube-scheduler\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:potentialNodes\":{\".\":{},\"v:\\\"k8s-dra-driver-cluster-worker\\\"\":{}},\"f:selectedNode\":{}}}}]},\"spec\":{\"selectedNode\":\"k8s-dra-driver-cluster-worker\",\"potentialNodes\":[\"k8s-dra-driver-cluster-worker\"]},\"status\":{}}"
I1205 01:22:41.232122       1 controller.go:260] "resource controller: Adding new work item" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:22:41.232150       1 controller.go:332] "resource controller: processing" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:22:41.233818       1 round_trippers.go:553] GET https://10.96.0.1:443/api/v1/namespaces/gpu-test1/pods/job31-l7g9x 200 OK in 1 milliseconds
I1205 01:22:41.237401       1 controller.go:674] "resource controller: pending pod claims" key="schedulingCtx:gpu-test1/job31-l7g9x" claims=[{PodClaimName:mig2g Claim:&ResourceClaim{ObjectMeta:{job31-l7g9x-mig2g  gpu-test1  b7156f3e-7cff-4849-a04f-6f840f02198d 56653 0 2023-12-05 01:22:39 +0000 UTC <nil> <nil> map[] map[] [{v1 Pod job31-l7g9x c000dfb4-fad5-4658-8497-d48db3d67d3d 0xc0005dc0ae 0xc0005dc0af}] [] [{kube-controller-manager Update resource.k8s.io/v1alpha2 2023-12-05 01:22:39 +0000 UTC FieldsV1 {"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\"}":{}}},"f:spec":{"f:allocationMode":{},"f:parametersRef":{".":{},"f:apiGroup":{},"f:kind":{},"f:name":{}},"f:resourceClassName":{}}} }]},Spec:ResourceClaimSpec{ResourceClassName:gpu.nvidia.com,ParametersRef:&ResourceClaimParametersReference{APIGroup:gpu.resource.nvidia.com,Kind:MigDeviceClaimParameters,Name:mig-1g.20gb,},AllocationMode:WaitForFirstConsumer,},Status:ResourceClaimStatus{DriverName:,Allocation:nil,ReservedFor:[]ResourceClaimConsumerReference{},DeallocationRequested:false,},} Class:&ResourceClass{ObjectMeta:{gpu.nvidia.com    c570a929-e0d7-40ec-8a0d-4d67fddd16d7 546 0 2023-12-04 14:29:06 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:Helm] map[meta.helm.sh/release-name:nvidia meta.helm.sh/release-namespace:nvidia-dra-driver] [] [] [{helm Update resource.k8s.io/v1alpha2 2023-12-04 14:29:06 +0000 UTC FieldsV1 {"f:driverName":{},"f:metadata":{"f:annotations":{".":{},"f:meta.helm.sh/release-name":{},"f:meta.helm.sh/release-namespace":{}},"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{}}}} }]},DriverName:gpu.resource.nvidia.com,ParametersRef:nil,SuitableNodes:nil,} ClaimParameters:0xc0006a6720 ClassParameters:0xc000532008 UnsuitableNodes:[]}] selectedNode="k8s-dra-driver-cluster-worker"
I1205 01:22:41.237419       1 controller.go:687] "resource controller: allocation for selected node" key="schedulingCtx:gpu-test1/job31-l7g9x" node="k8s-dra-driver-cluster-worker"
I1205 01:22:41.237429       1 controller.go:538] "resource controller: Adding finalizer" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:22:41.238920       1 round_trippers.go:553] PUT https://10.96.0.1:443/apis/resource.k8s.io/v1alpha2/namespaces/gpu-test1/resourceclaims/job31-l7g9x-mig2g 200 OK in 1 milliseconds
I1205 01:22:41.239037       1 controller.go:548] "resource controller: Allocating" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:22:41.240768       1 controller.go:249] "resource controller: updated object" type="ResourceClaim" content="{\"metadata\":{\"name\":\"job31-l7g9x-mig2g\",\"namespace\":\"gpu-test1\",\"uid\":\"b7156f3e-7cff-4849-a04f-6f840f02198d\",\"resourceVersion\":\"56663\",\"creationTimestamp\":\"2023-12-05T01:22:39Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-l7g9x\",\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\",\"controller\":true,\"blockOwnerDeletion\":true}],\"finalizers\":[\"gpu.resource.nvidia.com/deletion-protection\"],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:39Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:finalizers\":{\".\":{},\"v:\\\"gpu.resource.nvidia.com/deletion-protection\\\"\":{}}}}}]},\"spec\":{\"resourceClassName\":\"gpu.nvidia.com\",\"parametersRef\":{\"apiGroup\":\"gpu.resource.nvidia.com\",\"kind\":\"MigDeviceClaimParameters\",\"name\":\"mig-1g.20gb\"},\"allocationMode\":\"WaitForFirstConsumer\"},\"status\":{}}" diff=<
                        OwnerReferences: {{APIVersion: "v1", Kind: "Pod", Name: "job31-l7g9x", UID: "c000dfb4-fad5-4658-8497-d48db3d67d3d", ...}},
I1205 01:22:41.240785       1 controller.go:260] "resource controller: Adding updated work item" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:41.240804       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:41.240818       1 controller.go:476] "resource controller: ResourceClaim waiting for first consumer" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:41.240826       1 controller.go:336] "resource controller: completed" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:41.246171       1 controller.go:558] "resource controller: Updating claim after allocation" key="schedulingCtx:gpu-test1/job31-l7g9x" claim="&ResourceClaim{ObjectMeta:{job31-l7g9x-mig2g  gpu-test1  b7156f3e-7cff-4849-a04f-6f840f02198d 56663 0 2023-12-05 01:22:39 +0000 UTC <nil> <nil> map[] map[] [{v1 Pod job31-l7g9x c000dfb4-fad5-4658-8497-d48db3d67d3d 0xc000da007e 0xc000da007f}] [gpu.resource.nvidia.com/deletion-protection] [{kube-controller-manager Update resource.k8s.io/v1alpha2 2023-12-05 01:22:39 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}} } {nvidia-dra-controller Update resource.k8s.io/v1alpha2 2023-12-05 01:22:41 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:finalizers\":{\".\":{},\"v:\\\"gpu.resource.nvidia.com/deletion-protection\\\"\":{}}}} }]},Spec:ResourceClaimSpec{ResourceClassName:gpu.nvidia.com,ParametersRef:&ResourceClaimParametersReference{APIGroup:gpu.resource.nvidia.com,Kind:MigDeviceClaimParameters,Name:mig-1g.20gb,},AllocationMode:WaitForFirstConsumer,},Status:ResourceClaimStatus{DriverName:gpu.resource.nvidia.com,Allocation:&AllocationResult{ResourceHandles:[]ResourceHandle{},AvailableOnNodes:&v1.NodeSelector{NodeSelectorTerms:[]NodeSelectorTerm{NodeSelectorTerm{MatchExpressions:[]NodeSelectorRequirement{},MatchFields:[]NodeSelectorRequirement{NodeSelectorRequirement{Key:metadata.name,Operator:In,Values:[k8s-dra-driver-cluster-worker],},},},},},Shareable:true,},ReservedFor:[]ResourceClaimConsumerReference{ResourceClaimConsumerReference{APIGroup:,Resource:pods,Name:job31-l7g9x,UID:c000dfb4-fad5-4658-8497-d48db3d67d3d,},},DeallocationRequested:false,},}"
I1205 01:22:41.247710       1 round_trippers.go:553] PUT https://10.96.0.1:443/apis/resource.k8s.io/v1alpha2/namespaces/gpu-test1/resourceclaims/job31-l7g9x-mig2g/status 200 OK in 1 milliseconds
I1205 01:22:41.247895       1 controller.go:724] "resource controller: Updating pod scheduling with modified unsuitable nodes" key="schedulingCtx:gpu-test1/job31-l7g9x" podSchedulingCtx="&PodSchedulingContext{ObjectMeta:{job31-l7g9x  gpu-test1  dcc27d2c-6bf8-49c0-9708-7f453f21a207 56660 0 2023-12-05 01:22:41 +0000 UTC <nil> <nil> map[] map[] [{v1 Pod job31-l7g9x c000dfb4-fad5-4658-8497-d48db3d67d3d 0xc000da023b <nil>}] [] [{kube-scheduler Update resource.k8s.io/v1alpha2 2023-12-05 01:22:41 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:potentialNodes\":{\".\":{},\"v:\\\"k8s-dra-driver-cluster-worker\\\"\":{}},\"f:selectedNode\":{}}} }]},Spec:PodSchedulingContextSpec{SelectedNode:k8s-dra-driver-cluster-worker,PotentialNodes:[k8s-dra-driver-cluster-worker],},Status:PodSchedulingContextStatus{ResourceClaims:[]ResourceClaimSchedulingStatus{ResourceClaimSchedulingStatus{Name:mig2g,UnsuitableNodes:[],},},},}"
I1205 01:22:41.249308       1 round_trippers.go:553] PUT https://10.96.0.1:443/apis/resource.k8s.io/v1alpha2/namespaces/gpu-test1/podschedulingcontexts/job31-l7g9x/status 200 OK in 1 milliseconds
I1205 01:22:41.249423       1 controller.go:342] "resource controller: recheck periodically" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:22:41.250433       1 controller.go:249] "resource controller: updated object" type="ResourceClaim" content="{\"metadata\":{\"name\":\"job31-l7g9x-mig2g\",\"namespace\":\"gpu-test1\",\"uid\":\"b7156f3e-7cff-4849-a04f-6f840f02198d\",\"resourceVersion\":\"56665\",\"creationTimestamp\":\"2023-12-05T01:22:39Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-l7g9x\",\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\",\"controller\":true,\"blockOwnerDeletion\":true}],\"finalizers\":[\"gpu.resource.nvidia.com/deletion-protection\"],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:39Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:finalizers\":{\".\":{},\"v:\\\"gpu.resource.nvidia.com/deletion-protection\\\"\":{}}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:status\":{\"f:allocation\":{\".\":{},\"f:availableOnNodes\":{},\"f:shareable\":{}},\"f:driverName\":{},\"f:reservedFor\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{\".\":{},\"f:name\":{},\"f:resource\":{},\"f:uid\":{}}}}},\"subresource\":\"status\"}]},\"spec\":{\"resourceClassName\":\"gpu.nvidia.com\",\"parametersRef\":{\"apiGroup\":\"gpu.resource.nvidia.com\",\"kind\":\"MigDeviceClaimParameters\",\"name\":\"mig-1g.20gb\"},\"allocationMode\":\"WaitFo
rFirstConsumer\"},\"status\":{\"driverName\":\"gpu.resource.nvidia.com\",\"allocation\":{\"availableOnNodes\":{\"nodeSelectorTerms\":[{\"matchFields\":[{\"key\":\"metadata.name\",\"operator\":\"In\",\"values\":[\"k8s-dra-driver-cluster-worker\"]}]}]},\"shareable\":true},\"reservedFor\":[{\"resource\":\"pods\",\"name\":\"job31-l7g9x\",\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\"}]}}" diff=<
                        OwnerReferences: {{APIVersion: "v1", Kind: "Pod", Name: "job31-l7g9x", UID: "c000dfb4-fad5-4658-8497-d48db3d67d3d", ...}},
        +                               Name:     "job31-l7g9x",
I1205 01:22:41.250449       1 controller.go:260] "resource controller: Adding updated work item" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:41.250466       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:41.250494       1 controller.go:412] "resource controller: ResourceClaim in use" key="claim:gpu-test1/job31-l7g9x-mig2g" reservedFor=[{APIGroup: Resource:pods Name:job31-l7g9x UID:c000dfb4-fad5-4658-8497-d48db3d67d3d}]
I1205 01:22:41.250501       1 controller.go:336] "resource controller: completed" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:22:41.251091       1 controller.go:249] "resource controller: updated object" type="PodSchedulingContext" content="{\"metadata\":{\"name\":\"job31-l7g9x\",\"namespace\":\"gpu-test1\",\"uid\":\"dcc27d2c-6bf8-49c0-9708-7f453f21a207\",\"resourceVersion\":\"56666\",\"creationTimestamp\":\"2023-12-05T01:22:41Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-l7g9x\",\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\",\"controller\":true}],\"managedFields\":[{\"manager\":\"kube-scheduler\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:potentialNodes\":{\".\":{},\"v:\\\"k8s-dra-driver-cluster-worker\\\"\":{}},\"f:selectedNode\":{}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:status\":{\"f:resourceClaims\":{\".\":{},\"k:{\\\"name\\\":\\\"mig2g\\\"}\":{\".\":{},\"f:name\":{}}}}},\"subresource\":\"status\"}]},\"spec\":{\"selectedNode\":\"k8s-dra-driver-cluster-worker\",\"potentialNodes\":[\"k8s-dra-driver-cluster-worker\"]},\"status\":{\"resourceClaims\":[{\"name\":\"mig2g\"}]}}" diff=<
                        OwnerReferences: {{APIVersion: "v1", Kind: "Pod", Name: "job31-l7g9x", UID: "c000dfb4-fad5-4658-8497-d48db3d67d3d", ...}},
I1205 01:22:41.251105       1 controller.go:260] "resource controller: Adding updated work item" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:22:41.251123       1 controller.go:332] "resource controller: processing" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:22:41.252002       1 round_trippers.go:553] GET https://10.96.0.1:443/api/v1/namespaces/gpu-test1/pods/job31-l7g9x 200 OK in 0 milliseconds
I1205 01:22:41.255202       1 controller.go:674] "resource controller: pending pod claims" key="schedulingCtx:gpu-test1/job31-l7g9x" claims=[{PodClaimName:mig2g Claim:&ResourceClaim{ObjectMeta:{job31-l7g9x-mig2g  gpu-test1  b7156f3e-7cff-4849-a04f-6f840f02198d 56665 0 2023-12-05 01:22:39 +0000 UTC <nil> <nil> map[] map[] [{v1 Pod job31-l7g9x c000dfb4-fad5-4658-8497-d48db3d67d3d 0xc0005dc75e 0xc0005dc75f}] [gpu.resource.nvidia.com/deletion-protection] [{kube-controller-manager Update resource.k8s.io/v1alpha2 2023-12-05 01:22:39 +0000 UTC FieldsV1 {"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\"}":{}}},"f:spec":{"f:allocationMode":{},"f:parametersRef":{".":{},"f:apiGroup":{},"f:kind":{},"f:name":{}},"f:resourceClassName":{}}} } {nvidia-dra-controller Update resource.k8s.io/v1alpha2 2023-12-05 01:22:41 +0000 UTC FieldsV1 {"f:metadata":{"f:finalizers":{".":{},"v:\"gpu.resource.nvidia.com/deletion-protection\"":{}}}} } {nvidia-dra-controller Update resource.k8s.io/v1alpha2 2023-12-05 01:22:41 +0000 UTC FieldsV1 {"f:status":{"f:allocation":{".":{},"f:availableOnNodes":{},"f:shareable":{}},"f:driverName":{},"f:reservedFor":{".":{},"k:{\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\"}":{".":{},"f:name":{},"f:resource":{},"f:uid":{}}}}} 
status}]},Spec:ResourceClaimSpec{ResourceClassName:gpu.nvidia.com,ParametersRef:&ResourceClaimParametersReference{APIGroup:gpu.resource.nvidia.com,Kind:MigDeviceClaimParameters,Name:mig-1g.20gb,},AllocationMode:WaitForFirstConsumer,},Status:ResourceClaimStatus{DriverName:gpu.resource.nvidia.com,Allocation:&AllocationResult{ResourceHandles:[]ResourceHandle{},AvailableOnNodes:&v1.NodeSelector{NodeSelectorTerms:[]NodeSelectorTerm{NodeSelectorTerm{MatchExpressions:[]NodeSelectorRequirement{},MatchFields:[]NodeSelectorRequirement{NodeSelectorRequirement{Key:metadata.name,Operator:In,Values:[k8s-dra-driver-cluster-worker],},},},},},Shareable:true,},ReservedFor:[]ResourceClaimConsumerReference{ResourceClaimConsumerReference{APIGroup:,Resource:pods,Name:job31-l7g9x,UID:c000dfb4-fad5-4658-8497-d48db3d67d3d,},},DeallocationRequested:false,},} Class:&ResourceClass{ObjectMeta:{gpu.nvidia.com    c570a929-e0d7-40ec-8a0d-4d67fddd16d7 546 0 2023-12-04 14:29:06 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:Helm] map[meta.helm.sh/release-name:nvidia meta.helm.sh/release-namespace:nvidia-dra-driver] [] [] [{helm Update resource.k8s.io/v1alpha2 2023-12-04 14:29:06 +0000 UTC FieldsV1 {"f:driverName":{},"f:metadata":{"f:annotations":{".":{},"f:meta.helm.sh/release-name":{},"f:meta.helm.sh/release-namespace":{}},"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{}}}} }]},DriverName:gpu.resource.nvidia.com,ParametersRef:nil,SuitableNodes:nil,} ClaimParameters:0xc000b39b60 ClassParameters:0xc000442528 UnsuitableNodes:[]}] selectedNode="k8s-dra-driver-cluster-worker"
I1205 01:22:41.255220       1 controller.go:687] "resource controller: allocation for selected node" key="schedulingCtx:gpu-test1/job31-l7g9x" node="k8s-dra-driver-cluster-worker"
I1205 01:22:41.255227       1 controller.go:531] "resource controller: Claim already allocated, nothing to do" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:22:41.255236       1 controller.go:342] "resource controller: recheck periodically" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:22:44.236037       1 controller.go:269] "resource controller: Removing deleted work item" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:23:11.249538       1 controller.go:332] "resource controller: processing" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:23:11.249571       1 controller.go:377] "resource controller: PodSchedulingContext was deleted, no need to process it" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:23:11.249578       1 controller.go:336] "resource controller: completed" key="schedulingCtx:gpu-test1/job31-l7g9x"
I1205 01:23:45.890768       1 controller.go:249] "resource controller: updated object" type="ResourceClaim" content="{\"metadata\":{\"name\":\"job31-l7g9x-mig2g\",\"namespace\":\"gpu-test1\",\"uid\":\"b7156f3e-7cff-4849-a04f-6f840f02198d\",\"resourceVersion\":\"56771\",\"creationTimestamp\":\"2023-12-05T01:22:39Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-l7g9x\",\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\",\"controller\":true,\"blockOwnerDeletion\":true}],\"finalizers\":[\"gpu.resource.nvidia.com/deletion-protection\"],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:39Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:finalizers\":{\".\":{},\"v:\\\"gpu.resource.nvidia.com/deletion-protection\\\"\":{}}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:status\":{\"f:allocation\":{\".\":{},\"f:availableOnNodes\":{},\"f:shareable\":{}},\"f:driverName\":{}}},\"subresource\":\"status\"}]},\"spec\":{\"resourceClassName\":\"gpu.nvidia.com\",\"parametersRef\":{\"apiGroup\":\"gpu.resource.nvidia.com\",\"kind\":\"MigDeviceClaimParameters\",\"name\":\"mig-1g.20gb\"},\"allocationMode\":\"WaitForFirstConsumer\"},\"status\":{\"driverName\":\"gpu.resource.nvidia.com\",\"allocation\":{\"availableOnNodes\":{\"nodeSelectorTerms\":[{\"matchFields\"
:[{\"key\":\"metadata.name\",\"operator\":\"In\",\"values\":[\"k8s-dra-driver-cluster-worker\"]}]}]},\"shareable\":true}}}" diff=<
                        OwnerReferences: {{APIVersion: "v1", Kind: "Pod", Name: "job31-l7g9x", UID: "c000dfb4-fad5-4658-8497-d48db3d67d3d", ...}},
        -                               Name:     "job31-l7g9x",
I1205 01:23:45.890792       1 controller.go:260] "resource controller: Adding updated work item" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.890837       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.890888       1 controller.go:425] "resource controller: ResourceClaim ready for deallocation" key="claim:gpu-test1/job31-l7g9x-mig2g" deallocationRequested=false deletionTimestamp="2023-12-05 01:23:45 +0000 UTC" allocated=true hasFinalizer=true
I1205 01:23:45.892827       1 controller.go:249] "resource controller: updated object" type="ResourceClaim" content="{\"metadata\":{\"name\":\"job31-l7g9x-mig2g\",\"namespace\":\"gpu-test1\",\"uid\":\"b7156f3e-7cff-4849-a04f-6f840f02198d\",\"resourceVersion\":\"56772\",\"creationTimestamp\":\"2023-12-05T01:22:39Z\",\"deletionTimestamp\":\"2023-12-05T01:23:45Z\",\"deletionGracePeriodSeconds\":0,\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-l7g9x\",\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\",\"controller\":true,\"blockOwnerDeletion\":true}],\"finalizers\":[\"gpu.resource.nvidia.com/deletion-protection\"],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:39Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:finalizers\":{\".\":{},\"v:\\\"gpu.resource.nvidia.com/deletion-protection\\\"\":{}}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:status\":{\"f:allocation\":{\".\":{},\"f:availableOnNodes\":{},\"f:shareable\":{}},\"f:driverName\":{}}},\"subresource\":\"status\"}]},\"spec\":{\"resourceClassName\":\"gpu.nvidia.com\",\"parametersRef\":{\"apiGroup\":\"gpu.resource.nvidia.com\",\"kind\":\"MigDeviceClaimParameters\",\"name\":\"mig-1g.20gb\"},\"allocationMode\":\"WaitForFirstConsumer\"},\"status\":{\"driverName\":\"gpu.resource.nvidia.com\",\"allocation\":{\"availableOnNodes\":{\"nodeSelectorTerms\":[{\"matchFields\":[{\"key\":\"metadata.name\",\"operator\":\"In\",\"values\":[\"k8s-dra-driver-cluster-worker\"]}]}]},\"shareable\":true}}}" diff=<
I1205 01:23:45.892843       1 controller.go:260] "resource controller: Adding updated work item" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.900179       1 round_trippers.go:553] PUT https://10.96.0.1:443/apis/resource.k8s.io/v1alpha2/namespaces/gpu-test1/resourceclaims/job31-l7g9x-mig2g/status 200 OK in 1 milliseconds
I1205 01:23:45.902098       1 round_trippers.go:553] PUT https://10.96.0.1:443/apis/resource.k8s.io/v1alpha2/namespaces/gpu-test1/resourceclaims/job31-l7g9x-mig2g 200 OK in 1 milliseconds
I1205 01:23:45.902213       1 controller.go:336] "resource controller: completed" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.902238       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.902247       1 controller.go:390] "resource controller: ResourceClaim not found, no need to process it" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.902253       1 controller.go:336] "resource controller: completed" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.902553       1 controller.go:249] "resource controller: updated object" type="ResourceClaim" content="{\"metadata\":{\"name\":\"job31-l7g9x-mig2g\",\"namespace\":\"gpu-test1\",\"uid\":\"b7156f3e-7cff-4849-a04f-6f840f02198d\",\"resourceVersion\":\"56774\",\"creationTimestamp\":\"2023-12-05T01:22:39Z\",\"deletionTimestamp\":\"2023-12-05T01:23:45Z\",\"deletionGracePeriodSeconds\":0,\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-l7g9x\",\"uid\":\"c000dfb4-fad5-4658-8497-d48db3d67d3d\",\"controller\":true,\"blockOwnerDeletion\":true}],\"finalizers\":[\"gpu.resource.nvidia.com/deletion-protection\"],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:39Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"c000dfb4-fad5-4658-8497-d48db3d67d3d\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:22:41Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:finalizers\":{\".\":{},\"v:\\\"gpu.resource.nvidia.com/deletion-protection\\\"\":{}}}}}]},\"spec\":{\"resourceClassName\":\"gpu.nvidia.com\",\"parametersRef\":{\"apiGroup\":\"gpu.resource.nvidia.com\",\"kind\":\"MigDeviceClaimParameters\",\"name\":\"mig-1g.20gb\"},\"allocationMode\":\"WaitForFirstConsumer\"},\"status\":{}}" diff=<
                        OwnerReferences: {{APIVersion: "v1", Kind: "Pod", Name: "job31-l7g9x", UID: "c000dfb4-fad5-4658-8497-d48db3d67d3d", ...}},
I1205 01:23:45.902572       1 controller.go:260] "resource controller: Adding updated work item" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.902589       1 controller.go:269] "resource controller: Removing deleted work item" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.902613       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.902647       1 controller.go:390] "resource controller: ResourceClaim not found, no need to process it" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:23:45.902656       1 controller.go:336] "resource controller: completed" key="claim:gpu-test1/job31-l7g9x-mig2g"
I1205 01:24:06.878536       1 controller.go:241] "resource controller: new object" type="ResourceClaim" content="{\"metadata\":{\"name\":\"job31-wkwpn-mig1g\",\"namespace\":\"gpu-test1\",\"uid\":\"148dc683-2f43-4b2f-a11e-2d49477cf6d6\",\"resourceVersion\":\"56811\",\"creationTimestamp\":\"2023-12-05T01:24:06Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-wkwpn\",\"uid\":\"5970b1a4-9ea3-4129-9083-875db84fccec\",\"controller\":true,\"blockOwnerDeletion\":true}],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:24:06Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"5970b1a4-9ea3-4129-9083-875db84fccec\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}}}]},\"spec\":{\"resourceClassName\":\"gpu.nvidia.com\",\"parametersRef\":{\"apiGroup\":\"gpu.resource.nvidia.com\",\"kind\":\"MigDeviceClaimParameters\",\"name\":\"mig-1g.20gb\"},\"allocationMode\":\"WaitForFirstConsumer\"},\"status\":{}}"
I1205 01:24:06.878560       1 controller.go:260] "resource controller: Adding new work item" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:06.878584       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:06.878595       1 controller.go:476] "resource controller: ResourceClaim waiting for first consumer" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:06.879189       1 controller.go:336] "resource controller: completed" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:08.289684       1 controller.go:241] "resource controller: new object" type="PodSchedulingContext" content="{\"metadata\":{\"name\":\"job31-wkwpn\",\"namespace\":\"gpu-test1\",\"uid\":\"1346ea69-e3d3-46fe-a433-68defe75b21f\",\"resourceVersion\":\"56816\",\"creationTimestamp\":\"2023-12-05T01:24:08Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-wkwpn\",\"uid\":\"5970b1a4-9ea3-4129-9083-875db84fccec\",\"controller\":true}],\"managedFields\":[{\"manager\":\"kube-scheduler\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:24:08Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"5970b1a4-9ea3-4129-9083-875db84fccec\\\"}\":{}}},\"f:spec\":{\"f:potentialNodes\":{\".\":{},\"v:\\\"k8s-dra-driver-cluster-worker\\\"\":{}},\"f:selectedNode\":{}}}}]},\"spec\":{\"selectedNode\":\"k8s-dra-driver-cluster-worker\",\"potentialNodes\":[\"k8s-dra-driver-cluster-worker\"]},\"status\":{}}"
I1205 01:24:08.289701       1 controller.go:260] "resource controller: Adding new work item" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:08.289719       1 controller.go:332] "resource controller: processing" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:08.291338       1 round_trippers.go:553] GET https://10.96.0.1:443/api/v1/namespaces/gpu-test1/pods/job31-wkwpn 200 OK in 1 milliseconds
I1205 01:24:08.295028       1 controller.go:674] "resource controller: pending pod claims" key="schedulingCtx:gpu-test1/job31-wkwpn" claims=[{PodClaimName:mig1g Claim:&ResourceClaim{ObjectMeta:{job31-wkwpn-mig1g  gpu-test1  148dc683-2f43-4b2f-a11e-2d49477cf6d6 56811 0 2023-12-05 01:24:06 +0000 UTC <nil> <nil> map[] map[] [{v1 Pod job31-wkwpn 5970b1a4-9ea3-4129-9083-875db84fccec 0xc0007a854e 0xc0007a854f}] [] [{kube-controller-manager Update resource.k8s.io/v1alpha2 2023-12-05 01:24:06 +0000 UTC FieldsV1 {"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"5970b1a4-9ea3-4129-9083-875db84fccec\"}":{}}},"f:spec":{"f:allocationMode":{},"f:parametersRef":{".":{},"f:apiGroup":{},"f:kind":{},"f:name":{}},"f:resourceClassName":{}}} }]},Spec:ResourceClaimSpec{ResourceClassName:gpu.nvidia.com,ParametersRef:&ResourceClaimParametersReference{APIGroup:gpu.resource.nvidia.com,Kind:MigDeviceClaimParameters,Name:mig-1g.20gb,},AllocationMode:WaitForFirstConsumer,},Status:ResourceClaimStatus{DriverName:,Allocation:nil,ReservedFor:[]ResourceClaimConsumerReference{},DeallocationRequested:false,},} Class:&ResourceClass{ObjectMeta:{gpu.nvidia.com    c570a929-e0d7-40ec-8a0d-4d67fddd16d7 546 0 2023-12-04 14:29:06 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:Helm] map[meta.helm.sh/release-name:nvidia meta.helm.sh/release-namespace:nvidia-dra-driver] [] [] [{helm Update resource.k8s.io/v1alpha2 2023-12-04 14:29:06 +0000 UTC FieldsV1 {"f:driverName":{},"f:metadata":{"f:annotations":{".":{},"f:meta.helm.sh/release-name":{},"f:meta.helm.sh/release-namespace":{}},"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{}}}} }]},DriverName:gpu.resource.nvidia.com,ParametersRef:nil,SuitableNodes:nil,} ClaimParameters:0xc0003686f0 ClassParameters:0xc000532008 UnsuitableNodes:[]}] selectedNode="k8s-dra-driver-cluster-worker"
I1205 01:24:08.295045       1 controller.go:687] "resource controller: allocation for selected node" key="schedulingCtx:gpu-test1/job31-wkwpn" node="k8s-dra-driver-cluster-worker"
I1205 01:24:08.295055       1 controller.go:538] "resource controller: Adding finalizer" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:08.296572       1 round_trippers.go:553] PUT https://10.96.0.1:443/apis/resource.k8s.io/v1alpha2/namespaces/gpu-test1/resourceclaims/job31-wkwpn-mig1g 200 OK in 1 milliseconds
I1205 01:24:08.296703       1 controller.go:548] "resource controller: Allocating" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:08.298415       1 controller.go:249] "resource controller: updated object" type="ResourceClaim" content="{\"metadata\":{\"name\":\"job31-wkwpn-mig1g\",\"namespace\":\"gpu-test1\",\"uid\":\"148dc683-2f43-4b2f-a11e-2d49477cf6d6\",\"resourceVersion\":\"56819\",\"creationTimestamp\":\"2023-12-05T01:24:06Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-wkwpn\",\"uid\":\"5970b1a4-9ea3-4129-9083-875db84fccec\",\"controller\":true,\"blockOwnerDeletion\":true}],\"finalizers\":[\"gpu.resource.nvidia.com/deletion-protection\"],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:24:06Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"5970b1a4-9ea3-4129-9083-875db84fccec\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:24:08Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:finalizers\":{\".\":{},\"v:\\\"gpu.resource.nvidia.com/deletion-protection\\\"\":{}}}}}]},\"spec\":{\"resourceClassName\":\"gpu.nvidia.com\",\"parametersRef\":{\"apiGroup\":\"gpu.resource.nvidia.com\",\"kind\":\"MigDeviceClaimParameters\",\"name\":\"mig-1g.20gb\"},\"allocationMode\":\"WaitForFirstConsumer\"},\"status\":{}}" diff=<
                        OwnerReferences: {{APIVersion: "v1", Kind: "Pod", Name: "job31-wkwpn", UID: "5970b1a4-9ea3-4129-9083-875db84fccec", ...}},
I1205 01:24:08.298431       1 controller.go:260] "resource controller: Adding updated work item" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:08.298447       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:08.298457       1 controller.go:476] "resource controller: ResourceClaim waiting for first consumer" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:08.298464       1 controller.go:336] "resource controller: completed" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:08.303709       1 controller.go:558] "resource controller: Updating claim after allocation" key="schedulingCtx:gpu-test1/job31-wkwpn" claim="&ResourceClaim{ObjectMeta:{job31-wkwpn-mig1g  gpu-test1  148dc683-2f43-4b2f-a11e-2d49477cf6d6 56819 0 2023-12-05 01:24:06 +0000 UTC <nil> <nil> map[] map[] [{v1 Pod job31-wkwpn 5970b1a4-9ea3-4129-9083-875db84fccec 0xc00069e42e 0xc00069e42f}] [gpu.resource.nvidia.com/deletion-protection] [{kube-controller-manager Update resource.k8s.io/v1alpha2 2023-12-05 01:24:06 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"5970b1a4-9ea3-4129-9083-875db84fccec\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}} } {nvidia-dra-controller Update resource.k8s.io/v1alpha2 2023-12-05 01:24:08 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:finalizers\":{\".\":{},\"v:\\\"gpu.resource.nvidia.com/deletion-protection\\\"\":{}}}} }]},Spec:ResourceClaimSpec{ResourceClassName:gpu.nvidia.com,ParametersRef:&ResourceClaimParametersReference{APIGroup:gpu.resource.nvidia.com,Kind:MigDeviceClaimParameters,Name:mig-1g.20gb,},AllocationMode:WaitForFirstConsumer,},Status:ResourceClaimStatus{DriverName:gpu.resource.nvidia.com,Allocation:&AllocationResult{ResourceHandles:[]ResourceHandle{},AvailableOnNodes:&v1.NodeSelector{NodeSelectorTerms:[]NodeSelectorTerm{NodeSelectorTerm{MatchExpressions:[]NodeSelectorRequirement{},MatchFields:[]NodeSelectorRequirement{NodeSelectorRequirement{Key:metadata.name,Operator:In,Values:[k8s-dra-driver-cluster-worker],},},},},},Shareable:true,},ReservedFor:[]ResourceClaimConsumerReference{ResourceClaimConsumerReference{APIGroup:,Resource:pods,Name:job31-wkwpn,UID:5970b1a4-9ea3-4129-9083-875db84fccec,},},DeallocationRequested:false,},}"
I1205 01:24:08.306960       1 round_trippers.go:553] PUT https://10.96.0.1:443/apis/resource.k8s.io/v1alpha2/namespaces/gpu-test1/resourceclaims/job31-wkwpn-mig1g/status 200 OK in 3 milliseconds
I1205 01:24:08.307135       1 controller.go:724] "resource controller: Updating pod scheduling with modified unsuitable nodes" key="schedulingCtx:gpu-test1/job31-wkwpn" podSchedulingCtx="&PodSchedulingContext{ObjectMeta:{job31-wkwpn  gpu-test1  1346ea69-e3d3-46fe-a433-68defe75b21f 56816 0 2023-12-05 01:24:08 +0000 UTC <nil> <nil> map[] map[] [{v1 Pod job31-wkwpn 5970b1a4-9ea3-4129-9083-875db84fccec 0xc00069e66b <nil>}] [] [{kube-scheduler Update resource.k8s.io/v1alpha2 2023-12-05 01:24:08 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"5970b1a4-9ea3-4129-9083-875db84fccec\\\"}\":{}}},\"f:spec\":{\"f:potentialNodes\":{\".\":{},\"v:\\\"k8s-dra-driver-cluster-worker\\\"\":{}},\"f:selectedNode\":{}}} }]},Spec:PodSchedulingContextSpec{SelectedNode:k8s-dra-driver-cluster-worker,PotentialNodes:[k8s-dra-driver-cluster-worker],},Status:PodSchedulingContextStatus{ResourceClaims:[]ResourceClaimSchedulingStatus{ResourceClaimSchedulingStatus{Name:mig1g,UnsuitableNodes:[],},},},}"
I1205 01:24:08.308832       1 round_trippers.go:553] PUT https://10.96.0.1:443/apis/resource.k8s.io/v1alpha2/namespaces/gpu-test1/podschedulingcontexts/job31-wkwpn/status 200 OK in 1 milliseconds
I1205 01:24:08.308974       1 controller.go:342] "resource controller: recheck periodically" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:08.309355       1 controller.go:249] "resource controller: updated object" type="ResourceClaim" content="{\"metadata\":{\"name\":\"job31-wkwpn-mig1g\",\"namespace\":\"gpu-test1\",\"uid\":\"148dc683-2f43-4b2f-a11e-2d49477cf6d6\",\"resourceVersion\":\"56821\",\"creationTimestamp\":\"2023-12-05T01:24:06Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-wkwpn\",\"uid\":\"5970b1a4-9ea3-4129-9083-875db84fccec\",\"controller\":true,\"blockOwnerDeletion\":true}],\"finalizers\":[\"gpu.resource.nvidia.com/deletion-protection\"],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:24:06Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"5970b1a4-9ea3-4129-9083-875db84fccec\\\"}\":{}}},\"f:spec\":{\"f:allocationMode\":{},\"f:parametersRef\":{\".\":{},\"f:apiGroup\":{},\"f:kind\":{},\"f:name\":{}},\"f:resourceClassName\":{}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:24:08Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:finalizers\":{\".\":{},\"v:\\\"gpu.resource.nvidia.com/deletion-protection\\\"\":{}}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:24:08Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:status\":{\"f:allocation\":{\".\":{},\"f:availableOnNodes\":{},\"f:shareable\":{}},\"f:driverName\":{},\"f:reservedFor\":{\".\":{},\"k:{\\\"uid\\\":\\\"5970b1a4-9ea3-4129-9083-875db84fccec\\\"}\":{\".\":{},\"f:name\":{},\"f:resource\":{},\"f:uid\":{}}}}},\"subresource\":\"status\"}]},\"spec\":{\"resourceClassName\":\"gpu.nvidia.com\",\"parametersRef\":{\"apiGroup\":\"gpu.resource.nvidia.com\",\"kind\":\"MigDeviceClaimParameters\",\"name\":\"mig-1g.20gb\"},\"allocationMode\":\"WaitForFirstConsumer\"},\"status\":{\"driverName\":\"gpu.resource.nvidia.com\",\"allocation\":{\"availableOnNodes\":{\"nodeSelectorTerms\":[{\"matchFields\":[{\"key\":\"metadata.name\",\"operator\":\"In\",\"values\":[\"k8s-dra-driver-cluster-worker\"]}]}]},\"shareable\":true},\"reservedFor\":[{\"resource\":\"pods\",\"name\":\"job31-wkwpn\",\"uid\":\"5970b1a4-9ea3-4129-9083-875db84fccec\"}]}}" diff=<
                        OwnerReferences: {{APIVersion: "v1", Kind: "Pod", Name: "job31-wkwpn", UID: "5970b1a4-9ea3-4129-9083-875db84fccec", ...}},
        +                               Name:     "job31-wkwpn",
I1205 01:24:08.309373       1 controller.go:260] "resource controller: Adding updated work item" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:08.309403       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:08.309430       1 controller.go:412] "resource controller: ResourceClaim in use" key="claim:gpu-test1/job31-wkwpn-mig1g" reservedFor=[{APIGroup: Resource:pods Name:job31-wkwpn UID:5970b1a4-9ea3-4129-9083-875db84fccec}]
I1205 01:24:08.309438       1 controller.go:336] "resource controller: completed" key="claim:gpu-test1/job31-wkwpn-mig1g"
I1205 01:24:08.310492       1 controller.go:249] "resource controller: updated object" type="PodSchedulingContext" content="{\"metadata\":{\"name\":\"job31-wkwpn\",\"namespace\":\"gpu-test1\",\"uid\":\"1346ea69-e3d3-46fe-a433-68defe75b21f\",\"resourceVersion\":\"56822\",\"creationTimestamp\":\"2023-12-05T01:24:08Z\",\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"name\":\"job31-wkwpn\",\"uid\":\"5970b1a4-9ea3-4129-9083-875db84fccec\",\"controller\":true}],\"managedFields\":[{\"manager\":\"kube-scheduler\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:24:08Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"5970b1a4-9ea3-4129-9083-875db84fccec\\\"}\":{}}},\"f:spec\":{\"f:potentialNodes\":{\".\":{},\"v:\\\"k8s-dra-driver-cluster-worker\\\"\":{}},\"f:selectedNode\":{}}}},{\"manager\":\"nvidia-dra-controller\",\"operation\":\"Update\",\"apiVersion\":\"resource.k8s.io/v1alpha2\",\"time\":\"2023-12-05T01:24:08Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:status\":{\"f:resourceClaims\":{\".\":{},\"k:{\\\"name\\\":\\\"mig1g\\\"}\":{\".\":{},\"f:name\":{}}}}},\"subresource\":\"status\"}]},\"spec\":{\"selectedNode\":\"k8s-dra-driver-cluster-worker\",\"potentialNodes\":[\"k8s-dra-driver-cluster-worker\"]},\"status\":{\"resourceClaims\":[{\"name\":\"mig1g\"}]}}" diff=<
                        OwnerReferences: {{APIVersion: "v1", Kind: "Pod", Name: "job31-wkwpn", UID: "5970b1a4-9ea3-4129-9083-875db84fccec", ...}},
I1205 01:24:08.310508       1 controller.go:260] "resource controller: Adding updated work item" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:08.310522       1 controller.go:332] "resource controller: processing" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:08.311492       1 round_trippers.go:553] GET https://10.96.0.1:443/api/v1/namespaces/gpu-test1/pods/job31-wkwpn 200 OK in 0 milliseconds
I1205 01:24:08.315019       1 controller.go:674] "resource controller: pending pod claims" key="schedulingCtx:gpu-test1/job31-wkwpn" claims=[{PodClaimName:mig1g Claim:&ResourceClaim{ObjectMeta:{job31-wkwpn-mig1g  gpu-test1  148dc683-2f43-4b2f-a11e-2d49477cf6d6 56821 0 2023-12-05 01:24:06 +0000 UTC <nil> <nil> map[] map[] [{v1 Pod job31-wkwpn 5970b1a4-9ea3-4129-9083-875db84fccec 0xc00068e87e 0xc00068e87f}] [gpu.resource.nvidia.com/deletion-protection] [{kube-controller-manager Update resource.k8s.io/v1alpha2 2023-12-05 01:24:06 +0000 UTC FieldsV1 {"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"5970b1a4-9ea3-4129-9083-875db84fccec\"}":{}}},"f:spec":{"f:allocationMode":{},"f:parametersRef":{".":{},"f:apiGroup":{},"f:kind":{},"f:name":{}},"f:resourceClassName":{}}} } {nvidia-dra-controller Update resource.k8s.io/v1alpha2 2023-12-05 01:24:08 +0000 UTC FieldsV1 {"f:metadata":{"f:finalizers":{".":{},"v:\"gpu.resource.nvidia.com/deletion-protection\"":{}}}} } {nvidia-dra-controller Update resource.k8s.io/v1alpha2 2023-12-05 01:24:08 +0000 UTC FieldsV1 {"f:status":{"f:allocation":{".":{},"f:availableOnNodes":{},"f:shareable":{}},"f:driverName":{},"f:reservedFor":{".":{},"k:{\"uid\":\"5970b1a4-9ea3-4129-9083-875db84fccec\"}":{".":{},"f:name":{},"f:resource":{},"f:uid":{}}}}} status}]},Spec:ResourceClaimSpec{ResourceClassName:gpu.nvidia.com,ParametersRef:&ResourceClaimParametersReference{APIGroup:gpu.resource.nvidia.com,Kind:MigDeviceClaimParameters,Name:mig-1g.20gb,},AllocationMode:WaitForFirstConsumer,},Status:ResourceClaimStatus{DriverName:gpu.resource.nvidia.com,Allocation:&AllocationResult{ResourceHandles:[]ResourceHandle{},AvailableOnNodes:&v1.NodeSelector{NodeSelectorTerms:[]NodeSelectorTerm{NodeSelectorTerm{MatchExpressions:[]NodeSelectorRequirement{},MatchFields:[]NodeSelectorRequirement{NodeSelectorRequirement{Key:metadata.name,Operator:In,Values:[k8s-dra-driver-cluster-worker],},},},},},Shareable:true,},ReservedFor:[]ResourceClaimConsumerReference{ResourceClaimConsumerReference{APIGroup:,Resource:pods,Name:job31-wkwpn,UID:5970b1a4-9ea3-4129-9083-875db84fccec,},},DeallocationRequested:false,},} Class:&ResourceClass{ObjectMeta:{gpu.nvidia.com    c570a929-e0d7-40ec-8a0d-4d67fddd16d7 546 0 2023-12-04 14:29:06 +0000 UTC <nil> <nil> map[app.kubernetes.io/managed-by:Helm] map[meta.helm.sh/release-name:nvidia meta.helm.sh/release-namespace:nvidia-dra-driver] [] [] [{helm Update resource.k8s.io/v1alpha2 2023-12-04 14:29:06 +0000 UTC FieldsV1 {"f:driverName":{},"f:metadata":{"f:annotations":{".":{},"f:meta.helm.sh/release-name":{},"f:meta.helm.sh/release-namespace":{}},"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{}}}} }]},DriverName:gpu.resource.nvidia.com,ParametersRef:nil,SuitableNodes:nil,} ClaimParameters:0xc000818d20 ClassParameters:0xc0005b2008 UnsuitableNodes:[]}] selectedNode="k8s-dra-driver-cluster-worker"
I1205 01:24:08.315034       1 controller.go:687] "resource controller: allocation for selected node" key="schedulingCtx:gpu-test1/job31-wkwpn" node="k8s-dra-driver-cluster-worker"
I1205 01:24:08.315041       1 controller.go:531] "resource controller: Claim already allocated, nothing to do" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:08.315050       1 controller.go:342] "resource controller: recheck periodically" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:11.293819       1 controller.go:269] "resource controller: Removing deleted work item" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:38.310063       1 controller.go:332] "resource controller: processing" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:38.310091       1 controller.go:377] "resource controller: PodSchedulingContext was deleted, no need to process it" key="schedulingCtx:gpu-test1/job31-wkwpn"
I1205 01:24:38.310098       1 controller.go:336] "resource controller: completed" key="schedulingCtx:gpu-test1/job31-wkwpn"

It says the PodSchedulingContext was deleted!
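
To see everything the controller logged for that one work item, the log can be filtered by its `key=` field. The following is a minimal Python sketch, not part of the driver: it assumes klog-format lines like the ones above, and `trace_key` and `SAMPLE` are hypothetical names used only for illustration. The sample lines are copied from the controller log above.

```python
import re

# klog header: severity letter, MMDD, time, PID, "file:line]", then the message.
KLOG_RE = re.compile(
    r'^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+'
    r'(?P<pid>\d+) (?P<src>[^\]]+)\] (?P<rest>.*)$'
)
# The quoted message at the start of the payload.
MSG_RE = re.compile(r'^"(?P<msg>(?:[^"\\]|\\.)*)"')
# The structured key="..." field that identifies the work item.
KEY_RE = re.compile(r'key="(?P<key>[^"]+)"')


def trace_key(lines, key):
    """Return (time, source, message) for every klog line tagged with `key`."""
    events = []
    for line in lines:
        header = KLOG_RE.match(line)
        if header is None:
            continue  # continuation or diff line without a klog header
        rest = header.group("rest")
        found = KEY_RE.search(rest)
        if found is None or found.group("key") != key:
            continue
        msg = MSG_RE.match(rest)
        text = msg.group("msg") if msg else rest
        events.append((header.group("time"), header.group("src"), text))
    return events


# Four lines taken from the controller log in this issue.
SAMPLE = [
    'I1205 01:24:08.295055       1 controller.go:538] "resource controller: Adding finalizer" key="schedulingCtx:gpu-test1/job31-wkwpn"',
    'I1205 01:24:08.298447       1 controller.go:332] "resource controller: processing" key="claim:gpu-test1/job31-wkwpn-mig1g"',
    'I1205 01:24:11.293819       1 controller.go:269] "resource controller: Removing deleted work item" key="schedulingCtx:gpu-test1/job31-wkwpn"',
    'I1205 01:24:38.310091       1 controller.go:377] "resource controller: PodSchedulingContext was deleted, no need to process it" key="schedulingCtx:gpu-test1/job31-wkwpn"',
]

if __name__ == "__main__":
    for when, src, text in trace_key(SAMPLE, "schedulingCtx:gpu-test1/job31-wkwpn"):
        print(when, src, text)
```

Passing the lines of the full controller log instead of `SAMPLE` lists the events for `schedulingCtx:gpu-test1/job31-wkwpn` in order, including the deletion message.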

klueska commented 9 months ago

Those are not the logs for the kubelet plugin; those are the logs of the controller.

asm582 commented 9 months ago

OK, sure. Here are the logs of the kubelet plugin:

I1205 20:27:39.166472       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:27:39.194697       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=944 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:28:48.167360       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:28:48.194885       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=945 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:30:14.167501       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:30:14.194938       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=946 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:31:20.166614       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:31:20.194932       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=947 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:32:43.167808       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:32:43.196700       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=948 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:33:51.166639       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:33:51.194996       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=949 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:35:16.166478       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:35:16.193938       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=950 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:36:17.167157       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:36:17.194972       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=951 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:37:33.167084       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:37:33.195060       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=952 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:38:36.167319       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:38:36.196029       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=953 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:39:40.166855       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:39:40.195798       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=954 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:41:05.166416       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:41:05.193982       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=955 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:42:25.167132       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:42:25.196033       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=956 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:43:36.166850       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:43:36.195637       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=957 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:44:53.167710       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:44:53.195786       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=958 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:46:17.167010       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:46:17.196072       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=959 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:47:31.167338       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:47:31.196794       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=960 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:48:47.167807       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:48:47.197649       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=961 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:50:17.167499       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:50:17.197145       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=962 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:51:35.167712       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:51:35.196603       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=963 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:52:41.167953       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:52:41.197666       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=964 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:53:53.167178       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:53:53.195097       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=965 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:55:21.166857       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:55:21.195224       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=966 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:56:25.167288       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:56:25.195047       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=967 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:57:40.167090       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:57:40.196755       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=968 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:58:45.167487       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:58:45.196904       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=969 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 20:59:57.167320       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 20:59:57.195089       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=970 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:01:21.167196       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:01:21.194969       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=971 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:02:24.166519       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:02:24.194036       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=972 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:03:45.167175       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:03:45.195047       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=973 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:05:01.167242       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:05:01.195021       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=974 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:06:12.166815       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:06:12.195808       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=975 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:07:20.166768       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:07:20.195718       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=976 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:08:40.166686       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:08:40.195005       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=977 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:09:50.167257       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:09:50.195746       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=978 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:11:18.167126       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:11:18.196671       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=979 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:12:21.166868       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:12:21.195980       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=980 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:13:25.167244       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:13:25.196180       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=981 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:14:27.167180       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:14:27.195697       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=982 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:15:41.166863       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:15:41.196697       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=983 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:16:47.166726       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:16:47.194043       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=984 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:17:55.167144       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:17:55.195692       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=985 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:19:15.166794       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:19:15.193951       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=986 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:20:17.167602       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:20:17.194932       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=987 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:21:24.167300       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:21:24.195016       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=988 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:22:46.166991       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:22:46.195683       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=989 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:23:52.166590       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:23:52.194614       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=990 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:25:22.166770       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:25:22.195703       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=991 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:26:33.167230       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:26:33.196013       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=992 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:27:42.167040       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:27:42.195047       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=993 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:28:45.167053       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:28:45.197202       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=994 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:29:58.167128       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:29:58.195985       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=995 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:31:28.166263       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:31:28.194686       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=996 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:32:48.166356       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:32:48.195809       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=997 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:34:14.167137       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:34:14.195679       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=998 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:35:15.166923       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:35:15.196260       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=999 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:36:28.167042       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:36:28.196733       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1000 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:37:49.167043       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:37:49.195043       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1001 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:39:03.166931       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:39:03.194969       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1002 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:40:30.167446       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:40:30.195899       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1003 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:41:57.168256       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:41:57.195994       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1004 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:43:18.167037       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:43:18.195022       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1005 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:44:40.167232       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:44:40.195020       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1006 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:46:09.166954       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:46:09.195004       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1007 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:47:25.167370       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:47:25.195775       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1008 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:48:55.167876       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:48:55.197089       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1009 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:50:14.166996       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:50:14.196002       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1010 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:51:37.167805       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:51:37.196022       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1011 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:53:07.166803       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:53:07.195716       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1012 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:54:35.167263       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:54:35.194917       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1013 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:55:45.168767       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:55:45.197652       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1014 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:56:47.166522       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:56:47.195711       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1015 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:58:04.167536       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:58:04.195688       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1016 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 21:59:16.166895       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 21:59:16.195684       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1017 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:00:23.166976       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:00:23.196061       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1018 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:01:32.166718       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:01:32.193943       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1019 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:02:51.166857       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:02:51.195801       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1020 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:04:21.166841       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:04:21.196707       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1021 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:05:46.166771       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:05:46.195757       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1022 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:07:16.167166       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:07:16.195720       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1023 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:08:36.166643       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:08:36.194712       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1024 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:09:54.166812       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:09:54.194988       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1025 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:11:14.166857       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:11:14.194884       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1026 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:12:18.167111       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:12:18.194894       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1027 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:13:28.167313       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:13:28.194987       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1028 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:14:52.166587       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:14:52.194691       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1029 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:16:19.167628       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:16:19.196080       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1030 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:17:44.168231       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:17:44.197058       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1031 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:19:04.166572       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:19:04.194962       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1032 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:20:32.167063       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:20:32.195680       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1033 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:21:56.166793       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:21:56.196651       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1034 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:23:16.167385       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:23:16.195973       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1035 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:24:40.167112       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:24:40.195724       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1036 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:26:09.166772       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:26:09.195066       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1037 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:27:23.166634       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:27:23.195685       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1038 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:28:39.167372       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:28:39.194934       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1039 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:29:57.166614       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:29:57.193935       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1040 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:31:21.167011       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:31:21.195923       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1041 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:32:39.167294       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:32:39.195031       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1042 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:33:48.166709       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:33:48.194976       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1043 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:35:12.166651       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:35:12.195749       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1044 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:36:36.166583       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:36:36.195911       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1045 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:38:04.166506       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:38:04.194693       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1046 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:39:25.167394       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:39:25.196015       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1047 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:40:41.167376       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:40:41.196150       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1048 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:41:57.167160       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:41:57.195745       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1049 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:43:22.167896       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:43:22.196711       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1050 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
I1205 22:44:46.167229       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}
E1205 22:44:46.196644       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim 148dc683-2f43-4b2f-a11e-2d49477cf6d6: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.20gb': Insufficient Resources" requestID=1051 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:148dc683-2f43-4b2f-a11e-2d49477cf6d6,ClaimName:job31-wkwpn-mig1g,ResourceHandle:,}"
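
When the same failure repeats on every retry, it can help to group the error lines by claim and profile before reading them one by one. The following is a minimal sketch, assuming the log lines keep exactly the format shown above; `summarize_failures` and `FAILURE_RE` are hypothetical helper names, not part of the driver.

```python
import re
from collections import Counter

# Matches the driver's "handling request failed" lines in the format quoted in this thread.
# Assumption: the claim UID, MIG profile, and NVML reason always appear in this order.
FAILURE_RE = re.compile(
    r"error preparing devices for claim (?P<uid>[0-9a-f-]+): .*?"
    r"error creating GPU instance for '(?P<profile>[^']+)': "
    r'(?P<reason>[^"]+)"'
)


def summarize_failures(log_text):
    """Count failed prepare attempts per (claim UID, profile, reason)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            key = (match["uid"], match["profile"], match["reason"])
            counts[key] += 1
    return counts
```

Grouping by profile makes it easy to see which profile the plugin is actually trying to create for a given claim, which can then be compared against the profile the job was meant to request.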
asm582 commented 9 months ago

Below is the resourceclaim status:

[root@nvd-srv-02 ~]# kubectl get resourceclaims --all-namespaces
NAMESPACE   NAME                RESOURCECLASSNAME   ALLOCATIONMODE         STATE                AGE
gpu-test1   job1-6n8x2-mig2g    gpu.nvidia.com      WaitForFirstConsumer   allocated,reserved   21h
gpu-test1   job2-wc67x-mig2g    gpu.nvidia.com      WaitForFirstConsumer   allocated,reserved   21h
gpu-test1   job31-wkwpn-mig1g   gpu.nvidia.com      WaitForFirstConsumer   allocated,reserved   21h
klueska commented 9 months ago

Something's not right. You said you had claims for two 2g.20gb devices and one 1g.10gb device, but the plugin is trying to allocate a 1g.20gb device. Can you double-check your claim parameters?

asm582 commented 9 months ago

OK, there was an issue with the third GPU partition. I am now requesting a 1g.10gb partition, with two 2g.20gb partitions already created, but the driver is still unable to create it:

I1206 02:08:39.166997       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:bb16a804-9e32-4124-8dd6-315bdaea5aba,ClaimName:job31-rmntb-mig1g,ResourceHandle:,}
E1206 02:08:39.194961       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim bb16a804-9e32-4124-8dd6-315bdaea5aba: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.10gb': Insufficient Resources" requestID=1211 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:bb16a804-9e32-4124-8dd6-315bdaea5aba,ClaimName:job31-rmntb-mig1g,ResourceHandle:,}"
I1206 02:09:47.167084       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:bb16a804-9e32-4124-8dd6-315bdaea5aba,ClaimName:job31-rmntb-mig1g,ResourceHandle:,}
E1206 02:09:47.196020       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim bb16a804-9e32-4124-8dd6-315bdaea5aba: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.10gb': Insufficient Resources" requestID=1212 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:bb16a804-9e32-4124-8dd6-315bdaea5aba,ClaimName:job31-rmntb-mig1g,ResourceHandle:,}"
I1206 02:10:50.167189       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:bb16a804-9e32-4124-8dd6-315bdaea5aba,ClaimName:job31-rmntb-mig1g,ResourceHandle:,}
E1206 02:10:50.194991       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim bb16a804-9e32-4124-8dd6-315bdaea5aba: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.10gb': Insufficient Resources" requestID=1213 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:bb16a804-9e32-4124-8dd6-315bdaea5aba,ClaimName:job31-rmntb-mig1g,ResourceHandle:,}"
I1206 02:11:56.167092       1 driver.go:107] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:bb16a804-9e32-4124-8dd6-315bdaea5aba,ClaimName:job31-rmntb-mig1g,ResourceHandle:,}
E1206 02:11:56.195007       1 nonblockinggrpcserver.go:127] "dra: handling request failed" err="error preparing devices for claim bb16a804-9e32-4124-8dd6-315bdaea5aba: MIG device allocation failed: error creating MIG device: error creating GPU instance for '1g.10gb': Insufficient Resources" requestID=1214 request="&NodePrepareResourceRequest{Namespace:gpu-test1,ClaimUid:bb16a804-9e32-4124-8dd6-315bdaea5aba,ClaimName:job31-rmntb-mig1g,ResourceHandle:,}"

Below is a snippet of the NodeAllocationState (nas) object:

  Allocated Claims:
    05012d63-f2eb-4a73-be38-b3938d9aa891:
      Claim Info:
        Name:       job2-wc67x-mig2g
        Namespace:  gpu-test1
        UID:        05012d63-f2eb-4a73-be38-b3938d9aa891
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   2
            Start:  2
          Profile:  2g.20gb
    73c4108d-e016-4e79-b77e-52ca25f050ec:
      Claim Info:
        Name:       job1-6n8x2-mig2g
        Namespace:  gpu-test1
        UID:        73c4108d-e016-4e79-b77e-52ca25f050ec
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   2
            Start:  0
          Profile:  2g.20gb
    bb16a804-9e32-4124-8dd6-315bdaea5aba:
      Claim Info:
        Name:       job31-rmntb-mig1g
        Namespace:  gpu-test1
        UID:        bb16a804-9e32-4124-8dd6-315bdaea5aba
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   1
            Start:  4
          Profile:  1g.10gb
  Prepared Claims:
    05012d63-f2eb-4a73-be38-b3938d9aa891:
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   2
            Start:  2
          Profile:  2g.20gb
          Uuid:     MIG-896dd9e6-84f9-57e8-9426-f298047b914c
    73c4108d-e016-4e79-b77e-52ca25f050ec:
      Mig:
        Devices:
          Parent UUID:  GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb
          Placement:
            Size:   2
            Start:  0
          Profile:  2g.20gb
          Uuid:     MIG-f4477f87-90ef-5540-b4ab-78c9286ea812
Status:             Ready
Events:             <none>
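
One thing the snippet above can be checked for is whether the recorded placements collide with each other. The following is a minimal sketch, assuming that `Start` and `Size` describe a half-open range of slices `[start, start + size)` on the parent GPU; `overlapping_placements` is a hypothetical helper, and the values are copied from the Allocated Claims section.

```python
from itertools import combinations


def overlapping_placements(devices):
    """Return pairs of claim names whose slice ranges overlap on the same parent GPU."""
    conflicts = []
    for (name_a, a), (name_b, b) in combinations(devices.items(), 2):
        if a["parent"] != b["parent"]:
            continue
        a_end = a["start"] + a["size"]
        b_end = b["start"] + b["size"]
        # Two half-open ranges overlap when each one starts before the other ends.
        if a["start"] < b_end and b["start"] < a_end:
            conflicts.append((name_a, name_b))
    return conflicts


GPU = "GPU-1a9afbae-5932-54f8-c2c4-a863888d45bb"
allocated = {
    "job1-6n8x2-mig2g": {"parent": GPU, "start": 0, "size": 2},
    "job2-wc67x-mig2g": {"parent": GPU, "start": 2, "size": 2},
    "job31-rmntb-mig1g": {"parent": GPU, "start": 4, "size": 1},
}
print(overlapping_placements(allocated))  # prints []
```

Under that assumption the three placements do not overlap, so the recorded allocation looks internally consistent. This only rules out one possible cause; it does not explain the `Insufficient Resources` error itself.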
asm582 commented 9 months ago

I deleted the old kind cluster, created a fresh one, and reran the experiments. All jobs ran successfully. Thanks for your help, but I am not sure about the root cause. Please feel free to close this issue.

CoderTH commented 9 months ago

I have a similar problem.

Name:         172.19.50.56
Namespace:    nvidia-dra-driver
Labels:       <none>
Annotations:  <none>
API Version:  nas.gpu.resource.nvidia.com/v1alpha1
Kind:         NodeAllocationState
Metadata:
  Creation Timestamp:  2023-12-10T06:09:05Z
  Generation:          12
  Owner References:
    API Version:     v1
    Kind:            Node
    Name:            172.19.50.56
    UID:             c5392d53-d377-4247-b00c-093787aaeb62
  Resource Version:  4401
  UID:               416cb1b1-59f4-4445-9408-37ad49215910
Spec:
  Allocatable Devices:
    Gpu:
      Architecture:             Ampere
      Brand:                    Nvidia
      Cuda Compute Capability:  8.0
      Index:                    0
      Memory Bytes:             85899345920
      Mig Enabled:              true
      Product Name:             NVIDIA A100-SXM4-80GB
      Uuid:                     GPU-6b9c0016-b8b3-94da-fbb5-f0025ddec471
    Mig:
      Parent Product Name:  NVIDIA A100-SXM4-80GB
      Placements:
        Size:   4
        Start:  0
      Profile:  4g.40gb
    Mig:
      Parent Product Name:  NVIDIA A100-SXM4-80GB
      Placements:
        Size:   8
        Start:  0
      Profile:  7g.80gb
    Mig:
      Parent Product Name:  NVIDIA A100-SXM4-80GB
      Placements:
        Size:   1
        Start:  0
        Size:   1
        Start:  1
        Size:   1
        Start:  2
        Size:   1
        Start:  3
        Size:   1
        Start:  4
        Size:   1
        Start:  5
        Size:   1
        Start:  6
      Profile:  1g.10gb+me
    Mig:
      Parent Product Name:  NVIDIA A100-SXM4-80GB
      Placements:
        Size:   2
        Start:  0
        Size:   2
        Start:  2
        Size:   2
        Start:  4
        Size:   2
        Start:  6
      Profile:  1g.20gb
    Mig:
      Parent Product Name:  NVIDIA A100-SXM4-80GB
      Placements:
        Size:   1
        Start:  0
        Size:   1
        Start:  1
        Size:   1
        Start:  2
        Size:   1
        Start:  3
        Size:   1
        Start:  4
        Size:   1
        Start:  5
        Size:   1
        Start:  6
      Profile:  1g.10gb
    Mig:
      Parent Product Name:  NVIDIA A100-SXM4-80GB
      Placements:
        Size:   2
        Start:  0
        Size:   2
        Start:  2
        Size:   2
        Start:  4
      Profile:  2g.20gb
    Mig:
      Parent Product Name:  NVIDIA A100-SXM4-80GB
      Placements:
        Size:   4
        Start:  0
        Size:   4
        Start:  4
      Profile:  3g.40gb
  Allocated Claims:
    b02c81b5-dc16-4e90-bc2e-6bae12600ce8:
      Claim Info:
        Name:       mig-2g.20gb
        Namespace:  gpu-test4
        UID:        b02c81b5-dc16-4e90-bc2e-6bae12600ce8
      Mig:
        Devices:
          Parent UUID:  GPU-6b9c0016-b8b3-94da-fbb5-f0025ddec471
          Placement:
            Size:   2
            Start:  2
          Profile:  2g.20gb
    d563525b-e5ae-453d-828b-fe40e366a789:
      Claim Info:
        Name:       mig-1g.20gb
        Namespace:  gpu-test4
        UID:        d563525b-e5ae-453d-828b-fe40e366a789
      Mig:
        Devices:
          Parent UUID:  GPU-6b9c0016-b8b3-94da-fbb5-f0025ddec471
          Placement:
            Size:   2
            Start:  0
          Profile:  1g.20gb
    e6005f2c-aaef-4404-9d7c-6a3af102c6cb:
      Claim Info:
        Name:       mig-enabled-gpu
        Namespace:  gpu-test4
        UID:        e6005f2c-aaef-4404-9d7c-6a3af102c6cb
      Gpu:
        Devices:
          Uuid:  GPU-6b9c0016-b8b3-94da-fbb5-f0025ddec471
  Prepared Claims:
    b02c81b5-dc16-4e90-bc2e-6bae12600ce8:
      Mig:
        Devices:
          Parent UUID:  GPU-6b9c0016-b8b3-94da-fbb5-f0025ddec471
          Placement:
            Size:   2
            Start:  2
          Profile:  2g.20gb
          Uuid:     MIG-7be7ad01-4ae6-53a7-82b1-f6241fa98851
    e6005f2c-aaef-4404-9d7c-6a3af102c6cb:
      Gpu:
        Devices:
          Uuid:  GPU-6b9c0016-b8b3-94da-fbb5-f0025ddec471
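
A quick way to spot the stuck claim in a dump like this is to compare the Allocated Claims and Prepared Claims sections. The following is a minimal sketch, with the UIDs and claim names copied by hand from the output above; `unprepared_claims` is a hypothetical helper, not something the driver provides.

```python
def unprepared_claims(allocated_uids, prepared_uids):
    """Return claim UIDs that are allocated but have no prepared entry, sorted."""
    return sorted(set(allocated_uids) - set(prepared_uids))


# Claim UID -> claim name, from the Allocated Claims section.
allocated = {
    "b02c81b5-dc16-4e90-bc2e-6bae12600ce8": "mig-2g.20gb",
    "d563525b-e5ae-453d-828b-fe40e366a789": "mig-1g.20gb",
    "e6005f2c-aaef-4404-9d7c-6a3af102c6cb": "mig-enabled-gpu",
}
# Claim UIDs from the Prepared Claims section.
prepared = [
    "b02c81b5-dc16-4e90-bc2e-6bae12600ce8",
    "e6005f2c-aaef-4404-9d7c-6a3af102c6cb",
]

for uid in unprepared_claims(allocated, prepared):
    print(uid, allocated[uid])  # prints d563525b-e5ae-453d-828b-fe40e366a789 mig-1g.20gb
```

Here the only claim that is allocated but not prepared is `mig-1g.20gb`, which narrows down where to look in the plugin logs. The comparison says nothing about why preparation failed.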
klueska commented 5 hours ago

Can this be closed?