kubectl logs -n kube-system hami-scheduler-74b5f7df7-m67ff -c vgpu-scheduler-extender
Can you execute this command and upload the log?
It seems that there is a conflict in GPU resources, but no detailed log is recorded in the pod events.
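For reference, a pod that would cause this kind of conflict presumably carries resource requests like the values visible in the scheduler log below (nvidia.com/gpu: 2 and nvidia.com/gpumem: 1000). This is only a sketch reconstructed from those values; the actual manifest is not attached to this issue, and the container name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: default
spec:
  containers:
    - name: cuda-container                        # placeholder name
      image: nvidia/cuda:12.4.0-base-ubuntu22.04  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 2        # two physical GPUs requested
          nvidia.com/gpumem: 1000  # 1000 MiB of device memory per GPU
```

The node gpu-cluster-control-plane only exposes a single Tesla T4, so a request for two GPUs fails the filter with "request devices nums cannot exceed the total number of devices on the node"; the 04:15:24 attempt in the log, which asks for a single GPU, is filtered and bound successfully.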
I1111 04:14:35.990771 1 client.go:53] BuildConfigFromFlags failed for file /root/.kube/config: stat /root/.kube/config: no such file or directory using inClusterConfig
I1111 04:14:35.992321 1 scheduler.go:63] New Scheduler
I1111 04:14:35.993143 1 reflector.go:289] Starting reflector *v1.Node (1h0m0s) from pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229
I1111 04:14:35.993271 1 reflector.go:325] Listing and watching *v1.Node from pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229
I1111 04:14:35.999747 1 reflector.go:289] Starting reflector *v1.Pod (1h0m0s) from pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229
I1111 04:14:35.999960 1 reflector.go:325] Listing and watching *v1.Pod from pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229
I1111 04:14:36.093414 1 shared_informer.go:341] caches populated
I1111 04:14:36.093447 1 shared_informer.go:341] caches populated
I1111 04:14:36.101270 1 route.go:42] Into Predicate Route outer func
I1111 04:14:36.101707 1 metrics.go:231] Initializing metrics for scheduler
I1111 04:14:36.101938 1 metrics.go:65] Starting to collect metrics for scheduler
I1111 04:14:36.101669 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:14:36" nodeName="gpu-cluster-control-plane"
I1111 04:14:36.103709 1 pods.go:105] Getting all scheduled pods with 0 nums
I1111 04:14:36.108040 1 main.go:86] listen on 0.0.0.0:443
I1111 04:14:36.134195 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:14:36.134405 1 scheduler.go:246] node gpu-cluster-control-plane device NVIDIA come node info=&{gpu-cluster-control-plane [{GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 0 10 15360 100 NVIDIA-Tesla T4 0 true NVIDIA}]} total=[{GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 0 10 15360 100 NVIDIA-Tesla T4 0 true NVIDIA}]
I1111 04:14:37.111227 1 route.go:44] Into Predicate Route inner func
I1111 04:14:37.113695 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="d04944cd-f20a-4b37-806d-4c2d442ce62d" namespaces="default"
I1111 04:14:37.113716 1 device.go:170] Counting iluvatar devices
I1111 04:14:37.113722 1 device.go:245] Counting mlu devices
I1111 04:14:37.113730 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:14:37.113761 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:14:37.113773 1 device.go:179] Counting dcu devices
I1111 04:14:37.113828 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:14:37.113843 1 score.go:32] devices status
I1111 04:14:37.113901 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:14:37.113911 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:14:37.113922 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:14:37.113932 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:14:37.113947 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:14:37.113977 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:14:37.114000 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:14:37.114008 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:14:37.114197 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:15:05.908157 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:15:05" nodeName="gpu-cluster-control-plane"
I1111 04:15:05.923070 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:15:24.283018 1 route.go:131] Start to handle webhook request on /webhook
I1111 04:15:24.284990 1 webhook.go:63] Processing admission hook for pod default/gpu-pod, UID: 803d7a11-3398-4836-9446-050c25391467
I1111 04:15:24.292893 1 route.go:44] Into Predicate Route inner func
I1111 04:15:24.293199 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="acb621b8-a0e8-4d47-bf4d-72cd2ec022f9" namespaces="default"
I1111 04:15:24.293217 1 device.go:245] Counting mlu devices
I1111 04:15:24.293226 1 device.go:250] idx= nvidia.com/gpu val= {{1 0} {<nil>} 1 DecimalSI} {{0 0} {<nil>} }
I1111 04:15:24.293243 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:15:24.293257 1 device.go:179] Counting dcu devices
I1111 04:15:24.293268 1 device.go:170] Counting iluvatar devices
I1111 04:15:24.293287 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":1,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:15:24.293307 1 score.go:32] devices status
I1111 04:15:24.293371 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:15:24.293392 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:15:24.293403 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:15:24.293420 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:15:24.293427 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 1.651042
I1111 04:15:24.293446 1 score.go:70] "Allocating device for container request" pod="default/gpu-pod" card request={"Nums":1,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}
I1111 04:15:24.293468 1 score.go:74] "scoring pod" pod="default/gpu-pod" Memreq=1000 MemPercentagereq=101 Coresreq=0 Nums=1 device index=0 device="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264"
I1111 04:15:24.293478 1 score.go:40] Type contains NVIDIA-Tesla T4 NVIDIA
I1111 04:15:24.293488 1 score.go:46] idx NVIDIA true true
I1111 04:15:24.293497 1 score.go:62] checkUUID result is true for NVIDIA type
I1111 04:15:24.293510 1 score.go:126] "first fitted" pod="default/gpu-pod" device="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264"
I1111 04:15:24.293566 1 score.go:137] "device allocate success" pod="default/gpu-pod" allocate device={"NVIDIA":[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]}
I1111 04:15:24.293577 1 scheduler.go:485] nodeScores_len= 1
I1111 04:15:24.293585 1 scheduler.go:488] schedule default/gpu-pod to gpu-cluster-control-plane map[NVIDIA:[[{0 GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 NVIDIA 1000 0}]]]
I1111 04:15:24.293618 1 util.go:186] Encoded container Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,NVIDIA,1000,0:
I1111 04:15:24.293625 1 util.go:209] Encoded pod single devices GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,NVIDIA,1000,0:;
I1111 04:15:24.293646 1 pods.go:63] Pod added: Name: gpu-pod, UID: acb621b8-a0e8-4d47-bf4d-72cd2ec022f9, Namespace: default, NodeID: gpu-cluster-control-plane
I1111 04:15:24.300955 1 util.go:277] "Decoded pod annos" poddevices={"NVIDIA":[[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]]}
I1111 04:15:24.301298 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Normal" reason="FilteringSucceed" message="Successfully filtered to following nodes: [gpu-cluster-control-plane] for default/gpu-pod "
I1111 04:15:24.302365 1 scheduler.go:375] "Bind" pod="gpu-pod" namespace="default" podUID="acb621b8-a0e8-4d47-bf4d-72cd2ec022f9" node="gpu-cluster-control-plane"
I1111 04:15:24.307173 1 device.go:245] Counting mlu devices
I1111 04:15:24.307184 1 device.go:250] idx= nvidia.com/gpu val= {{1 0} {<nil>} 1 DecimalSI} {{0 0} {<nil>} }
I1111 04:15:24.307195 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:15:24.328235 1 nodelock.go:65] "Node lock set" node="gpu-cluster-control-plane"
I1111 04:15:24.334285 1 util.go:277] "Decoded pod annos" poddevices={"NVIDIA":[[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]]}
I1111 04:15:24.336440 1 scheduler.go:430] After Binding Process
I1111 04:15:24.337323 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Normal" reason="BindingSucceed" message="Successfully binding node [gpu-cluster-control-plane] to default/gpu-pod"
I1111 04:15:24.337718 1 util.go:277] "Decoded pod annos" poddevices={"NVIDIA":[[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]]}
I1111 04:15:24.354560 1 util.go:277] "Decoded pod annos" poddevices={"NVIDIA":[[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]]}
I1111 04:15:24.365949 1 util.go:277] "Decoded pod annos" poddevices={"NVIDIA":[[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]]}
I1111 04:15:24.395938 1 util.go:277] "Decoded pod annos" poddevices={"NVIDIA":[[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]]}
I1111 04:15:26.068870 1 util.go:277] "Decoded pod annos" poddevices={"NVIDIA":[[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]]}
I1111 04:15:35.972144 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:15:35" nodeName="gpu-cluster-control-plane"
I1111 04:15:35.987788 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:15:36.360449 1 util.go:277] "Decoded pod annos" poddevices={"NVIDIA":[[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]]}
I1111 04:15:45.787308 1 util.go:277] "Decoded pod annos" poddevices={"NVIDIA":[[{"Idx":0,"UUID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Type":"NVIDIA","Usedmem":1000,"Usedcores":0}]]}
I1111 04:15:45.793036 1 pods.go:72] Deleted pod gpu-pod with node ID gpu-cluster-control-plane
I1111 04:15:59.470368 1 route.go:131] Start to handle webhook request on /webhook
I1111 04:15:59.470899 1 webhook.go:63] Processing admission hook for pod default/gpu-pod, UID: e6271658-4bad-4a44-aeca-d70ccb94da76
I1111 04:15:59.479682 1 route.go:44] Into Predicate Route inner func
I1111 04:15:59.479990 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 04:15:59.480018 1 device.go:245] Counting mlu devices
I1111 04:15:59.480027 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:15:59.480044 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:15:59.480076 1 device.go:179] Counting dcu devices
I1111 04:15:59.480083 1 device.go:170] Counting iluvatar devices
I1111 04:15:59.480103 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:15:59.480132 1 score.go:32] devices status
I1111 04:15:59.480155 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:15:59.480165 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:15:59.480178 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:15:59.480189 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:15:59.480195 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:15:59.480212 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:15:59.480228 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:15:59.480238 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:15:59.480364 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:16:06.034082 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:16:06" nodeName="gpu-cluster-control-plane"
I1111 04:16:06.049654 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:16:36.096172 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:16:36" nodeName="gpu-cluster-control-plane"
I1111 04:16:36.112649 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:16:36.112701 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:16:36" nodeName="gpu-cluster-control-plane"
I1111 04:16:36.123660 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:17:06.156976 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:17:06" nodeName="gpu-cluster-control-plane"
I1111 04:17:06.172916 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:17:36.218272 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:17:36" nodeName="gpu-cluster-control-plane"
I1111 04:17:36.232934 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:18:06.279683 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:18:06" nodeName="gpu-cluster-control-plane"
I1111 04:18:06.294384 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:18:36.338427 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:18:36" nodeName="gpu-cluster-control-plane"
I1111 04:18:36.354655 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:19:06.408752 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:19:06" nodeName="gpu-cluster-control-plane"
I1111 04:19:06.423476 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:19:36.508757 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:19:36" nodeName="gpu-cluster-control-plane"
I1111 04:19:36.524316 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:20:06.570071 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:20:06" nodeName="gpu-cluster-control-plane"
I1111 04:20:06.584367 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:20:36.632236 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:20:36" nodeName="gpu-cluster-control-plane"
I1111 04:20:36.649814 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:21:06.693477 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:21:06" nodeName="gpu-cluster-control-plane"
I1111 04:21:06.707685 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:21:07.111875 1 route.go:44] Into Predicate Route inner func
I1111 04:21:07.112187 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 04:21:07.112203 1 device.go:170] Counting iluvatar devices
I1111 04:21:07.112210 1 device.go:245] Counting mlu devices
I1111 04:21:07.112217 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:21:07.112234 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:21:07.112249 1 device.go:179] Counting dcu devices
I1111 04:21:07.112284 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:21:07.112308 1 score.go:32] devices status
I1111 04:21:07.112334 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:21:07.112366 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:21:07.112375 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:21:07.112386 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:21:07.112397 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:21:07.112414 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:21:07.112439 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:21:07.112451 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:21:07.112539 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:21:36.754757 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:21:36" nodeName="gpu-cluster-control-plane"
I1111 04:21:36.770214 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:22:06.815698 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:22:06" nodeName="gpu-cluster-control-plane"
I1111 04:22:06.828575 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:22:36.874849 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:22:36" nodeName="gpu-cluster-control-plane"
I1111 04:22:36.889803 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:22:52.013998 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 46 items received
I1111 04:23:06.934275 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:23:06" nodeName="gpu-cluster-control-plane"
I1111 04:23:06.947349 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:23:36.994530 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:23:36" nodeName="gpu-cluster-control-plane"
I1111 04:23:37.009114 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:24:07.053844 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:24:07" nodeName="gpu-cluster-control-plane"
I1111 04:24:07.068771 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:24:22.047449 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 28 items received
I1111 04:24:37.116750 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:24:37" nodeName="gpu-cluster-control-plane"
I1111 04:24:37.130264 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:25:07.179138 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:25:07" nodeName="gpu-cluster-control-plane"
I1111 04:25:07.193719 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:25:37.239319 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:25:37" nodeName="gpu-cluster-control-plane"
I1111 04:25:37.254643 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:26:07.118371 1 route.go:44] Into Predicate Route inner func
I1111 04:26:07.118699 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 04:26:07.118721 1 device.go:179] Counting dcu devices
I1111 04:26:07.118730 1 device.go:170] Counting iluvatar devices
I1111 04:26:07.118737 1 device.go:245] Counting mlu devices
I1111 04:26:07.118748 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:26:07.118764 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:26:07.118788 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:26:07.118808 1 score.go:32] devices status
I1111 04:26:07.118825 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:26:07.118837 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:26:07.118846 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:26:07.118857 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:26:07.118864 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:26:07.118890 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:26:07.118904 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:26:07.118915 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:26:07.119044 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:26:07.305269 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:26:07" nodeName="gpu-cluster-control-plane"
I1111 04:26:07.320874 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:26:37.387279 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:26:37" nodeName="gpu-cluster-control-plane"
I1111 04:26:37.401481 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:27:07.461753 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:27:07" nodeName="gpu-cluster-control-plane"
I1111 04:27:07.476510 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:27:37.533732 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:27:37" nodeName="gpu-cluster-control-plane"
I1111 04:27:37.548955 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:28:07.595266 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:28:07" nodeName="gpu-cluster-control-plane"
I1111 04:28:07.609742 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:28:37.654741 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:28:37" nodeName="gpu-cluster-control-plane"
I1111 04:28:37.669805 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:29:07.719242 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:29:07" nodeName="gpu-cluster-control-plane"
I1111 04:29:07.733255 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:29:33.049251 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 7 items received
I1111 04:29:37.780315 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:29:37" nodeName="gpu-cluster-control-plane"
I1111 04:29:37.794639 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:30:07.844658 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:30:07" nodeName="gpu-cluster-control-plane"
I1111 04:30:07.859619 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:30:37.906292 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:30:37" nodeName="gpu-cluster-control-plane"
I1111 04:30:37.922232 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:31:07.124212 1 route.go:44] Into Predicate Route inner func
I1111 04:31:07.124642 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 04:31:07.124668 1 device.go:245] Counting mlu devices
I1111 04:31:07.124676 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:31:07.124696 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:31:07.124711 1 device.go:179] Counting dcu devices
I1111 04:31:07.124720 1 device.go:170] Counting iluvatar devices
I1111 04:31:07.124743 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:31:07.124761 1 score.go:32] devices status
I1111 04:31:07.124791 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:31:07.124802 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:31:07.124820 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:31:07.124834 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:31:07.124842 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:31:07.124859 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:31:07.124874 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:31:07.124885 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:31:07.124991 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:31:07.968294 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:31:07" nodeName="gpu-cluster-control-plane"
I1111 04:31:07.984276 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:31:13.015591 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 46 items received
I1111 04:31:38.027096 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:31:38" nodeName="gpu-cluster-control-plane"
I1111 04:31:38.040541 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:32:08.086594 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:32:08" nodeName="gpu-cluster-control-plane"
I1111 04:32:08.102215 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:32:38.150106 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:32:38" nodeName="gpu-cluster-control-plane"
I1111 04:32:38.164325 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:33:08.216024 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:33:08" nodeName="gpu-cluster-control-plane"
I1111 04:33:08.230263 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:33:38.279806 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:33:38" nodeName="gpu-cluster-control-plane"
I1111 04:33:38.294168 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:34:08.341366 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:34:08" nodeName="gpu-cluster-control-plane"
I1111 04:34:08.354720 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:34:38.401191 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:34:38" nodeName="gpu-cluster-control-plane"
I1111 04:34:38.417188 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:35:08.464077 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:35:08" nodeName="gpu-cluster-control-plane"
I1111 04:35:08.479316 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:35:38.523732 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:35:38" nodeName="gpu-cluster-control-plane"
I1111 04:35:38.538288 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:36:07.128902 1 route.go:44] Into Predicate Route inner func
I1111 04:36:07.129221 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 04:36:07.129238 1 device.go:245] Counting mlu devices
I1111 04:36:07.129246 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:36:07.129268 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:36:07.129282 1 device.go:179] Counting dcu devices
I1111 04:36:07.129290 1 device.go:170] Counting iluvatar devices
I1111 04:36:07.129316 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:36:07.129374 1 score.go:32] devices status
I1111 04:36:07.129407 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:36:07.129418 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:36:07.129431 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:36:07.129443 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:36:07.129450 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:36:07.129468 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:36:07.129482 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:36:07.129492 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:36:07.129614 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:36:08.584614 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:36:08" nodeName="gpu-cluster-control-plane"
I1111 04:36:08.599295 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:36:38.657909 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:36:38" nodeName="gpu-cluster-control-plane"
I1111 04:36:38.674554 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:37:08.722686 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:37:08" nodeName="gpu-cluster-control-plane"
I1111 04:37:08.738381 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:37:38.051324 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 10 items received
I1111 04:37:38.783458 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:37:38" nodeName="gpu-cluster-control-plane"
I1111 04:37:38.796627 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:38:08.846292 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:38:08" nodeName="gpu-cluster-control-plane"
I1111 04:38:08.861779 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:38:38.910917 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:38:38" nodeName="gpu-cluster-control-plane"
I1111 04:38:38.927429 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:39:08.975417 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:39:08" nodeName="gpu-cluster-control-plane"
I1111 04:39:08.989005 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:39:36.017601 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 43 items received
I1111 04:39:39.034079 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:39:39" nodeName="gpu-cluster-control-plane"
I1111 04:39:39.049745 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:40:09.093605 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:40:09" nodeName="gpu-cluster-control-plane"
I1111 04:40:09.106884 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:40:39.153218 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:40:39" nodeName="gpu-cluster-control-plane"
I1111 04:40:39.170395 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:41:07.134913 1 route.go:44] Into Predicate Route inner func
I1111 04:41:07.135246 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 04:41:07.135266 1 device.go:170] Counting iluvatar devices
I1111 04:41:07.135273 1 device.go:245] Counting mlu devices
I1111 04:41:07.135280 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:41:07.135296 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:41:07.135310 1 device.go:179] Counting dcu devices
I1111 04:41:07.135333 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:41:07.135365 1 score.go:32] devices status
I1111 04:41:07.135391 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:41:07.135401 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:41:07.135412 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:41:07.135424 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:41:07.135433 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:41:07.135450 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:41:07.135465 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:41:07.135475 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:41:07.135616 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:41:09.219686 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:41:09" nodeName="gpu-cluster-control-plane"
I1111 04:41:09.236310 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:41:39.279477 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:41:39" nodeName="gpu-cluster-control-plane"
I1111 04:41:39.294282 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:42:09.340136 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:42:09" nodeName="gpu-cluster-control-plane"
I1111 04:42:09.355233 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:42:39.402742 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:42:39" nodeName="gpu-cluster-control-plane"
I1111 04:42:39.418723 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:43:09.463416 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:43:09" nodeName="gpu-cluster-control-plane"
I1111 04:43:09.478332 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:43:28.053067 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 6 items received
I1111 04:43:39.524750 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:43:39" nodeName="gpu-cluster-control-plane"
I1111 04:43:39.539575 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:44:09.585287 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:44:09" nodeName="gpu-cluster-control-plane"
I1111 04:44:09.601210 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:44:39.687217 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:44:39" nodeName="gpu-cluster-control-plane"
I1111 04:44:39.707821 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:45:09.750614 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:45:09" nodeName="gpu-cluster-control-plane"
I1111 04:45:09.765521 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:45:39.809453 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:45:39" nodeName="gpu-cluster-control-plane"
I1111 04:45:39.824146 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:45:54.019489 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 35 items received
I1111 04:46:07.141621 1 route.go:44] Into Predicate Route inner func
I1111 04:46:07.141941 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 04:46:07.141961 1 device.go:245] Counting mlu devices
I1111 04:46:07.141972 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:46:07.141989 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:46:07.142005 1 device.go:179] Counting dcu devices
I1111 04:46:07.142016 1 device.go:170] Counting iluvatar devices
I1111 04:46:07.142036 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:46:07.142052 1 score.go:32] devices status
I1111 04:46:07.142073 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:46:07.142085 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:46:07.142099 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:46:07.142112 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:46:07.142119 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:46:07.142136 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:46:07.142153 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:46:07.142163 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:46:07.142311 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:46:09.870789 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:46:09" nodeName="gpu-cluster-control-plane"
I1111 04:46:09.884768 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:46:39.934165 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:46:39" nodeName="gpu-cluster-control-plane"
I1111 04:46:39.952516 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:47:09.998223 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:47:09" nodeName="gpu-cluster-control-plane"
I1111 04:47:10.014366 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:47:40.074088 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:47:40" nodeName="gpu-cluster-control-plane"
I1111 04:47:40.091601 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:48:10.135199 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:48:10" nodeName="gpu-cluster-control-plane"
I1111 04:48:10.148700 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:48:40.196708 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:48:40" nodeName="gpu-cluster-control-plane"
I1111 04:48:40.213260 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:49:10.257766 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:49:10" nodeName="gpu-cluster-control-plane"
I1111 04:49:10.275309 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:49:37.055067 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 8 items received
I1111 04:49:40.317713 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:49:40" nodeName="gpu-cluster-control-plane"
I1111 04:49:40.333307 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:50:10.378509 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:50:10" nodeName="gpu-cluster-control-plane"
I1111 04:50:10.396530 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:50:40.437843 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:50:40" nodeName="gpu-cluster-control-plane"
I1111 04:50:40.453235 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:51:07.146892 1 route.go:44] Into Predicate Route inner func
I1111 04:51:07.147203 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 04:51:07.147221 1 device.go:170] Counting iluvatar devices
I1111 04:51:07.147229 1 device.go:245] Counting mlu devices
I1111 04:51:07.147237 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:51:07.147261 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:51:07.147276 1 device.go:179] Counting dcu devices
I1111 04:51:07.147306 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:51:07.147325 1 score.go:32] devices status
I1111 04:51:07.147375 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:51:07.147391 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:51:07.147402 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:51:07.147414 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:51:07.147421 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:51:07.147439 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:51:07.147452 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:51:07.147461 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:51:07.147584 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:51:10.502491 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:51:10" nodeName="gpu-cluster-control-plane"
I1111 04:51:10.520066 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:51:40.564745 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:51:40" nodeName="gpu-cluster-control-plane"
I1111 04:51:40.578653 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:52:10.625400 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:52:10" nodeName="gpu-cluster-control-plane"
I1111 04:52:10.642092 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:52:40.689859 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:52:40" nodeName="gpu-cluster-control-plane"
I1111 04:52:40.705860 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:53:10.754425 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:53:10" nodeName="gpu-cluster-control-plane"
I1111 04:53:10.769913 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:53:40.812828 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:53:40" nodeName="gpu-cluster-control-plane"
I1111 04:53:40.825898 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:54:09.021228 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 42 items received
I1111 04:54:10.873327 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:54:10" nodeName="gpu-cluster-control-plane"
I1111 04:54:10.891743 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:54:40.933286 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:54:40" nodeName="gpu-cluster-control-plane"
I1111 04:54:40.948838 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:55:10.996452 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:55:10" nodeName="gpu-cluster-control-plane"
I1111 04:55:11.010872 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:55:41.057297 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:55:41" nodeName="gpu-cluster-control-plane"
I1111 04:55:41.072523 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:56:07.152466 1 route.go:44] Into Predicate Route inner func
I1111 04:56:07.152790 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 04:56:07.152808 1 device.go:245] Counting mlu devices
I1111 04:56:07.152816 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 04:56:07.152837 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 04:56:07.152852 1 device.go:179] Counting dcu devices
I1111 04:56:07.152860 1 device.go:170] Counting iluvatar devices
I1111 04:56:07.152882 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 04:56:07.152901 1 score.go:32] devices status
I1111 04:56:07.152926 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 04:56:07.152937 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 04:56:07.152949 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 04:56:07.152962 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 04:56:07.152971 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 04:56:07.152989 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 04:56:07.153004 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 04:56:07.153014 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 04:56:07.153170 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 04:56:11.118514 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:56:11" nodeName="gpu-cluster-control-plane"
I1111 04:56:11.132548 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:56:17.056893 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 7 items received
I1111 04:56:41.181012 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:56:41" nodeName="gpu-cluster-control-plane"
I1111 04:56:41.197087 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:57:11.247524 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:57:11" nodeName="gpu-cluster-control-plane"
I1111 04:57:11.260767 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:57:41.308883 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:57:41" nodeName="gpu-cluster-control-plane"
I1111 04:57:41.324036 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:58:11.370847 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:58:11" nodeName="gpu-cluster-control-plane"
I1111 04:58:11.386318 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:58:41.432212 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:58:41" nodeName="gpu-cluster-control-plane"
I1111 04:58:41.448562 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:59:11.491223 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:59:11" nodeName="gpu-cluster-control-plane"
I1111 04:59:11.506641 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 04:59:41.551633 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 04:59:41" nodeName="gpu-cluster-control-plane"
I1111 04:59:41.566748 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:00:11.613492 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:00:11" nodeName="gpu-cluster-control-plane"
I1111 05:00:11.628511 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:00:41.678290 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:00:41" nodeName="gpu-cluster-control-plane"
I1111 05:00:41.695206 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:01:07.159814 1 route.go:44] Into Predicate Route inner func
I1111 05:01:07.160170 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:01:07.160188 1 device.go:179] Counting dcu devices
I1111 05:01:07.160195 1 device.go:170] Counting iluvatar devices
I1111 05:01:07.160203 1 device.go:245] Counting mlu devices
I1111 05:01:07.160220 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:01:07.160288 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:01:07.160323 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:01:07.160353 1 score.go:32] devices status
I1111 05:01:07.160376 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:01:07.160388 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:01:07.160401 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:01:07.160413 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:01:07.160438 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:01:07.160457 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:01:07.160474 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:01:07.160481 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:01:07.160622 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:01:11.738951 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:01:11" nodeName="gpu-cluster-control-plane"
I1111 05:01:11.756286 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:01:37.022961 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 41 items received
I1111 05:01:41.800016 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:01:41" nodeName="gpu-cluster-control-plane"
I1111 05:01:41.815349 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:02:11.860571 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:02:11" nodeName="gpu-cluster-control-plane"
I1111 05:02:11.876557 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:02:41.923660 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:02:41" nodeName="gpu-cluster-control-plane"
I1111 05:02:41.944288 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:03:11.982597 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:03:11" nodeName="gpu-cluster-control-plane"
I1111 05:03:11.997699 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:03:42.042966 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:03:42" nodeName="gpu-cluster-control-plane"
I1111 05:03:42.058371 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:04:12.103590 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:04:12" nodeName="gpu-cluster-control-plane"
I1111 05:04:12.119116 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:04:42.164404 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:04:42" nodeName="gpu-cluster-control-plane"
I1111 05:04:42.178501 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:05:08.058061 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 9 items received
I1111 05:05:12.229582 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:05:12" nodeName="gpu-cluster-control-plane"
I1111 05:05:12.245031 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:05:42.292868 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:05:42" nodeName="gpu-cluster-control-plane"
I1111 05:05:42.306208 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:06:07.164229 1 route.go:44] Into Predicate Route inner func
I1111 05:06:07.164553 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:06:07.164575 1 device.go:179] Counting dcu devices
I1111 05:06:07.164584 1 device.go:170] Counting iluvatar devices
I1111 05:06:07.164591 1 device.go:245] Counting mlu devices
I1111 05:06:07.164598 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:06:07.164616 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:06:07.164640 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:06:07.164658 1 score.go:32] devices status
I1111 05:06:07.164684 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:06:07.164697 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:06:07.164706 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:06:07.164714 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:06:07.164719 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:06:07.164736 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:06:07.164751 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:06:07.164761 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:06:07.164923 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:06:12.354349 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:06:12" nodeName="gpu-cluster-control-plane"
I1111 05:06:12.369524 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:06:42.418465 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:06:42" nodeName="gpu-cluster-control-plane"
I1111 05:06:42.433730 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:07:11.024410 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 30 items received
I1111 05:07:12.482152 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:07:12" nodeName="gpu-cluster-control-plane"
I1111 05:07:12.496906 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:07:42.542529 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:07:42" nodeName="gpu-cluster-control-plane"
I1111 05:07:42.555983 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:08:12.601053 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:08:12" nodeName="gpu-cluster-control-plane"
I1111 05:08:12.616218 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:08:42.661048 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:08:42" nodeName="gpu-cluster-control-plane"
I1111 05:08:42.675449 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:09:12.722830 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:09:12" nodeName="gpu-cluster-control-plane"
I1111 05:09:12.736951 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:09:42.783394 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:09:42" nodeName="gpu-cluster-control-plane"
I1111 05:09:42.838871 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:10:12.846117 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:10:12" nodeName="gpu-cluster-control-plane"
I1111 05:10:12.860832 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:10:42.909113 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:10:42" nodeName="gpu-cluster-control-plane"
I1111 05:10:42.924042 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:11:07.170348 1 route.go:44] Into Predicate Route inner func
I1111 05:11:07.170669 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:11:07.170687 1 device.go:245] Counting mlu devices
I1111 05:11:07.170695 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:11:07.170712 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:11:07.170726 1 device.go:179] Counting dcu devices
I1111 05:11:07.170735 1 device.go:170] Counting iluvatar devices
I1111 05:11:07.170758 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:11:07.170775 1 score.go:32] devices status
I1111 05:11:07.170797 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:11:07.170809 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:11:07.170820 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:11:07.170832 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:11:07.170846 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:11:07.170866 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:11:07.170882 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:11:07.170894 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:11:07.171070 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:11:12.968201 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:11:12" nodeName="gpu-cluster-control-plane"
I1111 05:11:12.984560 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:11:43.030296 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:11:43" nodeName="gpu-cluster-control-plane"
I1111 05:11:43.044560 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:12:13.092524 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:12:13" nodeName="gpu-cluster-control-plane"
I1111 05:12:13.107167 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:12:18.025477 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 30 items received
I1111 05:12:43.157068 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:12:43" nodeName="gpu-cluster-control-plane"
I1111 05:12:43.173019 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:13:13.222238 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:13:13" nodeName="gpu-cluster-control-plane"
I1111 05:13:13.235500 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:13:43.288826 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:13:43" nodeName="gpu-cluster-control-plane"
I1111 05:13:43.301960 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:14:13.352386 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:14:13" nodeName="gpu-cluster-control-plane"
I1111 05:14:13.369199 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:14:36.024402 1 reflector.go:378] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: forcing resync
I1111 05:14:36.046540 1 reflector.go:378] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: forcing resync
I1111 05:14:43.411690 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:14:43" nodeName="gpu-cluster-control-plane"
I1111 05:14:43.428680 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:14:50.060045 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 10 items received
I1111 05:15:13.471632 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:15:13" nodeName="gpu-cluster-control-plane"
I1111 05:15:13.486301 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:15:43.547789 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:15:43" nodeName="gpu-cluster-control-plane"
I1111 05:15:43.561602 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:16:07.176886 1 route.go:44] Into Predicate Route inner func
I1111 05:16:07.177195 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:16:07.177212 1 device.go:245] Counting mlu devices
I1111 05:16:07.177218 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:16:07.177236 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:16:07.177251 1 device.go:179] Counting dcu devices
I1111 05:16:07.177260 1 device.go:170] Counting iluvatar devices
I1111 05:16:07.177285 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:16:07.177308 1 score.go:32] devices status
I1111 05:16:07.177330 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:16:07.177364 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:16:07.177378 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:16:07.177389 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:16:07.177396 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:16:07.177414 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:16:07.177429 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:16:07.177441 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:16:07.177533 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:16:13.606379 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:16:13" nodeName="gpu-cluster-control-plane"
I1111 05:16:13.620662 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:16:43.684822 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:16:43" nodeName="gpu-cluster-control-plane"
I1111 05:16:43.701817 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:17:13.744649 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:17:13" nodeName="gpu-cluster-control-plane"
I1111 05:17:13.759481 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:17:43.804875 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:17:43" nodeName="gpu-cluster-control-plane"
I1111 05:17:43.819142 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:18:13.865355 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:18:13" nodeName="gpu-cluster-control-plane"
I1111 05:18:13.880590 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:18:43.926250 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:18:43" nodeName="gpu-cluster-control-plane"
I1111 05:18:43.942466 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:19:13.991148 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:19:13" nodeName="gpu-cluster-control-plane"
I1111 05:19:14.005397 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:19:44.050017 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:19:44" nodeName="gpu-cluster-control-plane"
I1111 05:19:44.063295 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:19:47.027018 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 40 items received
I1111 05:20:14.109941 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:20:14" nodeName="gpu-cluster-control-plane"
I1111 05:20:14.123451 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:20:44.171198 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:20:44" nodeName="gpu-cluster-control-plane"
I1111 05:20:44.185894 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:21:07.182455 1 route.go:44] Into Predicate Route inner func
I1111 05:21:07.182806 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:21:07.182829 1 device.go:179] Counting dcu devices
I1111 05:21:07.182837 1 device.go:170] Counting iluvatar devices
I1111 05:21:07.182845 1 device.go:245] Counting mlu devices
I1111 05:21:07.182853 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:21:07.182869 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:21:07.182891 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:21:07.182917 1 score.go:32] devices status
I1111 05:21:07.182996 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:21:07.183013 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:21:07.183026 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:21:07.183038 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:21:07.183046 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:21:07.183067 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:21:07.183080 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:21:07.183092 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:21:07.183250 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:21:14.230710 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:21:14" nodeName="gpu-cluster-control-plane"
I1111 05:21:14.246472 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:21:44.292372 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:21:44" nodeName="gpu-cluster-control-plane"
I1111 05:21:44.311286 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:22:14.351650 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:22:14" nodeName="gpu-cluster-control-plane"
I1111 05:22:14.366222 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:22:44.412691 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:22:44" nodeName="gpu-cluster-control-plane"
I1111 05:22:44.428362 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:23:14.474068 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:23:14" nodeName="gpu-cluster-control-plane"
I1111 05:23:14.487693 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:23:44.533868 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:23:44" nodeName="gpu-cluster-control-plane"
I1111 05:23:44.547212 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:24:14.593877 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:24:14" nodeName="gpu-cluster-control-plane"
I1111 05:24:14.609210 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:24:38.061088 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 10 items received
I1111 05:24:44.654300 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:24:44" nodeName="gpu-cluster-control-plane"
I1111 05:24:44.669329 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:25:14.718831 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:25:14" nodeName="gpu-cluster-control-plane"
I1111 05:25:14.734070 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:25:44.776953 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:25:44" nodeName="gpu-cluster-control-plane"
I1111 05:25:44.789951 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:26:07.188974 1 route.go:44] Into Predicate Route inner func
I1111 05:26:07.189281 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:26:07.189301 1 device.go:245] Counting mlu devices
I1111 05:26:07.189310 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:26:07.189329 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:26:07.189356 1 device.go:179] Counting dcu devices
I1111 05:26:07.189364 1 device.go:170] Counting iluvatar devices
I1111 05:26:07.189401 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:26:07.189423 1 score.go:32] devices status
I1111 05:26:07.189449 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:26:07.189465 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:26:07.189478 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:26:07.189490 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:26:07.189499 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:26:07.189516 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:26:07.189527 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:26:07.189535 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:26:07.189686 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:26:14.846855 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:26:14" nodeName="gpu-cluster-control-plane"
I1111 05:26:14.862137 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:26:44.907729 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:26:44" nodeName="gpu-cluster-control-plane"
I1111 05:26:44.922370 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:27:14.973323 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:27:14" nodeName="gpu-cluster-control-plane"
I1111 05:27:14.989124 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:27:45.029662 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:27:45" nodeName="gpu-cluster-control-plane"
I1111 05:27:45.042714 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:28:15.089788 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:28:15" nodeName="gpu-cluster-control-plane"
I1111 05:28:15.102386 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:28:45.149556 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:28:45" nodeName="gpu-cluster-control-plane"
I1111 05:28:45.167388 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:28:57.028902 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 49 items received
I1111 05:29:15.212709 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:29:15" nodeName="gpu-cluster-control-plane"
I1111 05:29:15.227952 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:29:45.273825 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:29:45" nodeName="gpu-cluster-control-plane"
I1111 05:29:45.289652 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:30:15.335321 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:30:15" nodeName="gpu-cluster-control-plane"
I1111 05:30:15.350179 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:30:22.062590 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 6 items received
I1111 05:30:45.395472 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:30:45" nodeName="gpu-cluster-control-plane"
I1111 05:30:45.411008 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:31:07.193775 1 route.go:44] Into Predicate Route inner func
I1111 05:31:07.194090 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:31:07.194109 1 device.go:179] Counting dcu devices
I1111 05:31:07.194117 1 device.go:170] Counting iluvatar devices
I1111 05:31:07.194125 1 device.go:245] Counting mlu devices
I1111 05:31:07.194134 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:31:07.194152 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:31:07.194180 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:31:07.194200 1 score.go:32] devices status
I1111 05:31:07.194221 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:31:07.194233 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:31:07.194245 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:31:07.194258 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:31:07.194294 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:31:07.194316 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:31:07.194352 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:31:07.194367 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:31:07.194491 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:31:15.456937 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:31:15" nodeName="gpu-cluster-control-plane"
I1111 05:31:15.472764 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:31:45.519145 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:31:45" nodeName="gpu-cluster-control-plane"
I1111 05:31:45.534487 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:32:15.577732 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:32:15" nodeName="gpu-cluster-control-plane"
I1111 05:32:15.591478 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:32:45.638685 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:32:45" nodeName="gpu-cluster-control-plane"
I1111 05:32:45.653546 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:33:15.702553 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:33:15" nodeName="gpu-cluster-control-plane"
I1111 05:33:15.717785 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:33:45.763481 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:33:45" nodeName="gpu-cluster-control-plane"
I1111 05:33:45.778291 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:34:15.824029 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:34:15" nodeName="gpu-cluster-control-plane"
I1111 05:34:15.839307 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:34:45.883605 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:34:45" nodeName="gpu-cluster-control-plane"
I1111 05:34:45.900010 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:35:15.950574 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:35:15" nodeName="gpu-cluster-control-plane"
I1111 05:35:15.964933 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:35:46.017514 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:35:46" nodeName="gpu-cluster-control-plane"
I1111 05:35:46.032916 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:35:57.030221 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 37 items received
I1111 05:36:07.201007 1 route.go:44] Into Predicate Route inner func
I1111 05:36:07.201321 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:36:07.201351 1 device.go:170] Counting iluvatar devices
I1111 05:36:07.201360 1 device.go:245] Counting mlu devices
I1111 05:36:07.201370 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:36:07.201388 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:36:07.201403 1 device.go:179] Counting dcu devices
I1111 05:36:07.201427 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:36:07.201449 1 score.go:32] devices status
I1111 05:36:07.201468 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:36:07.201481 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:36:07.201492 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:36:07.201504 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:36:07.201518 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:36:07.201535 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:36:07.201554 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:36:07.201571 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:36:07.201672 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:36:16.080589 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:36:16" nodeName="gpu-cluster-control-plane"
I1111 05:36:16.094542 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:36:46.145982 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:36:46" nodeName="gpu-cluster-control-plane"
I1111 05:36:46.160533 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:37:16.209514 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:37:16" nodeName="gpu-cluster-control-plane"
I1111 05:37:16.224127 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:37:46.270702 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:37:46" nodeName="gpu-cluster-control-plane"
I1111 05:37:46.288611 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:38:16.330925 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:38:16" nodeName="gpu-cluster-control-plane"
I1111 05:38:16.346117 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:38:46.393239 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:38:46" nodeName="gpu-cluster-control-plane"
I1111 05:38:46.410574 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:39:16.458604 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:39:16" nodeName="gpu-cluster-control-plane"
I1111 05:39:16.474054 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:39:33.063952 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 11 items received
I1111 05:39:46.519084 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:39:46" nodeName="gpu-cluster-control-plane"
I1111 05:39:46.533930 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:40:16.580085 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:40:16" nodeName="gpu-cluster-control-plane"
I1111 05:40:16.593464 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:40:46.642141 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:40:46" nodeName="gpu-cluster-control-plane"
I1111 05:40:46.658361 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:41:07.208118 1 route.go:44] Into Predicate Route inner func
I1111 05:41:07.208443 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:41:07.208461 1 device.go:245] Counting mlu devices
I1111 05:41:07.208467 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:41:07.208484 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:41:07.208499 1 device.go:179] Counting dcu devices
I1111 05:41:07.208509 1 device.go:170] Counting iluvatar devices
I1111 05:41:07.208532 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:41:07.208549 1 score.go:32] devices status
I1111 05:41:07.208570 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:41:07.208582 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:41:07.208593 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:41:07.208606 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:41:07.208615 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:41:07.208632 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:41:07.208644 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:41:07.208655 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:41:07.208819 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:41:16.704185 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:41:16" nodeName="gpu-cluster-control-plane"
I1111 05:41:16.718579 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:41:46.764711 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:41:46" nodeName="gpu-cluster-control-plane"
I1111 05:41:46.781074 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:42:16.827449 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:42:16" nodeName="gpu-cluster-control-plane"
I1111 05:42:16.846788 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:42:46.891731 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:42:46" nodeName="gpu-cluster-control-plane"
I1111 05:42:46.906090 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:43:16.951782 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:43:16" nodeName="gpu-cluster-control-plane"
I1111 05:43:16.967642 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:43:47.027862 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:43:47" nodeName="gpu-cluster-control-plane"
I1111 05:43:47.042667 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:44:17.103384 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:44:17" nodeName="gpu-cluster-control-plane"
I1111 05:44:17.117724 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:44:47.168108 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:44:47" nodeName="gpu-cluster-control-plane"
I1111 05:44:47.185039 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:44:48.065204 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 6 items received
I1111 05:45:17.232880 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:45:17" nodeName="gpu-cluster-control-plane"
I1111 05:45:17.247747 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:45:40.031403 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Node total 51 items received
I1111 05:45:47.293509 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:45:47" nodeName="gpu-cluster-control-plane"
I1111 05:45:47.314238 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:46:07.212841 1 route.go:44] Into Predicate Route inner func
I1111 05:46:07.213308 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:46:07.213483 1 device.go:179] Counting dcu devices
I1111 05:46:07.213496 1 device.go:170] Counting iluvatar devices
I1111 05:46:07.213503 1 device.go:245] Counting mlu devices
I1111 05:46:07.213511 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:46:07.213592 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:46:07.213632 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:46:07.213649 1 score.go:32] devices status
I1111 05:46:07.213759 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:46:07.213816 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:46:07.213853 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:46:07.213894 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:46:07.213906 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:46:07.213925 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:46:07.213981 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:46:07.214013 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:46:07.214165 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:46:17.354182 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:46:17" nodeName="gpu-cluster-control-plane"
I1111 05:46:17.369528 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:46:47.416787 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:46:47" nodeName="gpu-cluster-control-plane"
I1111 05:46:47.432219 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:47:17.482011 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:47:17" nodeName="gpu-cluster-control-plane"
I1111 05:47:17.497414 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:47:47.540480 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:47:47" nodeName="gpu-cluster-control-plane"
I1111 05:47:47.555653 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:48:17.599825 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:48:17" nodeName="gpu-cluster-control-plane"
I1111 05:48:17.613092 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:48:47.660537 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:48:47" nodeName="gpu-cluster-control-plane"
I1111 05:48:47.678262 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:49:17.725039 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:49:17" nodeName="gpu-cluster-control-plane"
I1111 05:49:17.740639 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:49:47.786408 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:49:47" nodeName="gpu-cluster-control-plane"
I1111 05:49:47.801913 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:49:52.067062 1 reflector.go:790] pkg/mod/k8s.io/client-go@v0.28.3/tools/cache/reflector.go:229: Watch close - *v1.Pod total 7 items received
I1111 05:50:17.848006 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:50:17" nodeName="gpu-cluster-control-plane"
I1111 05:50:17.861955 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:50:47.909875 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:50:47" nodeName="gpu-cluster-control-plane"
I1111 05:50:47.925823 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:51:07.217244 1 route.go:44] Into Predicate Route inner func
I1111 05:51:07.217570 1 scheduler.go:445] "begin schedule filter" pod="gpu-pod" uuid="9c2f8cbd-23bf-4ea6-9423-099b99b1e558" namespaces="default"
I1111 05:51:07.217589 1 device.go:245] Counting mlu devices
I1111 05:51:07.217600 1 device.go:250] idx= nvidia.com/gpu val= {{2 0} {<nil>} 2 DecimalSI} {{0 0} {<nil>} }
I1111 05:51:07.217617 1 device.go:250] idx= nvidia.com/gpumem val= {{1 3} {<nil>} 1k DecimalSI} {{0 0} {<nil>} }
I1111 05:51:07.217633 1 device.go:179] Counting dcu devices
I1111 05:51:07.217643 1 device.go:170] Counting iluvatar devices
I1111 05:51:07.217663 1 pod.go:40] "collect requestreqs" counts=[{"NVIDIA":{"Nums":2,"Type":"NVIDIA","Memreq":1000,"MemPercentagereq":101,"Coresreq":0}}]
I1111 05:51:07.217681 1 score.go:32] devices status
I1111 05:51:07.217703 1 score.go:34] "device status" device id="GPU-a02b30b8-5b20-131c-0e47-6bc99948e264" device detail={"Device":{"ID":"GPU-a02b30b8-5b20-131c-0e47-6bc99948e264","Index":0,"Used":0,"Count":10,"Usedmem":0,"Totalmem":15360,"Totalcore":100,"Usedcores":0,"Numa":0,"Type":"NVIDIA-Tesla T4","Health":true},"Score":0}
I1111 05:51:07.217715 1 node_policy.go:61] node gpu-cluster-control-plane used 0, usedCore 0, usedMem 0,
I1111 05:51:07.217730 1 node_policy.go:73] node gpu-cluster-control-plane computer score is 0.000000
I1111 05:51:07.217742 1 gpu_policy.go:70] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 user 0, userCore 0, userMem 0,
I1111 05:51:07.217749 1 gpu_policy.go:76] device GPU-a02b30b8-5b20-131c-0e47-6bc99948e264 computer score is 2.651042
I1111 05:51:07.217767 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
I1111 05:51:07.217780 1 score.go:225] "calcScore:node not fit pod" pod="default/gpu-pod" node="gpu-cluster-control-plane"
I1111 05:51:07.217791 1 scheduler.go:479] All node scores do not meet for pod gpu-pod
I1111 05:51:07.217940 1 event.go:307] "Event occurred" object="default/gpu-pod" fieldPath="" kind="Pod" apiVersion="v1" type="Warning" reason="FilteringFailed" message="no available node, all node scores do not meet"
I1111 05:51:17.972672 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:51:17" nodeName="gpu-cluster-control-plane"
I1111 05:51:17.988619 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:51:48.033309 1 scheduler.go:201] "New timestamp" hami.io/node-handshake="Requesting_2024.11.11 05:51:48" nodeName="gpu-cluster-control-plane"
I1111 05:51:48.050057 1 util.go:163] Encoded node Devices: GPU-a02b30b8-5b20-131c-0e47-6bc99948e264,10,15360,100,NVIDIA-Tesla T4,0,true:
I1111 05:51:07.217767 1 score.go:158] "request devices nums cannot exceed the total number of devices on the node." pod="default/gpu-pod" request devices nums=2 node device nums=1
The log shows that the node has only one physical GPU, but the pod declares 2, so the scheduling filter rejects the node.
The 10 vGPUs registered by the node mean that the single GPU can be shared by up to 10 pods; they do not mean that one pod can request multiple vGPUs from it. Each unit of nvidia.com/gpu requested by a pod must be backed by a separate physical GPU, which is why the log reports "request devices nums cannot exceed the total number of devices on the node".
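The 10 comes from the node's allocatable nvidia.com/gpu count, which counts shareable vGPU slots rather than physical cards. A quick way to see this (a plain kubectl query, assuming the default HAMi resource name) is:

kubectl get node gpu-cluster-control-plane -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
# should print 10 on this node, even though nvidia-smi lists a single Tesla T4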
Ah, got it. I initially misunderstood it. Thank you for your explanation.
What happened:
I set up a single-node GPU cluster using Kind. When I attempt to allocate 1 vGPU to a Pod, it works as expected. However, when I try to allocate more than 1 vGPU to the Pod, it fails, even though there are enough vGPUs available.
- The node registers 10 vGPUs.
- A Pod that requests 2 vGPUs is created.
- The Pod events report the scheduling failure.
What you expected to happen:
The Pod can request more than 1 vGPU.
How to reproduce it (as minimally and precisely as possible):
1. Create a Kind GPU cluster: https://github.com/cr7258/hands-on-lab/blob/main/ai/gpu/script/ubuntu-kind-gpu-cluster.sh
2. Install HAMi:
helm install hami hami-charts/hami --set scheduler.kubeScheduler.imageTag=v1.31.2 -n kube-system
3. Label the node:
kubectl label node gpu-cluster-control-plane gpu=on
4. Create a Pod that requests 2 vGPUs (a sketch of such a manifest follows below).
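The manifest for step 4 is not included in the report; based on the request quantities visible in the scheduler log (nvidia.com/gpu: 2, nvidia.com/gpumem: 1000), it presumably looked roughly like the following sketch (the container image and command are placeholders):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04  # placeholder image
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 2       # 2 vGPUs, each of which must map to a distinct physical GPU
        nvidia.com/gpumem: 1000 # device memory (MB) requested per vGPU
EOF

With only one T4 on the node, this pod stays Pending with the FilteringFailed event shown above; changing nvidia.com/gpu to 1 lets it schedule.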
Anything else we need to know?:
- The output of nvidia-smi -a
- Your docker or containerd configuration file (e.g: /etc/docker/daemon.json)
- The hami-device-plugin container logs
- The hami-scheduler container logs
- The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)
- Any relevant kernel output lines from dmesg
Environment:
- docker version: 24.0.7
- uname -a: Linux gpu-demo 6.8.0-40-generic #40-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul 5 10:34:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux