Closed: rajeeshckr closed this issue 1 month ago
@wawa0210 tagging you on this since you seem to be replying to some of the questions recently. Could you please help me with this when you get a chance? 🙇🏼
Do you mean that in this scenario, if HAMi schedules the fourth pod, you expect the status to be Pending instead of ContainerStatusUnknown?
Because there is no node available to schedule this pod at this point, the ip-172-30.xxxx-us-west-2-compute-internal node should be filtered out directly and not take part in the filter step.
Based on the test setup you described (single node, single GPU, --device-split-count=3), I created four pods and the last one ended up Pending. Is this within expectations?
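For reference, the test workload is roughly the Deployment below. This is only a sketch reconstructed from the describe output further down; the image, labels, args, and resource values come from that output, while everything else (Deployment name, field ordering) is illustrative.

```sh
# Sketch of the single-node test: 4 replicas, each requesting 1 vGPU
# from a GPU that is split 3 ways by --device-split-count=3.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-vgpu
spec:
  replicas: 4
  selector:
    matchLabels:
      app: test-vgpu
  template:
    metadata:
      labels:
        app: test-vgpu
    spec:
      containers:
      - name: container-1
        image: docker.samzong.me/chrstnhntschl/gpu_burn
        args: ["4000"]
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
            nvidia.com/vgpu: 1      # vGPU count; resource name depends on device-plugin config
            nvidia.com/gpumem: 2k   # GPU memory per pod
            nvidia.com/gpucores: 10 # share of GPU cores per pod
EOF
```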
[root@controller-node-1 ~]# kubectl get po -o wide| grep 'test-vgpu'
test-vgpu-5b87958dd-2t7xg 1/1 Running 0 18m 10.233.74.118 controller-node-1 <none> <none>
test-vgpu-5b87958dd-btd9q 1/1 Running 0 18m 10.233.74.116 controller-node-1 <none> <none>
test-vgpu-5b87958dd-cllkw 0/1 Pending 0 96s <none> <none> <none> <none>
test-vgpu-5b87958dd-v6grx 1/1 Running 0 18m 10.233.74.87 controller-node-1 <none> <none>
Describe the pending pod:
[root@controller-node-1 ~]# kubectl describe po test-vgpu-5b87958dd-cllkw
Name: test-vgpu-5b87958dd-cllkw
Namespace: default
Priority: 0
Service Account: default
Node: <none>
Labels: app=test-vgpu
pod-template-hash=5b87958dd
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/test-vgpu-5b87958dd
Containers:
container-1:
Image: docker.samzong.me/chrstnhntschl/gpu_burn
Port: <none>
Host Port: <none>
Args:
4000
Limits:
cpu: 250m
memory: 512Mi
nvidia.com/gpucores: 10
nvidia.com/gpumem: 2k
nvidia.com/vgpu: 1
Requests:
cpu: 250m
memory: 512Mi
nvidia.com/gpucores: 10
nvidia.com/gpumem: 2k
nvidia.com/vgpu: 1
Environment:
CUDA_TASK_PRIORITY: 0
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8f2xq (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-8f2xq:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m28s hami-scheduler 0/2 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unreachable: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling.
Warning FilteringFailed 4m28s hami-scheduler no available node, all node scores do not meet
Thanks for looking into it. Looks like for you it's working as expected. I wanted my fourth pod to be in Pending state. Do I need to tweak something in hami-scheduler?
In my case, I am using nvidia.com/gpu as per the examples here. I see nvidia.com/vgpu in your case, but that could just be a device-plugin config difference.
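One quick way to check which resource name the device plugin is actually registering on a node (the node name is a placeholder):

```sh
# Lists the nvidia.com/* extended resources the node advertises under
# Capacity/Allocatable; the exact names depend on the device-plugin config.
kubectl describe node <your-gpu-node> | grep 'nvidia.com/'
```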
Can you paste the pod yaml?
Another question: how did you install HAMi? Is it a fresh install or an upgrade from another version?
This is a fresh install for a POC. Here is the helm values file I used: values.yaml.txt. Here is the deployed yaml: hami.yaml.txt. Here is the pod spec: test_workload_rajeesh_test.yml.txt
I noticed this bit in the scheduler config; is it because of that?
Everything seems normal, and there is nothing obvious to call out. You can try the following steps (a rough shell sketch follows after the list):
1. Restart hami-scheduler and get ready to capture its logs.
2. Scale your application down to 0 replicas, then back up to 4.
3. If you find a pod in ContainerStatusUnknown, remember to upload the hami-scheduler extender container log.
4. Delete the HAMi-related entries from the node's annotations, restart the hami device plugin, retry step 2, and observe whether things are back to normal.
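Something along these lines; all object names, the namespace, and the container name below are assumptions based on a default Helm install, so adjust them to your environment:

```sh
# 1) Restart the hami-scheduler deployment and get ready to capture its logs
kubectl -n kube-system rollout restart deployment hami-scheduler

# 2) Scale the workload to 0 replicas, then back up to 4
kubectl scale deployment test-vgpu --replicas=0
kubectl scale deployment test-vgpu --replicas=4

# 3) If a pod ends up ContainerStatusUnknown, save the scheduler extender log
#    (list the container names first, since they may differ per chart version)
kubectl -n kube-system get deploy hami-scheduler \
  -o jsonpath='{.spec.template.spec.containers[*].name}{"\n"}'
kubectl -n kube-system logs deploy/hami-scheduler -c <extender-container> > extender.log

# 4) Inspect the HAMi-related annotations on the node, remove the stale ones
#    (a trailing '-' removes an annotation), restart the device plugin, retry step 2
kubectl describe node <your-gpu-node> | grep -i hami
kubectl annotate node <your-gpu-node> <hami-annotation-key>-
kubectl -n kube-system rollout restart daemonset hami-device-plugin
```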
Sorry for the confusion; trying this on a brand-new node fixed the issue. Looks like something was in a weird state on my old node. Thanks for having a look!
Please provide an in-depth description of the question you have: I was experimenting with an Nvidia g4dn.xlarge instance type and the HAMi device plugin DaemonSet with these options set:
The scheduler extension config is
I expected 3 GPU pods to get scheduled on the 1-GPU node. The 4th one is failing as expected, but my issue is that the scheduler is not filtering out that same node, which has already exhausted its vGPU devices, and we then get an error from the kubelet. I was expecting the hami-scheduler to filter this node out and leave the pod in the Pending state. Here are the resources listed on the node.
Is there something we need to do to get that behavior? Thanks!
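(For reference, two standard kubectl checks that help confirm whether the pod is actually being handled by the HAMi scheduler and what it decided; the pod name is a placeholder.)

```sh
# Which scheduler the pod is assigned to (expected to be the HAMi scheduler for vGPU pods)
kubectl get pod <pending-pod> -o jsonpath='{.spec.schedulerName}{"\n"}'

# Scheduling-related events recorded for that pod
kubectl get events --field-selector involvedObject.name=<pending-pod> --sort-by=.lastTimestamp
```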
What do you think about this question?:
Environment: