Project-HAMi / volcano-vgpu-device-plugin

Device plugin for Volcano vGPU that supports hard resource isolation
Apache License 2.0
44 stars, 14 forks

Fix get pending pod failed #22

Closed TymonLee closed 1 month ago

TymonLee commented 2 months ago

Problem: When many Pods are scheduled to the same node at the same time, GetPendingPod() may return a Pod other than the one kubelet is currently allocating vGPUs for. As a result, the wrong GPU resources are allocated.

Fix: Return the oldest pending Pod, ordered by "vgpu-time", so that the returned Pod is always the one kubelet is allocating vGPUs for.
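
A minimal sketch of the selection step, not the code from this PR. It assumes "vgpu-time" is stored as a Pod annotation holding an integer timestamp string; the `Pod` struct, the `vgpuTimeKey` constant, and the `oldestPendingPod` helper are hypothetical stand-ins for the real Kubernetes types and the plugin's own functions.

```go
package main

import "strconv"

// vgpuTimeKey is a hypothetical annotation key; the thread only calls it "vgpu-time".
const vgpuTimeKey = "vgpu-time"

// Pod is a simplified stand-in for the Kubernetes Pod type (assumption).
type Pod struct {
	Name        string
	Annotations map[string]string
}

// oldestPendingPod returns the pod with the smallest vgpu-time value.
// Pods with a missing or unparseable timestamp are skipped.
// ok is false when no pod carries a usable timestamp.
func oldestPendingPod(pods []Pod) (oldest Pod, ok bool) {
	var oldestTime uint64
	for _, p := range pods {
		raw, found := p.Annotations[vgpuTimeKey]
		if !found {
			continue
		}
		t, err := strconv.ParseUint(raw, 10, 64)
		if err != nil {
			continue
		}
		// Keep the first valid pod, then replace it only with a strictly older one.
		if !ok || t < oldestTime {
			oldest, oldestTime, ok = p, t, true
		}
	}
	return oldest, ok
}
```

Picking the minimum timestamp, rather than the first match in the list, makes the result independent of the order in which the pending Pods are listed.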

archlitchi commented 1 month ago

thanks, that's indeed helpful

archlitchi commented 1 month ago

/lgtm