Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware
http://project-hami.io/
Apache License 2.0

Wrong calculation for 1 pod with 2+ containers requesting GPUs #592

Closed: joy717 closed this issue 2 weeks ago

joy717 commented 2 weeks ago

What happened: when 1 pod has 2+ containers that request GPUs, the used resources recorded in score.device are calculated incorrectly.

Assume we have 2 GPU cards, each with 10Gi of memory, and no tasks are currently running. One pod has 2+ containers, and each container requests one GPU (by memory, say): container1 requests 10Gi of memory, and container2 requests 1Gi of memory.

We want this result: container1 gets card1, and container2 gets card2.

Wrong result now: container1 gets card1, and container2 also gets card1.
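The expected placement follows from simple arithmetic: once container1 takes all 10Gi of card1, card1 has 0Gi free, so container2's 1Gi request only fits on card2. A minimal Go sketch of that per-pod bookkeeping, assuming a hypothetical DeviceUsage type and allocate function (not HAMi's actual code) and a first-fit policy:

```go
package main

import "fmt"

// DeviceUsage is a hypothetical, simplified per-GPU record (memory in Gi);
// it is not HAMi's actual type.
type DeviceUsage struct {
	ID       string
	TotalMem int
	UsedMem  int
}

// allocate places each container's memory request on the first device with
// enough free memory (an assumed first-fit policy) and charges the usage to
// that same device immediately, so later containers in the pod see it.
func allocate(devs []DeviceUsage, reqs []int) []string {
	got := make([]string, 0, len(reqs))
	for _, req := range reqs {
		chosen := "" // stays empty if no device fits
		for i := range devs {
			if devs[i].TotalMem-devs[i].UsedMem >= req {
				devs[i].UsedMem += req
				chosen = devs[i].ID
				break
			}
		}
		got = append(got, chosen)
	}
	return got
}

func main() {
	devs := []DeviceUsage{{"card1", 10, 0}, {"card2", 10, 0}}
	// container1 requests 10Gi, container2 requests 1Gi.
	fmt.Println(allocate(devs, []int{10, 1})) // [card1 card2]
}
```

Because the 10Gi is charged to card1 before container2 is considered, the only correct outcome is container1 on card1 and container2 on card2.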

Root cause: fitInDevices contains a sort that reorders node.Devices, but the result returned by fitInCertainDevice records each device's index from before the sort. So after fitInCertainDevice returns, the used, usedcores, and usedmem values we calculate are added to the wrong device.
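The mechanism can be reproduced in a small Go sketch. All names here (Device, fit, allocateBuggy, allocateFixed) are hypothetical rather than HAMi's code, and the most-free-memory-first sort order is an assumption; the point is only that charging usage through a pre-sort index after an in-place sort hits the wrong card, while looking the device up by a stable ID does not.

```go
package main

import (
	"fmt"
	"sort"
)

// Device is a hypothetical, simplified per-GPU record (memory in Gi).
// Index is the device's position in the node's original, unsorted list.
type Device struct {
	Index int
	ID    string
	Total int
	Used  int
}

func free(d Device) int { return d.Total - d.Used }

// fit sorts devs in place (most free memory first, an assumed policy) and
// returns the first device that can hold req.
func fit(devs []Device, req int) (Device, bool) {
	sort.SliceStable(devs, func(i, j int) bool { return free(devs[i]) > free(devs[j]) })
	for _, d := range devs {
		if free(d) >= req {
			return d, true
		}
	}
	return Device{}, false
}

// allocateBuggy charges usage to devs[d.Index], but fit has reordered devs,
// so the pre-sort index can point at a different card.
func allocateBuggy(devs []Device, reqs []int) {
	for _, req := range reqs {
		if d, ok := fit(devs, req); ok {
			devs[d.Index].Used += req
		}
	}
}

// allocateFixed charges usage to the device with the matching ID, which is
// stable no matter how the slice was sorted.
func allocateFixed(devs []Device, reqs []int) {
	for _, req := range reqs {
		if d, ok := fit(devs, req); ok {
			for i := range devs {
				if devs[i].ID == d.ID {
					devs[i].Used += req
				}
			}
		}
	}
}

// usage reports memory used per card ID, independent of slice order.
func usage(devs []Device) map[string]int {
	m := map[string]int{}
	for _, d := range devs {
		m[d.ID] = d.Used
	}
	return m
}

func newNode() []Device {
	return []Device{{0, "card1", 10, 0}, {1, "card2", 10, 0}}
}

func main() {
	reqs := []int{10, 1} // container1: 10Gi, container2: 1Gi

	buggy := newNode()
	allocateBuggy(buggy, reqs)
	fmt.Println(usage(buggy)) // map[card1:11 card2:0]

	fixed := newNode()
	allocateFixed(fixed, reqs)
	fmt.Println(usage(fixed)) // map[card1:10 card2:1]
}
```

In the buggy run, the second sort moves card2 to the front, so container2 is placed on card2, but its pre-sort Index (1) now points at card1, which ends up charged 11Gi on a 10Gi card. Keying the update on the device ID is one possible fix; sorting a copy instead of node.Devices itself would be another.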

What you expected to happen: each container should get the correct device when 1 pod has 2+ containers.

How to reproduce it (as minimally and precisely as possible): as mentioned above.

Anything else we need to know?:

Environment: