What happened:
When a single pod with two or more containers requests GPUs, the device score is wrong: the used resources are calculated against the wrong device.
Assume we have 2 GPU cards, each with 10Gi of memory, and no tasks are running yet. One pod has two (or more) containers, and each container requests GPU resources (memory, in this example; a minimal pod sketch follows the results below):
- container1 requests 10Gi of memory
- container2 requests 1Gi of memory
We want this result:
- container1 gets card1
- container2 gets card2

The wrong result we get now:
- container1 gets card1
- container2 gets card1
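For reference, a two-container pod like the one above can be expressed as below. This is only a sketch: the image, the `nvidia.com/gpu` / `nvidia.com/gpumem` resource keys, and the memory quantities are placeholders and should be replaced with whatever resource names and units your HAMi deployment actually exposes.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// gpuContainer builds a container that asks for one GPU card plus a
// given amount of GPU memory. Resource keys and units are assumptions,
// not the definitive HAMi configuration.
func gpuContainer(name, gpuMem string) corev1.Container {
	return corev1.Container{
		Name:    name,
		Image:   "nvidia/cuda:12.2.0-base-ubuntu22.04",
		Command: []string{"sleep", "infinity"},
		Resources: corev1.ResourceRequirements{
			Limits: corev1.ResourceList{
				corev1.ResourceName("nvidia.com/gpu"):    resource.MustParse("1"),
				corev1.ResourceName("nvidia.com/gpumem"): resource.MustParse(gpuMem),
			},
		},
	}
}

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "two-container-gpu-pod"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{
				gpuContainer("container1", "10Gi"), // should land on card1
				gpuContainer("container2", "1Gi"),  // should land on card2
			},
		},
	}
	fmt.Printf("%+v\n", pod)
}
```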
Root cause:
There is a sort in the `fitInDevices` function that changes the order of `node.Devices`, but the result returned by `fitInCertainDevice` records the device index from before the sort. So after `fitInCertainDevice` returns, `used`, `usedcores`, and `usedmem` are calculated and added to the wrong device.
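Below is a minimal, self-contained sketch of this class of indexing mismatch (not the actual HAMi code; `Device`, `fitOne`, and the sort key are simplified stand-ins): the device is selected against one ordering, but the usage is committed by positional index against another ordering, so the accounting lands on a card that was never picked.

```go
package main

import (
	"fmt"
	"sort"
)

// Device is a simplified stand-in for a scheduler's per-card bookkeeping.
type Device struct {
	ID       string
	Usedmem  int64 // Gi already committed on this card
	Totalmem int64 // Gi total on this card
}

// fitOne mimics a fitInCertainDevice-style helper: it walks the slice it
// is given and returns the positional index of the first card that fits.
func fitOne(devs []Device, reqMem int64) (int, bool) {
	for i, d := range devs {
		if d.Totalmem-d.Usedmem >= reqMem {
			return i, true
		}
	}
	return -1, false
}

func main() {
	original := []Device{
		{ID: "card1", Totalmem: 10},
		{ID: "card2", Totalmem: 10},
	}

	// container1 wants 10Gi, container2 wants 1Gi.
	for _, req := range []int64{10, 1} {
		// The selection happens against a sorted view of the devices...
		sorted := append([]Device(nil), original...)
		sort.Slice(sorted, func(i, j int) bool {
			if sorted[i].Usedmem != sorted[j].Usedmem {
				return sorted[i].Usedmem < sorted[j].Usedmem // least-used first
			}
			return sorted[i].ID < sorted[j].ID // deterministic tie-break
		})

		idx, ok := fitOne(sorted, req)
		if !ok {
			fmt.Printf("request %dGi: no device fits\n", req)
			continue
		}

		// ...but the usage is committed by index into the original ordering.
		// For the second request this charges card1 even though card2 was
		// selected, leaving card1 "using" 11Gi of its 10Gi capacity.
		original[idx].Usedmem += req
		fmt.Printf("request %dGi: selected %s, but usage charged to %s\n",
			req, sorted[idx].ID, original[idx].ID)
	}
	fmt.Printf("final accounting: %+v\n", original)
}
```

One way to avoid this class of bug is to resolve the selection through a stable device identifier (for example the card UUID) or to commit usage against the same ordering the index was recorded for, rather than through a positional index that a later sort can invalidate.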
What you expected to happen:
Each container should be assigned the correct device when a single pod has two or more containers.
How to reproduce it (as minimally and precisely as possible):
See the scenario described above.
Anything else we need to know?:
- The output of `nvidia-smi -a` on your host
- Your docker or containerd configuration file (e.g. `/etc/docker/daemon.json`)
- The hami-device-plugin container logs
- The hami-scheduler container logs
- The kubelet logs on the node (e.g. `sudo journalctl -r -u kubelet`)
- The output of `dmesg` on the node
Environment:
- `docker version`
- `uname -a`