Closed reedacus25 closed 1 year ago
Hey @reedacus25,
thank you very much for the bug report, it's exactly what I need to see where the issue is 👍 All I'll ask of you now is a little patience (and I apologise for the delayed response), since I'm in the middle of a move. If @timeu can't find time to tackle this before then, I'll start work on it as soon as the move is finished and I'm settled in (hopefully by next week).
leaving this open...
Hey @reedacus25, just to let you know, I'm back online after the move and will hopefully be able to clean up this issue soon.
@reedacus25 can you check the artifact from this build and let me know if it's all working as expected now? I've done some synthetic testing locally, but unfortunately I don't have a heterogeneous GPU cluster at my disposal to try it live.
https://github.com/CLIP-HPC/SlurmCommander/actions/runs/4134059883
@pja237 From a quick glance, it looks to be working as expected: the cluster tab shows the correct total GPU count, and both sets of GPUs show up in the utilization bars at the top.
CPU used/total: 0/88 GPU p100: used/total: 0/6
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0% ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0%
MEM used/total: 0/768000 GPU rtx: used/total: 0/2
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0% ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0%
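For anyone following along, here is a rough sketch of what per-type GPU counting can look like. It is not the actual scom code. It assumes the node's GRES comes as a Slurm-style string such as `gpu:p100:6,gpu:rtx:2` (optionally with a suffix like `(S:0-1)`), and the names `gpuCounts` and `totalGPUs` are made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gpuCounts parses a Slurm-style GRES string and returns the number of GPUs
// per type. Hypothetical helper: the input format is an assumption.
func gpuCounts(gres string) map[string]int {
	counts := make(map[string]int)
	for _, item := range strings.Split(gres, ",") {
		// Drop an optional suffix such as "(S:0-1)".
		if i := strings.Index(item, "("); i >= 0 {
			item = item[:i]
		}
		fields := strings.Split(strings.TrimSpace(item), ":")
		if len(fields) < 2 || fields[0] != "gpu" {
			continue // not a GPU entry
		}
		// "gpu:<count>" carries no type; file it under a placeholder key.
		typ := "generic"
		if len(fields) >= 3 {
			typ = fields[1]
		}
		n, err := strconv.Atoi(fields[len(fields)-1])
		if err != nil {
			continue // skip entries whose count cannot be parsed
		}
		counts[typ] += n
	}
	return counts
}

// totalGPUs sums the counts over all GPU types.
func totalGPUs(counts map[string]int) int {
	total := 0
	for _, n := range counts {
		total += n
	}
	return total
}

func main() {
	counts := gpuCounts("gpu:p100:6,gpu:rtx:2")
	fmt.Println(counts["p100"], counts["rtx"], totalGPUs(counts)) // prints "6 2 8"
}
```

Keeping a count per type instead of a single number is what lets the cluster tab show the sum while the utilization bars stay separate for each GPU model.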
Hey, glad to hear. I'll do the release later today. I've updated the PR with some more testing, fixed a small issue, and took the liberty of adding you to the contributors list: https://github.com/CLIP-HPC/SlurmCommander/blob/ee4dd407d99f45408af50715c42e67cefb64618f/internal/model/view.go#L50-L58 Hope that's OK. Would you like me to update it with your name before I merge and release, or is this fine with you?
I have a host that has multiple GPUs of different models.
scom (v1.0.4/21cee5ddc47eaad02dbdc37809f38085e194e6bf, and previously 1.0.0 as well) reports only 2 GPUs for this system.
If I go to the cluster tab, select the node, and pull up the statistics, the below are the reported stats for that node
Hopefully that's helpful. Appreciate the great tool!