AliyunContainerService / gpushare-scheduler-extender

GPU Sharing Scheduler for Kubernetes Cluster
Apache License 2.0

Problem with kubectl inspect gpushare #77

Open llt19903767731 opened 4 years ago

llt19903767731 commented 4 years ago

On cluster node1, the total GPU memory is 1997 MiB (GPU0: Quadro P620) + 6072 MiB (GPU1: GeForce GTX 1060), as shown below:

```
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P620         Off  | 00000000:02:00.0  On |                  N/A |
| 34%   32C    P8   ERR! /  N/A |    168MiB /  1997MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 00000000:17:00.0 Off |                  N/A |
| 20%   31C    P8     5W / 120W |      2MiB /  6072MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2157      G   /usr/lib/xorg/Xorg                            94MiB |
|    0      6768      G   compiz                                        62MiB |
|    0      7184      G   fcitx-qimpanel                                 6MiB |
|    0      8539      G   /usr/lib/firefox/firefox                       1MiB |
|    0      8759      G   /usr/lib/firefox/firefox                       1MiB |
+-----------------------------------------------------------------------------+
```

But when I execute the command `sudo kubectl inspect gpushare`, the output is:

```
NAME       IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU Memory(MiB)
compute1   192.168.1.3  0/1997                 0/1997                 0/3994

Allocated/Total GPU Memory In Cluster: 0/3994 (0%)
```

Why does GPU1 show only 1997 MiB? Is the output for GPU1 somehow related to the output for GPU0? Can someone explain this problem? Thank you!!

juchaosong commented 4 years ago

You cannot use different GPU models in one node; it causes the reported information to be inconsistent with the actual GPUs. As you can see, every GPU is reported with the same memory as the first one.
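For illustration, here is a minimal Go sketch of that homogeneous-GPU assumption (the types and function names below are hypothetical, not the actual gpushare-device-plugin code): if a device plugin reads only GPU0's memory and multiplies it by the GPU count, a node like the one above is reported as 1997 MiB per GPU and 1997 × 2 = 3994 MiB total, which matches the `kubectl inspect gpushare` output.

```go
package main

import "fmt"

// gpu models one physical device on a node (hypothetical struct for illustration).
type gpu struct {
	name      string
	memoryMiB uint64
}

// advertisedMemory mimics a plugin that assumes all GPUs on a node are identical:
// it reads the memory of the first device and reports that value for every GPU,
// so the second GPU's real size (6072 MiB here) never reaches the scheduler extender.
func advertisedMemory(gpus []gpu) (perGPU, nodeTotal uint64) {
	if len(gpus) == 0 {
		return 0, 0
	}
	perGPU = gpus[0].memoryMiB              // only GPU0 is consulted
	nodeTotal = perGPU * uint64(len(gpus))  // total = GPU0 memory x GPU count
	return perGPU, nodeTotal
}

func main() {
	node1 := []gpu{
		{name: "Quadro P620", memoryMiB: 1997},
		{name: "GeForce GTX 1060", memoryMiB: 6072},
	}
	perGPU, total := advertisedMemory(node1)
	// Prints 1997 and 3994, matching the inspect output shown in the issue.
	fmt.Printf("per-GPU: %d MiB, node total: %d MiB\n", perGPU, total)
}
```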