Mounting /sys/fs/cgroup/ from the host into the container causes the difference: it may overwrite the container's own cgroup info, so the required cgroup.procs process ID info is not retrieved correctly. As a result, GPU utilization by process does not report the real GPU utilization.
This issue must be solved: manual installation requires too much work on the user side. 1) install the NVIDIA toolkit, 2) install Go, 3) copy the .so file, and 4) launch the vGPU server and device plugin process on each GPU node...
Current plan: 1) add a debug message to check the process ID obtained from "/var/lib/alnair/workspace/cgroup.procs" in the two different setups, and verify that the process ID is wrong in the containerized vGPU server and the user container; 2) change the /sys/fs/cgroup/ mount point in the vGPU server; 3) verify that the vGPU server gets the container process ID correctly and the user container loads it correctly from the file.
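A minimal sketch of the kind of debug read described in step 1. The helper name `readCgroupProcs` and the logging are illustrative, not the actual Alnair code; only the file path comes from the plan above.

```go
package main

import (
	"log"
	"os"
	"strings"
)

// readCgroupProcs returns every PID listed in the given cgroup.procs file.
// If the file exists but yields no PIDs, the container is likely seeing an
// overwritten /sys/fs/cgroup rather than its own cgroup.
func readCgroupProcs(path string) ([]string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var pids []string
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		if line != "" {
			pids = append(pids, line)
		}
	}
	return pids, nil
}

func main() {
	// Path taken from the plan above; adjust for the actual deployment.
	pids, err := readCgroupProcs("/var/lib/alnair/workspace/cgroup.procs")
	if err != nil {
		log.Fatalf("debug: cannot read cgroup.procs: %v", err)
	}
	log.Printf("debug: %d PID(s) found: %v", len(pids), pids)
}
```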
Confirmed that with /sys/fs/cgroup mounted, the cgroup.procs file is there, but it is empty: no process IDs are visible inside the container, unlike on the host. Mounting to a different location won't solve this problem.
This mounting effectively creates a nested Docker hierarchy, which may not make sense; Docker-in-Docker setups could be researched further.
However, the current solution has switched to mounting the Docker socket and querying process IDs through `docker top <containerID>` to obtain all the PIDs in the container.
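A rough sketch of that approach, shelling out to `docker top` against the mounted Docker socket. The helper `containerPIDs`, the column parsing, and the error handling are assumptions for illustration, not the actual vGPU server code.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// containerPIDs runs `docker top <containerID>` (which works from inside a
// container only if /var/run/docker.sock is mounted from the host) and
// returns the values of the PID column.
func containerPIDs(containerID string) ([]string, error) {
	out, err := exec.Command("docker", "top", containerID).Output()
	if err != nil {
		return nil, fmt.Errorf("docker top %s: %w", containerID, err)
	}
	lines := strings.Split(strings.TrimSpace(string(out)), "\n")
	if len(lines) < 2 {
		return nil, fmt.Errorf("no processes reported for %s", containerID)
	}
	// Locate the PID column from the header row.
	header := strings.Fields(lines[0])
	pidCol := -1
	for i, h := range header {
		if h == "PID" {
			pidCol = i
			break
		}
	}
	if pidCol < 0 {
		return nil, fmt.Errorf("PID column not found in header %q", lines[0])
	}
	var pids []string
	for _, line := range lines[1:] {
		fields := strings.Fields(line)
		if len(fields) > pidCol {
			pids = append(pids, fields[pidCol])
		}
	}
	return pids, nil
}
```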
Closed by #130. #130 also adds container ID parsing support for the cgroupfs driver, so both the cgroupfs and systemd cgroup drivers are now supported.
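Roughly what container ID parsing for the two drivers looks like; the regexes and function below illustrate the idea and are not necessarily the patterns used in #130.

```go
package main

import (
	"fmt"
	"regexp"
)

// Cgroup paths differ by driver:
//   cgroupfs: .../docker/<64-hex-id>
//   systemd:  .../docker-<64-hex-id>.scope
var (
	cgroupfsRe = regexp.MustCompile(`/docker/([0-9a-f]{64})`)
	systemdRe  = regexp.MustCompile(`docker-([0-9a-f]{64})\.scope`)
)

// containerIDFromCgroupPath extracts the container ID from a single line of
// /proc/<pid>/cgroup, whichever cgroup driver produced it.
func containerIDFromCgroupPath(path string) (string, error) {
	if m := systemdRe.FindStringSubmatch(path); m != nil {
		return m[1], nil
	}
	if m := cgroupfsRe.FindStringSubmatch(path); m != nil {
		return m[1], nil
	}
	return "", fmt.Errorf("no container ID in %q", path)
}

func main() {
	for _, p := range []string{
		"0::/system.slice/docker-0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef.scope",
		"12:memory:/docker/0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
	} {
		id, err := containerIDFromCgroupPath(p)
		fmt.Println(id, err)
	}
}
```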
objdump, compiler, and Makefile specifications to make sure all users can build the same .so file.

GPU utilization by process:
- time interval
- fill rate
- CUDA call token consumption
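As an illustrative sketch only (not the actual implementation) of how the listed knobs could relate, a token bucket refilled at a fixed fill rate per time interval, with each intercepted CUDA call consuming tokens before it may proceed. All names and values below are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket models per-container CUDA call throttling: tokens accumulate at
// fillRate per interval (capped at capacity), and each intercepted CUDA call
// spends costPerCall tokens before proceeding.
type tokenBucket struct {
	capacity    float64
	tokens      float64
	fillRate    float64       // tokens added per interval
	interval    time.Duration // refill period
	costPerCall float64       // token consumption per CUDA call
	lastRefill  time.Time
}

// refill adds tokens proportional to the time elapsed since the last refill.
func (b *tokenBucket) refill(now time.Time) {
	elapsed := now.Sub(b.lastRefill)
	b.tokens += float64(elapsed) / float64(b.interval) * b.fillRate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.lastRefill = now
}

// allow reports whether a CUDA call may proceed right now.
func (b *tokenBucket) allow(now time.Time) bool {
	b.refill(now)
	if b.tokens >= b.costPerCall {
		b.tokens -= b.costPerCall
		return true
	}
	return false
}

func main() {
	b := &tokenBucket{capacity: 100, tokens: 100, fillRate: 10,
		interval: 100 * time.Millisecond, costPerCall: 1, lastRefill: time.Now()}
	fmt.Println("first call allowed:", b.allow(time.Now()))
}
```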