vio1etus closed this issue 2 years ago.
No, both of them are inside the container. I use a Docker container allocated on a host machine that I don't have access to.
What I think is happening is that the NVIDIA driver returns the host PID, which is different from the PID inside the container.
Can you pass options to docker run? Adding "--pid=host" should make it work.
Thank you very much.
It does sound like a solution to this problem. However, adding --pid=host to docker run may pose security risks to the host machine, so the administrator would likely refuse it.
So could you kindly find some other way to make it work? If it's too tricky, feel free to close this issue. Thanks!
I'm not sure if the PID mapping can be retrieved from within the container. There seems to be the same problem for nvidia-smi: https://github.com/NVIDIA/nvidia-docker/issues/1460.
I'll search if docker or the kernel exposes this info in the container.
Actually, it was precisely because of the nvidia-smi problem in containers that I searched for and found nvtop. 😂 OK, OK. Waiting for your good news.
Best regards.
On May 31, 2022, at 17:48, Maxime Schmitt wrote:
I'll search if docker or the kernel exposes this info in the container.
Might be difficult. Last time I asked, host PIDs were considered an info leak.
There's a guide from Oracle that mentions two things that caught my eye: /proc/pid/status and a translate_pid syscall.
/proc/pid/status seems to be pidns-aware, but its NSpid field only shows PIDs starting from the pidns of the procfs mount. You would need a procfs mount from the host pidns to perform this translation:
zhuyifei1999@zhuyifei1999-ThinkPad-P14s-Gen-2a ~ $ unshare -r -p -f
zhuyifei1999-ThinkPad-P14s-Gen-2a ~ # cat /proc/self/status | grep -i pid
Pid: 193800
PPid: 193777
TracerPid: 0
NSpid: 193800 22
zhuyifei1999-ThinkPad-P14s-Gen-2a ~ #
logout
zhuyifei1999@zhuyifei1999-ThinkPad-P14s-Gen-2a ~ $ unshare -r -p -f --mount-proc
zhuyifei1999-ThinkPad-P14s-Gen-2a ~ # cat /proc/self/status | grep -i pid
Pid: 22
PPid: 1
TracerPid: 0
NSpid: 22
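To make that translation concrete, here is a minimal Python sketch. It assumes a procfs mounted from the host pidns is reachable inside the container (the /host_proc path and both function names are hypothetical, not part of Docker or nvtop), and that the process lives directly in the container's pidns. It reads the NSpid field of /proc/<pid>/status, which lists the PID in each pidns from the procfs mount's pidns inward, and takes the last entry.

```python
import os
from typing import List, Optional


def read_nspid(status_text: str) -> List[int]:
    """Parse the NSpid line of a /proc/<pid>/status file.

    The list starts with the PID in the pidns of the procfs mount and ends
    with the PID in the innermost pidns the process belongs to. An empty
    list means the field is missing (e.g. on older kernels).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(tok) for tok in line.split()[1:]]
    return []


def host_to_container_pid(host_pid: int, proc_root: str = "/proc") -> Optional[int]:
    """Translate a PID from proc_root's pidns to the innermost-pidns PID.

    Assumption: proc_root is a procfs mounted from the host pidns (for
    example a hypothetical bind mount at /host_proc), and the process lives
    directly in the container's pidns, so the last NSpid entry is the PID
    that ps shows inside the container.
    """
    status_path = os.path.join(proc_root, str(host_pid), "status")
    try:
        with open(status_path) as f:
            chain = read_nspid(f.read())
    except OSError:
        # No such process in this pidns, or it exited while we were reading.
        return None
    return chain[-1] if chain else None
```

With the first session above, the line "NSpid: 193800 22" would map host PID 193800 to 22. Run against the container's own /proc instead, /proc/<host PID> either does not exist or belongs to an unrelated process, which is why the extra procfs mount is needed.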
The translate_pid syscall didn't seem to get anywhere.
Other methods might include somehow obtaining a pidfd and using its fdinfo to retrieve the PID in the procfs's pidns (https://github.com/strace/strace/commit/4aebde97de3bbac3a198cadad7a4b2d4fe3028e8), or somehow getting the host to send SCM_CREDENTIALS, hold a System V semaphore or a file lock, or something similar.
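The pidfd route might look roughly like this Python sketch (function names are hypothetical). It only covers the last step: given an open pidfd, the "Pid:" line of /proc/self/fdinfo/<fd> reports the PID in the pidns of that procfs, per the strace commit linked above. Obtaining a pidfd for a host process from inside the container in the first place, for example one passed over a Unix socket by a cooperating host-side helper, remains the open problem.

```python
import os
from typing import Optional


def parse_pidfd_fdinfo(fdinfo_text: str) -> Optional[int]:
    """Return the value of the "Pid:" line in a pidfd's fdinfo text.

    The kernel resolves this PID in the pidns of the procfs being read.
    Non-positive values mean no PID is available there (for example, the
    process exited or is not visible in that pidns; the exact value depends
    on the kernel version), so they map to None.
    """
    for line in fdinfo_text.splitlines():
        if line.startswith("Pid:"):
            pid = int(line.split()[1])
            return pid if pid > 0 else None
    return None


def pid_from_pidfd(pidfd: int, proc_root: str = "/proc") -> Optional[int]:
    """Read the fdinfo of one of our own open pidfds and extract its PID."""
    fdinfo_path = os.path.join(proc_root, "self", "fdinfo", str(pidfd))
    with open(fdinfo_path) as f:
        return parse_pidfd_fdinfo(f.read())
```

To try it locally, os.pidfd_open(os.getpid()) (Python 3.9+, Linux 5.3+) gives a pidfd for the current process; remember to os.close() it afterwards.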
Unless Docker provides a way to do this, but I haven't found anything.
The discussion in https://github.com/NVIDIA/nvidia-docker/issues/179 lists two kernel-mode workarounds:
Thank you, @zhuyifei1999. @vio1etus, it may be worth asking your administrator to implement one of the workarounds. Even though they are kernel modules, they are pretty small and can easily be audited.
Got it. Thank you very much!
I manually built nvtop in my Ubuntu 20.04 Docker container because of the driver problem described in the pinned issue. However, the PIDs it shows appear to be wrong, because I cannot find them in the output of ps -aux.
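A quick way to test the host-PID hypothesis from inside the container is to check whether the PIDs that nvtop (or nvidia-smi) reports have a /proc entry at all. This Python sketch is hypothetical (the function name is made up) and takes the PIDs as plain input, since how you collect them depends on the tool.

```python
import os
from typing import Iterable, List


def pids_missing_from_proc(pids: Iterable[int], proc_root: str = "/proc") -> List[int]:
    """Return the PIDs that have no /proc/<pid> directory under proc_root.

    If every PID reported by the GPU tool is missing from the container's
    own /proc, that is consistent with the tool reporting host-pidns PIDs.
    """
    return [pid for pid in pids
            if not os.path.isdir(os.path.join(proc_root, str(pid)))]
```

A PID that does resolve is not proof of a correct mapping: a host PID can coincide with an unrelated process inside the container.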