Wrong process ID inside docker

vio1etus commented 2 years ago

I manually built nvtop in my Ubuntu 20.04 docker container because of the driver problem described in the pinned issue. However, the PID it shows is wrong: I cannot find that PID using ps:

[Image: find pid in ps -aux]

vio1etus commented 2 years ago

No, both of them are inside the container. I use a docker container assigned to me on a host machine that I don't have access to.

Syllo commented 2 years ago

What I think is happening is that the NVIDIA driver returns the PID on the host, which is different from the PID inside the container.

Can you pass options to docker run? Adding "--pid=host" should make it work.

vio1etus commented 2 years ago

Thank you very much.

It does sound like a solution to this problem. However, adding --pid=host to docker run may pose a security risk to the host machine, so the administrator might refuse it.

So could you kindly find some other way to make it work? If it's too tricky, feel free to close this issue. Thanks

Syllo commented 2 years ago

I'm not sure if the PID mapping can be retrieved from within the container. There seems to be the same problem for nvidia-smi: https://github.com/NVIDIA/nvidia-docker/issues/1460.

I'll check whether docker or the kernel exposes this info in the container.

vio1etus commented 2 years ago

Actually, it was precisely because of the nvidia-smi problem in the container that I searched and found nvtop. 😂 OK, I'll wait for your good news.

Best regards.

zhuyifei1999 commented 2 years ago

> I'll search if docker or the kernel exposes this info in the container.

Might be difficult. Host PIDs were considered an info leak the last time I asked.

There's a guide from Oracle that mentions two things that caught my eye: /proc/pid/status and a translate_pid syscall.

/proc/pid/status seems to be aware that it is in a pidns, and its NSpid field only lists PIDs starting from the pidns of the procfs mount. You would need a procfs mount from the host pidns to perform this translation:

zhuyifei1999@zhuyifei1999-ThinkPad-P14s-Gen-2a ~ $ unshare -r -p -f
zhuyifei1999-ThinkPad-P14s-Gen-2a ~ # cat /proc/self/status | grep -i pid
Pid:    193800
PPid:   193777
TracerPid:  0
NSpid:  193800  22
zhuyifei1999-ThinkPad-P14s-Gen-2a ~ # 
logout
zhuyifei1999@zhuyifei1999-ThinkPad-P14s-Gen-2a ~ $ unshare -r -p -f --mount-proc
zhuyifei1999-ThinkPad-P14s-Gen-2a ~ # cat /proc/self/status | grep -i pid
Pid:    22
PPid:   1
TracerPid:  0
NSpid:  22
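
As a rough illustration of the translation described above, the NSpid line can be parsed into a chain of PIDs: the first entry is the PID in the procfs mount's pidns, the last is the PID in the process's own pidns. This is only a sketch in Python; the function names `parse_nspid` and `outer_to_inner` are made up for this example, and it assumes the status text comes from a procfs mounted in the outer (host) pidns, which is exactly the part that is normally unavailable inside a container.

```python
from typing import Dict, List


def parse_nspid(status_text: str) -> List[int]:
    """Return the NSpid chain from the text of a /proc/<pid>/status file.

    The first entry is the PID as seen from the procfs mount's PID namespace;
    the last entry is the PID in the process's own (innermost) namespace.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # no NSpid line (for example, a very old kernel)


def outer_to_inner(status_texts: List[str]) -> Dict[int, int]:
    """Map outermost PID -> innermost PID for a batch of status files."""
    mapping: Dict[int, int] = {}
    for text in status_texts:
        chain = parse_nspid(text)
        if chain:
            mapping[chain[0]] = chain[-1]
    return mapping


# The two outputs from the transcript above, as they would appear in the file.
host_view = "Pid:\t193800\nPPid:\t193777\nTracerPid:\t0\nNSpid:\t193800\t22\n"
inner_view = "Pid:\t22\nPPid:\t1\nTracerPid:\t0\nNSpid:\t22\n"

if __name__ == "__main__":
    print(parse_nspid(host_view))
    print(parse_nspid(inner_view))
    print(outer_to_inner([host_view, inner_view]))
```

With the host procfs, the PID 193800 maps to 22 inside the namespace; with the namespace's own procfs, only 22 is visible, so there is nothing to translate from.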

The translate_pid syscall proposal doesn't seem to have gotten anywhere.

Other methods might include somehow obtaining a pidfd and using its fdinfo to retrieve the PID in the procfs's pidns (https://github.com/strace/strace/commit/4aebde97de3bbac3a198cadad7a4b2d4fe3028e8), or somehow getting the host to send SCM_CREDENTIALS or hold a System V semaphore or a file lock or something....

That is, unless docker provides a way to do this, but I haven't found anything.
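
To make the pidfd idea a bit more concrete, here is a small Python sketch of the fdinfo half of it. It does not solve the hard part (getting a pidfd for a process that lives in the host pidns); it only opens a pidfd for the current process and reads the `Pid:` field back from `/proc/self/fdinfo/<fd>`. The helper names are invented for this example, and it assumes Linux with pidfd support and Python 3.9+ for `os.pidfd_open`.

```python
import os
from typing import Optional


def pid_from_pidfd_fdinfo(fdinfo_text: str) -> Optional[int]:
    """Extract the Pid: field from the text of /proc/<pid>/fdinfo/<pidfd>."""
    for line in fdinfo_text.splitlines():
        if line.startswith("Pid:"):
            return int(line.split()[1])
    return None


def own_pid_via_pidfd() -> Optional[int]:
    """Open a pidfd for this process and read its PID back from fdinfo.

    Returns None where pidfd_open or procfs is unavailable.
    """
    if not hasattr(os, "pidfd_open"):
        return None
    try:
        fd = os.pidfd_open(os.getpid())
    except OSError:
        return None
    try:
        with open(f"/proc/self/fdinfo/{fd}") as f:
            return pid_from_pidfd_fdinfo(f.read())
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    # The value read from fdinfo is relative to the pidns of the procfs mount.
    print(own_pid_via_pidfd(), os.getpid())
```

The PID reported in fdinfo is expressed in the pidns of the procfs instance being read, which is what would make this usable for translation if a pidfd for the target process could be obtained.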

zhuyifei1999 commented 2 years ago

The discussion in https://github.com/NVIDIA/nvidia-docker/issues/179 lists two kernel-mode workarounds:

Syllo commented 2 years ago

Thank you, @zhuyifei1999. @vio1etus it may be worth a try to ask your administrator to implement one of the workarounds. Even though they are kernel modules, they are pretty small and can easily be audited.

vio1etus commented 2 years ago

> Thank you, @zhuyifei1999. @vio1etus it may be worth a try to ask your administrator to implement one of the workarounds. Even though they are kernel modules, they are pretty small and can easily be audited.

Got it. Thank you very much!

Syllo / nvtop

Wrong process ID inside docker #149