Open KapilS25 opened 3 years ago
Thanks for the report. There is a long thread on that link (some of which is relevant, some of which is not). Can you summarize the exact bug you are seeing with libnvidia-container here?
With cgroups, nvidia-container-cli is unable to mount /dev from the host inside the container's /dev.
I need to use the --no-devbind flag with nvidia-container-cli, which should not be the case, as mentioned by the enroot developer:
https://github.com/NVIDIA/enroot/issues/54#issuecomment-762148027
I don't know anything about enroot. Do you have a simple reproducer with nvidia-container-cli directly that I can use to see what your issue is?
Adding @3XX0 (enroot developer) to the conversation. @3XX0, can you please explain the issue to @klueska? I don't know exactly how enroot start uses nvidia-container-cli.
Basically it looks like the device mount fails if the device already exists at the destination. I've never seen this before, so this might be RHEL specific:
mount error: file creation failed: /scratch/pbs/enroot-data/user-613.chas052/lammps/dev/nvidia-uvm-tools: operation not permitted
/dev/nvidia-uvm-tools already exists because /dev is bind mounted in the container, so mount shouldn't try to create it.
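The behaviour being asked for here can be sketched as a "create the mount target only if it is missing" check. This is a minimal illustration in Python, not libnvidia-container code; the helper name `ensure_mount_target` and its default mode are assumptions made for the example.

```python
import os


def ensure_mount_target(path: str, mode: int = 0o644) -> bool:
    """Create an empty file to mount over, unless the path already exists.

    Hypothetical helper for illustration only.
    Returns True if a file was created, False if the path was already there.
    """
    if os.path.lexists(path):
        # The destination is already present (for example because /dev is
        # bind mounted from the host): leave it untouched and mount over it.
        return False
    # O_EXCL makes the creation fail loudly if the path appears in between.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)
    os.close(fd)
    return True
```

With a check like this, an existing /dev/nvidia-uvm-tools would never be opened for creation, so its permissions would not come into play.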
@KapilS25 Can you try adding strace to nvidia-container-cli in the nvidia hook so we can see the exact failure on open?
Please find attached the strace output for nvidia-container-cli: dev_mount_issue_nvidia-container-cli.strace.txt
Thanks, this makes sense now: the umask will make the open fail as it tries to adjust permissions.
So it sounds like this is not actually a bug in libnvidia-container then, but rather expected behaviour given the umask set on /dev/nvidia-uvm-tools.
It is a bug, the file exists and can just be mounted over. Libnvidia-container shouldn't try to adjust the permission of a device file to reflect the system umask. Permissions of the underlying file actually don't really matter.
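For reference, here is a small sketch of how the umask interacts with file creation on POSIX systems. It shows general `open(2)` semantics only and does not reproduce what libnvidia-container does internally; the helper name `mode_after_open` is made up for the example.

```python
import os
import stat
import tempfile


def mode_after_open(path: str, requested: int, umask: int) -> int:
    """Open path with O_CREAT under the given umask; return its permission bits."""
    old = os.umask(umask)
    try:
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, requested)
        os.close(fd)
    finally:
        os.umask(old)  # always restore the process umask
    return stat.S_IMODE(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as d:
    new_file = os.path.join(d, "new")
    existing = os.path.join(d, "existing")
    open(existing, "w").close()
    os.chmod(existing, 0o666)  # chmod is not affected by the umask

    # The umask is applied only when the file is actually created.
    created_mode = mode_after_open(new_file, 0o666, 0o077)
    # Opening a file that already exists leaves its mode alone.
    existing_mode = mode_after_open(existing, 0o666, 0o077)
```

A plain open of an existing file does not change its mode, so any mismatch with the umask only matters if the caller goes on to adjust permissions explicitly.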
I've also encountered the same bug.
As reported by the enroot developer, kindly look into this: https://github.com/NVIDIA/enroot/issues/54#issuecomment-762169057