NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

Demo in README does not work #196

Open idreamerhx opened 2 years ago

idreamerhx commented 2 years ago

My OS: Arch Linux (latest)

extra/egl-wayland 2:1.1.11-2 [installed]
extra/libvdpau 1.5-1 [installed]
extra/libxnvctrl 520.56.06-1 [installed]
extra/nvidia-dkms 520.56.06-2 [installed]
extra/nvidia-prime 1.0-4 [installed]
extra/nvidia-settings 520.56.06-1 [installed]
extra/nvidia-utils 520.56.06-2 [installed]
extra/opencl-nvidia 520.56.06-2 [installed]
community/cuda 11.8.0-1 [installed]
community/cuda-tools 11.8.0-1 [installed]
community/cudnn 8.5.0.96-1 [installed]
community/nccl 2.14.3-1 [installed]
community/nvtop 3.0.0-1 [installed]
archlinuxcn/libnvidia-container 1.9.0-1 [installed]
archlinuxcn/libnvidia-container-tools 1.9.0-1 [installed]
archlinuxcn/nvidia-container-toolkit 1.9.0-1 [installed]

NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.8

Running the demo from README.md failed:

Setup a new set of namespaces

cd $(mktemp -d) && mkdir rootfs
sudo unshare --mount --pid --fork

Setup a rootfs based on Ubuntu 16.04 inside the new namespaces

curl http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04.6-base-amd64.tar.gz | tar -C rootfs -xz
useradd -R $(realpath rootfs) -U -u 1000 -s /bin/bash nvidia
mount --bind rootfs rootfs
mount --make-private rootfs
cd rootfs

Mount standard filesystems

mount -t proc none proc
mount -t sysfs none sys
mount -t tmpfs none tmp
mount -t tmpfs none run

Isolate the first GPU device along with basic utilities

nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --no-cgroups --utility --device 0 $(pwd)

(I removed --ldconfig=@/sbin/ldconfig.real from this command.)

Change into the new rootfs

pivot_root . mnt    # this does not work
umount -l mnt
exec chroot --userspec 1000:1000 . env -i bash

Run nvidia-smi from within the container

nvidia-smi -L

Inside the chrooted rootfs, nvidia-smi shows nothing.

idreamerhx commented 2 years ago

I tried:

arch-chroot ubu22x86-base

Then, in another shell:

cd ubu22x86-base
nvidia-container-cli --load-kmods configure --no-cgroups --utility --device 0 $(pwd)

nvidia-smi works, but a simple program that calls cudaGetDeviceCount gets error code 35.
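In the CUDA runtime, error code 35 is cudaErrorInsufficientDriver, which is often reported when the runtime cannot find a usable libcuda in the container. One thing worth checking is whether the driver library was mounted into the rootfs at all, since the configure command above passes only --utility. The following is a minimal sketch of such a check; has_libcuda is a hypothetical helper name, and the assumption that the library lands somewhere under usr/ in the rootfs may not hold for every distribution.

```shell
#!/bin/sh
# has_libcuda ROOT
# Hypothetical helper (not part of libnvidia-container): succeeds if any
# file named libcuda.so* is visible under ROOT/usr, fails otherwise.
# Assumption: the driver library is mounted somewhere below ROOT/usr.
has_libcuda() {
    root=$1
    # find prints matching paths; an empty result means nothing was found
    [ -n "$(find "$root/usr" -name 'libcuda.so*' 2>/dev/null | head -n 1)" ]
}

# Usage, from the host, after running nvidia-container-cli configure:
#   if has_libcuda /path/to/ubu22x86-base; then
#       echo "libcuda is visible in the rootfs"
#   else
#       echo "libcuda is missing; CUDA programs will likely fail"
#   fi
```

If the check fails while nvidia-smi still works, the rootfs probably has the utility libraries but not the compute ones.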

idreamerhx commented 2 years ago

In the chrooted rootfs, nvidia-smi reports:

NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: N/A

I installed the CUDA toolkit from cuda_11.8.0_520.61.05_linux.run.

Is there any documentation for nvidia-container-cli?

idreamerhx commented 2 years ago

In the chrooted Ubuntu 22, after apt install nvidia-cuda-toolkit:

NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.6

idreamerhx commented 2 years ago

Hey guys, would you please add some really simple documentation, such as:

nvidia-container-cli --load-kmods configure --no-cgroups --utility --compute --device 0 $(pwd)

This works and shows the CUDA version.
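For reference, the working invocation can be wrapped in a small dry-run helper so the flags are visible before anything is mounted. This is only a sketch: gpu_configure_cmd is a hypothetical name, and it only prints the command from this thread rather than running it. As far as the NVIDIA documentation describes the driver capabilities, --utility covers nvidia-smi and NVML, while --compute covers the libraries needed by CUDA applications.

```shell
#!/bin/sh
# gpu_configure_cmd ROOTFS [DEVICE]
# Hypothetical helper (not part of the NVIDIA tools): prints the
# nvidia-container-cli command line used in this thread without running it.
gpu_configure_cmd() {
    rootfs=$1        # path to the container root filesystem
    device=${2:-0}   # GPU index; defaults to the first GPU
    # --utility: nvidia-smi and NVML; --compute: CUDA driver libraries
    printf 'nvidia-container-cli --load-kmods configure --no-cgroups --utility --compute --device %s %s\n' \
        "$device" "$rootfs"
}

# Usage: inspect the command, then run it as root yourself.
#   gpu_configure_cmd "$(pwd)"
```

Printing instead of executing keeps the helper safe to try on a machine without a GPU.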

idreamerhx commented 2 years ago

Or just list the --utility and --compute options in the --usage or --help output.

idreamerhx commented 2 years ago

There is another issue. When I exit the chrooted rootfs, I get:

umount: /mnt/data1/chroots/ubu22x86-cuda118/dev: target is busy.
umount: /mnt/data1/chroots/ubu22x86-cuda118/proc: target is busy.

Should I clean up the mounts myself?
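A "target is busy" error on dev and proc is what umount typically reports when other mounts still sit below those directories, which is likely here if nvidia-container-cli mounted files under them from the host mount namespace. A minimal sketch of a manual cleanup is to list every mount point under the rootfs, deepest first, and unmount them in that order. mounts_under is a hypothetical helper name; it assumes paths without spaces, since the kernel escapes those in the mount table.

```shell
#!/bin/sh
# mounts_under ROOT [TABLE]
# Hypothetical helper: prints every mount point at or below ROOT, deepest
# first, from a table in /proc/self/mounts format (the default TABLE).
mounts_under() {
    root=${1%/}                     # drop a trailing slash, if any
    table=${2:-/proc/self/mounts}
    # Field 2 is the mount point; keep ROOT itself and anything under ROOT/.
    # A reverse byte-order sort lists children before their parents.
    awk -v r="$root" '$2 == r || index($2, r "/") == 1 { print $2 }' "$table" |
        LC_ALL=C sort -r
}

# Usage (as root): unmount children before parents.
#   mounts_under /mnt/data1/chroots/ubu22x86-cuda118 |
#       while read -r m; do umount "$m"; done
```

Reviewing the list before unmounting is a good idea, because a bind-mounted rootfs will show up in it as well.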