Closed by robertsulej 4 years ago
Sorry this never got answered. From what I understand (though I haven't looked into it very deeply), you will need to mount libnvoptix.so.X and libnvidia-rtcore.so.X from the host into the container.
Unfortunately, extending OptiX support to containers is a bit further down the roadmap, so it won't be tackled natively for a few months.
Well... that works!
Someone from the OptiX forum already tried copying files to docker, but missed libnvidia-rtcore.so.X.
I just mounted all the files you mention and the device query sample works fine. I need to point manually to the exact driver version, but for now that is perfectly sufficient. If I run into trouble with a more sophisticated app, I'll be back.
Thanks! Robert
I thought the recent major changes in how nvidia-docker interacts with Docker 19.03, nvidia-container-runtime 3.1, the proprietary driver 430, etc. might have addressed this, but it is still an issue.
Things are moving forward. In the new OptiX 7, all the OptiX symbols (and also cuDNN for the AI denoiser) have moved into the driver. I have not yet tried whether @RenaudWasTaken's solution will work or which driver files need to be mounted. Just letting you know there are major changes.
With libnvidia-container1 version 1.0.4 (or newer) I added experimental support for this.
Experimental because I really just mounted the two libraries without testing or looking into what else might be required.
Feel free to test and give me feedback :)
Thanks! I'll try and let you know.
Hi @RenaudWasTaken, thanks for your work! I have libnvidia-container1 == 1.0.5, but libnvoptix.so and libnvidia-rtcore.so are still not mounted into the container automatically. Do I need to turn the behavior on with any flags?
You can try it with the environment variable NVIDIA_DRIVER_CAPABILITIES set to graphics.
I tried it but no luck. This was the command executed:
NVIDIA_DRIVER_CAPABILITIES=graphics sudo docker run -d -p 2222:22 --rm --gpus all --name test chenzhekl/test
and the driver version on the host:
Driver Version: 440.33.01
CUDA Version: 10.2
sudo docker run -e NVIDIA_DRIVER_CAPABILITIES=graphics --gpus all nvidia/cuda:10.0-base
My bad.. Thanks for your help! Everything works now.
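For anyone comparing the two commands above: a variable assigned in front of `sudo docker run` only lands in the environment of that host-side command (and sudo commonly resets it anyway), whereas `-e` asks Docker to set it inside the container, which is where the NVIDIA runtime reads it. The sketch below is only an analogy that needs no Docker or GPU; it assumes `env -i` is an acceptable stand-in for the container's separate environment.

```shell
#!/bin/sh
# Analogy only: `env -i` starts a process with an empty environment, standing
# in for a container that does not inherit the caller's variables.

# A variable assigned in front of a command is visible to that command...
seen_by_command=$(NVIDIA_DRIVER_CAPABILITIES=graphics /bin/sh -c 'echo "${NVIDIA_DRIVER_CAPABILITIES:-unset}"')

# ...but not to a process that is started with a fresh environment.
seen_in_fresh_env=$(NVIDIA_DRIVER_CAPABILITIES=graphics env -i /bin/sh -c 'echo "${NVIDIA_DRIVER_CAPABILITIES:-unset}"')

echo "command itself:    $seen_by_command"
echo "fresh environment: $seen_in_fresh_env"
```

With Docker, the equivalent of putting the variable into the fresh environment is the `-e NVIDIA_DRIVER_CAPABILITIES=graphics` form shown in the reply.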
Closing for now as this seems to be resolved.
I am adding this for those wondering why OptiX is not working with NVIDIA_DRIVER_CAPABILITIES=graphics
# Assuming the SDK is installed in /tmp/NVIDIA and the `build` directory is within that for the `optixHello` sample
# In this case:
# Ubuntu 20.04
# CUDA 11.4
# NVIDIA 470
# This means we can only go up to OptiX 7.3 (due to R470)
docker run -v /tmp/NVIDIA:/tmp/NVIDIA \
-v /usr/lib/x86_64-linux-gnu/libnvoptix.so.1:/usr/lib/x86_64-linux-gnu/libnvoptix.so.1 \
-v /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.470.82.01:/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.470.82.01 \
--gpus all -it --rm nvidia/cuda:11.4.3-runtime-ubuntu20.04
If you run optixHello, this will work.
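The command above hardcodes 470.82.01, which has to be edited after every driver update. A rough sketch of deriving the two `-v` flags from whatever libnvidia-rtcore.so.* file is present on the host follows. `optix_mount_flags` is a made-up helper name, and it assumes both libraries live in the same directory and that the paths contain no spaces.

```shell
#!/bin/sh
# optix_mount_flags is a hypothetical helper, not part of any NVIDIA tool.
# It prints one "-v src:dst" flag per line for libnvoptix.so.1 and for the
# first libnvidia-rtcore.so.<driver version> file found in the directory.
optix_mount_flags() {
    libdir=$1
    for rtcore in "$libdir"/libnvidia-rtcore.so.*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$rtcore" ] || continue
        printf '%s %s:%s\n' -v "$libdir/libnvoptix.so.1" "$libdir/libnvoptix.so.1"
        printf '%s %s:%s\n' -v "$rtcore" "$rtcore"
        return 0
    done
    echo "libnvidia-rtcore.so.* not found in $libdir" >&2
    return 1
}

# Possible usage on the host (relies on word splitting, so no spaces in paths):
#   docker run -v /tmp/NVIDIA:/tmp/NVIDIA \
#     $(optix_mount_flags /usr/lib/x86_64-linux-gnu) \
#     --gpus all -it --rm nvidia/cuda:11.4.3-runtime-ubuntu20.04
```

I have only reasoned about this against the layout shown in this thread; other distributions may keep the libraries elsewhere.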
docker run -v /tmp/NVIDIA:/tmp/NVIDIA \
-e NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility \
--gpus all -it --rm nvidia/cuda:11.4.3-runtime-ubuntu20.04
^ At first attempt this will not work, and most folks complaining about it not working are probably running into this. You will see that you have both libs within the container:
/usr/lib/x86_64-linux-gnu/libnvoptix.so.470.82.01
/usr/lib/x86_64-linux-gnu/libnvoptix.so.1
However, looking more closely will show you that /usr/lib/x86_64-linux-gnu/libnvoptix.so.1 is 0 bytes.
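A quick way to check for this inside the container, without eyeballing `ls` output, is to list any libnvoptix files that exist but are empty. This is just a sketch; `find_empty_optix_libs` is a name I made up, and the directory is passed in as an argument.

```shell
#!/bin/sh
# find_empty_optix_libs is a hypothetical helper: it prints every
# libnvoptix.so* entry in the given directory that is a regular file
# (symlinks are followed) with a size of zero bytes.
find_empty_optix_libs() {
    for f in "$1"/libnvoptix.so*; do
        # -f: exists and is a regular file; -s: size greater than zero
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Possible usage inside the container:
#   find_empty_optix_libs /usr/lib/x86_64-linux-gnu
```

If it prints libnvoptix.so.1, you are most likely hitting the situation described here.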
The reason is that on the HOST running Docker, you have:
# cd /usr/lib/x86_64-linux-gnu/
# ls -alh | grep libnvoptix
lrwxrwxrwx 1 root root 23 Nov 16 2021 libnvoptix.so.1 -> libnvoptix.so.470.82.01
-rw-r--r-- 1 root root 161M Oct 27 2021 libnvoptix.so.470.82.01
The symlink ends up as an empty file inside the container.
The easy way to fix this within the container is:
ln -sf /usr/lib/x86_64-linux-gnu/libnvoptix.so.470.82.01 /usr/lib/x86_64-linux-gnu/libnvoptix.so.1
As soon as you do that it works:
root@0eddf5bed7c2:/tmp/NVIDIA/build/bin# ./optixHello
Caught exception: OPTIX_ERROR_LIBRARY_NOT_FOUND: Optix call 'optixInit()' failed: /tmp/NVIDIA/SDK/optixHello/optixHello.cpp:124)
root@0eddf5bed7c2:/tmp/NVIDIA/build/bin# ln -sf /usr/lib/x86_64-linux-gnu/libnvoptix.so.470.82.01 /usr/lib/x86_64-linux-gnu/libnvoptix.so.1
root@0eddf5bed7c2:/tmp/NVIDIA/build/bin# ./optixHello
[ 4][ KNOBS]: All knobs on default.
[ 4][ DISK CACHE]: Opened database: "/var/tmp/OptixCache_root/cache7.db"
[ 4][ DISK CACHE]: Cache data size: "15.9 KiB"
[ 4][ DISKCACHE]: Cache hit for key: ptx-6766-keydeb0e13958c7dc89fbcbe36c70c7e95d-sm_80-rtc0-drv470.82.01
[ 4][COMPILE FEEDBACK]:
[ 4][COMPILE FEEDBACK]: Info: Pipeline has 1 module(s), 1 entry function(s), 0 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 1 basic block(s) in entry functions, 79 instruction(s) in entry functions, 0 non-entry function(s), 0 basic block(s) in non-entry functions, 0 instruction(s) in non-entry functions
GLFW Error 65544: X11: The DISPLAY environment variable is missing
Caught exception: Failed to initialize GLFW
Hope this helps others.
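The `ln -sf` fix above is tied to 470.82.01. Here is a sketch of the same fix that picks up whichever versioned libnvoptix.so.<driver version> is present and non-empty. `relink_optix` is a hypothetical helper name, and it assumes the layout shown in this comment (the versioned file sits next to libnvoptix.so.1 in one directory).

```shell
#!/bin/sh
# relink_optix is a hypothetical helper: it points libnvoptix.so.1 at the
# first non-empty libnvoptix.so.<driver version> file in the same directory.
relink_optix() {
    libdir=$1
    for f in "$libdir"/libnvoptix.so.*; do
        case $f in
            */libnvoptix.so.1) continue ;;   # skip the entry being repaired
        esac
        if [ -f "$f" ] && [ -s "$f" ]; then
            # -f replaces the existing zero-byte libnvoptix.so.1
            ln -sf "$f" "$libdir/libnvoptix.so.1"
            printf 'linked %s\n' "$f"
            return 0
        fi
    done
    echo "no non-empty versioned libnvoptix.so found in $libdir" >&2
    return 1
}

# Possible usage inside the container (as root):
#   relink_optix /usr/lib/x86_64-linux-gnu
```

This only automates the manual symlink; it does not change what the runtime mounts, so it has to be rerun in every new container.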
1. Issue or feature description
Since OptiX 6.0, part of its libraries has been moved into the GPU driver and is no longer accessible in Docker. Initialization of the OptiX 6 context fails in Docker with the error "Failed to load OptiX library", while it works correctly on the host. The same procedure works correctly on both the host and in Docker with OptiX 5.
This issue was reported on the NVIDIA developer forum and also in the nvidia-docker issues; however, people were directed to libnvidia-container support. I submitted an issue there, but I am not sure whether it fits better here.
2. Steps to reproduce the issue
I compiled an image with one of the Optix 6 SDK samples (failing) and the same with Optix 5 (running OK):
Image is configured to run Optix 6 sample:
You can run exactly the same but with Optix 5 in the interactive mode:
and inside the container:
The whole setup, including the Dockerfile, is available on GitHub.
Thanks for help! Robert