NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs

WSL2 + Docker + OpenGL + NVIDIA not working (uses llvmpipe) #288

Open riv-robot opened 2 years ago

riv-robot commented 2 years ago

Summary

I am running ROS GUI applications like RViz and Gazebo through a docker container on WSL2. The OpenGL renderer is not selecting my NVIDIA GTX 1050 card and uses llvmpipe (CPU) instead.

My system:

Note: "latest" refers to the updates available on 7 October 2021; I don't have version numbers to hand.

Steps taken to fix so far

The OpenGL renderer does find my NVIDIA card outside of a docker container on WSL2 (on the host). I have replicated the same issue after multiple reinstalls and when using docker-ce instead of Docker Desktop. On a native Ubuntu 20.04 boot, the container's OpenGL renderer is correctly set to my NVIDIA card.

Expected Behaviour

RViz, Gazebo, GLXGears, glmark2 should all render with 3D hardware acceleration on the NVIDIA GPU.

elezar commented 2 years ago

@robertjbush which image is being used? Note that for OpenGL capabilities, the NVIDIA_DRIVER_CAPABILITIES environment variable should include graphics or be set to all. See https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html#driver-capabilities
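
The condition described above (the value must contain graphics or be all) can be checked from inside a running container. A minimal sketch in POSIX shell follows; the function name has_graphics_capability is hypothetical and not part of the NVIDIA Container Toolkit, and it only inspects the string value of the variable.

```shell
# Return success (0) if a NVIDIA_DRIVER_CAPABILITIES value enables OpenGL,
# i.e. if the comma-separated list contains "graphics" or "all".
# has_graphics_capability is a hypothetical helper, not a toolkit command.
has_graphics_capability() {
  case ",$1," in
    *,graphics,*|*,all,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: warn when the current environment would not request graphics libraries.
if ! has_graphics_capability "${NVIDIA_DRIVER_CAPABILITIES:-}"; then
  echo "NVIDIA_DRIVER_CAPABILITIES does not include graphics" >&2
fi
```

A passing check only shows that the variable is set as suggested; it does not prove that the graphics libraries were actually mounted.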

riv-robot commented 2 years ago

@elezar I believe I have tried those steps because:

  1. The image is a custom one, based on the ros-noetic image.
  2. The NVIDIA card is used for OpenGL rendering in the docker container on a native Ubuntu 20.04 install
  3. However it doesn't work with the same image on WSL2
  4. I have tried various NVIDIA images and running glxgears, glxinfo and glmark2
  5. NVIDIA_DRIVER_CAPABILITIES is set as you suggested

Are you, or anyone else within NVIDIA corporation, successfully running NVIDIA OpenGL rendering within a docker container on a WSL2 host?

elezar commented 2 years ago

Hi @robertjbush thanks for the additional information.

It may be that @rboissel will be able to provide some additional insight here.

elezar commented 2 years ago

One thing to note is that the graphics libraries are mounted from the host system, meaning that these need to be installed. Do glxgears, glxinfo, or glmark2 work in "native" WSL2 using the NVIDIA card?

Could you enable the debug option in the nvidia-container-cli section of the /etc/nvidia-container-runtime/config.toml file by uncommenting it?

The generated /var/log/nvidia-container-toolkit.log will contain information as to which libraries are not being located in this case.
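
To confirm that the debug options are really uncommented, the config file can be filtered for active debug lines. This is a rough sketch under the assumption that each setting sits on its own line, as in the stock file; active_debug_lines is a hypothetical helper name.

```shell
# Print the active (uncommented) debug settings found in a config file.
# active_debug_lines is a hypothetical helper; it does not parse TOML,
# it only matches lines that begin with "debug =".
active_debug_lines() {
  grep -E '^[[:space:]]*debug[[:space:]]*=' "$1"
}
```

Running `active_debug_lines /etc/nvidia-container-runtime/config.toml` and getting no output would suggest the options are still commented out.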

riv-robot commented 2 years ago

> One thing to note is that the graphics libraries are mounted from the host system, meaning that these need to be installed. Do glxgears, glxinfo, or glmark2 work in "native" WSL2 using the NVIDIA card?

Yes they do.

I'll work on the second part of your post now.

riv-robot commented 2 years ago

@elezar I'm not getting those logs. This is my config.toml:

disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"
[nvidia-container-runtime]
debug = "/var/log/nvidia-container-runtime.log"

This is at the end of my Docker Desktop JSON file

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

Some version info:

Can you provide any insight on this:

Are you, or anyone else within NVIDIA corporation, successfully running NVIDIA OpenGL rendering within a docker container on a WSL2 host?

elezar commented 2 years ago

@robertjbush what command line are you using to launch the container? Since nvidia is not set as the default runtime in your docker config, you would need to specify the runtime:

docker run --rm -ti --runtime=nvidia <image> nvidia-smi

Alternatively, specifying the --gpus flag should also ensure that the nvidia-container-toolkit is used to make the required modifications to the container when it is created.
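
Whether nvidia is the default runtime can be read from the Docker daemon configuration. A minimal sketch, assuming the configuration is a JSON file that uses the "default-runtime" key; the helper name is hypothetical, the match is plain text rather than real JSON parsing, and the file location varies (for example /etc/docker/daemon.json for docker-ce, while Docker Desktop keeps its settings elsewhere).

```shell
# Return success if a Docker daemon JSON file sets "default-runtime" to "nvidia".
# Hypothetical helper: a plain-text match, not a real JSON parser.
has_default_nvidia_runtime() {
  grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1"
}
```

If `has_default_nvidia_runtime /etc/docker/daemon.json` fails, the container has to be started with --runtime=nvidia or --gpus as described above.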

While looking for documentation w.r.t. WSL support, I also found: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#features-not-yet-supported which lists OpenGL-interop as unsupported.

riv-robot commented 2 years ago

@elezar I've used the runtime and --gpus flag with no success.

> While looking for documentation w.r.t. WSL support, I also found: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#features-not-yet-supported which lists OpenGL-interop as unsupported.

But OpenGL is used by so many applications. Why would this not be supported when it is available on the host?

riv-robot commented 2 years ago

@elezar Is there a forum to request new features?

elezar commented 2 years ago

@robertjbush let me ping someone to find out where that limitation comes from, as it may be related to WSL2 (although I recall reading that this now has better support for Linux graphics applications). If this is only due to the NVIDIA Container Toolkit, I will create a ticket to track getting this added.

riv-robot commented 2 years ago

@elezar WSL2 does indeed have better support for GPU graphics rendering. I can run OpenGL applications and use NVIDIA hardware to render them. But it isn't possible from a docker container when the host is WSL2 (the same container does use the NVIDIA GPU for rendering on a pure Ubuntu 20.04 install).

elezar commented 2 years ago

I have pinged @rboissel to have a look at the ticket. He has a better grasp on the WSL2 specifics and where the noted limitations come from.

riv-robot commented 2 years ago

@elezar @rboissel Good news in part:

  1. I've been testing accelerated OpenGL through containers in WSL2. I used the Dockerfile from Microsoft's recent commit ac6221b.
  2. I also managed to get RViz and ROS (Robot Operating System) to use accelerated OpenGL.
  3. However, the meshes (STLs) do not display when using the NVIDIA drivers.

Any ideas why this may happen?

bejota commented 2 years ago

@robertjbush I'm having the same issue. GPU is working for compute in a docker container but not for OpenGL. I've tried environment variables such as LIBGL_ALWAYS_INDIRECT and NVIDIA_DRIVER_CAPABILITIES without success. I've also tried the dockerfiles from ac6221b. Were any other changes required to enable the GPU for graphics?

System Specs:

Thanks.

onomatopellan commented 2 years ago

@bejota OpenGL acceleration in WSL2 only works in Windows 11.

bejota commented 2 years ago

Dang. I knew someone was going to say that.

riv-robot commented 2 years ago

Anyone tested RViz and meshes using accelerated OpenGL in WSL?

tgaspar commented 2 years ago

> Anyone tested RViz and meshes using accelerated OpenGL in WSL?

I am facing the exact same problem, except that I am not running the ROS stuff (or RViz) from a container, so I hope this is still relevant.

Like many people before me, I had the issue that the 3D rendering was not done by the GPU. That meant that RViz got very slow once the models got a bit bigger. However, at that time the meshes were displaying.

So I upgraded to Windows 11 and did everything necessary to force the 3D rendering onto the GPU (NVIDIA GTX 1050 Ti). The GPU now does the rendering, except that the meshes do not get displayed. The frames from TF, on the other hand, do get displayed.

riv-robot commented 2 years ago

@tgaspar @elezar I have this exact problem.

riv-robot commented 2 years ago

Friendly ping to anyone who's had this problem and solved it?

onomatopellan commented 2 years ago

Issue is being tracked in https://github.com/microsoft/wslg/issues/554

moracabanas commented 2 years ago

> @bejota OpenGL acceleration in WSL2 only works in Windows 11.

I'm on Windows 11, but I am trying to run fully hardware-accelerated apps from Docker.

The GPU (RTX 2060 Max-Q) is working in docker containers for compute, but I'm fairly sure GUI apps are not hardware accelerated.

I am facing the same issue, where things like WebGL are not working because the GL renderer is set to llvmpipe.

glxgears reports 600+ fps.

WSL2 Ubuntu 20.04 glxinfo | grep OpenGL

OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 2060 with Max-Q Design)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 21.0.3
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 21.0.3
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 21.0.3
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

Docker container glxinfo | grep OpenGL

OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 7.0, 128 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.3.6
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 18.3.6
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 18.3.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
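
The difference between the two outputs above comes down to the "OpenGL renderer string" line. A small sketch for classifying it automatically follows; renderer_kind is a hypothetical helper, and treating everything that is not llvmpipe or softpipe as hardware is a rough heuristic rather than a guarantee.

```shell
# Read glxinfo output on stdin and classify the renderer.
# Prints "software" for Mesa's CPU rasterizers, "unknown" if no renderer
# line is found, and "hardware" otherwise (a heuristic assumption).
renderer_kind() {
  renderer=$(sed -n 's/^OpenGL renderer string: //p' | head -n 1)
  case "$renderer" in
    *llvmpipe*|*softpipe*) echo software ;;
    "") echo unknown ;;
    *) echo hardware ;;
  esac
}
```

Typical use would be `glxinfo | renderer_kind`, run once on the WSL2 host and once inside the container.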

This is my script to run GPU-accelerated containers:

docker run -it --rm --gpus 'all,"capabilities=compute,graphics,utility,video,display"' --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-e DISPLAY \
-e WAYLAND_DISPLAY \
-e XDG_RUNTIME_DIR \
-e PULSE_SERVER \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /mnt/wslg:/mnt/wslg \
-v $(pwd)/app:/app \
registry/image \
command

If I play a 60 fps YouTube video in Chromium, it plays well but is sometimes choppy, and the NVIDIA GPU load only goes up when the window is displayed on an external monitor. I am pretty sure it is CPU rendering, given the high CPU load while the video is playing.

Trying any WebGL content reports the following error: [image]

onomatopellan commented 2 years ago

@moracabanas I think you are missing these:

-e LD_LIBRARY_PATH=/usr/lib/wsl/lib
-v /usr/lib/wsl:/usr/lib/wsl

Take a look at the samples.

moracabanas commented 2 years ago

> @moracabanas I think you are missing these:
>
> -e LD_LIBRARY_PATH=/usr/lib/wsl/lib
> -v /usr/lib/wsl:/usr/lib/wsl
>
> Take a look at the samples.

Thank you for your suggestion. I tried the new configuration based on the WSLg docker run ... examples you mentioned.

But I am still not getting hardware-accelerated OpenGL, as glxinfo | grep OpenGL shows:

glxinfo | grep OpenGL
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: llvmpipe (LLVM 7.0, 128 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 18.3.6
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 18.3.6
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 18.3.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

Chrome still reports WebGL as unsupported and blacklisted.

I tried Blender and it runs fine, but you can feel there is no GPU acceleration at all.

This is my image launcher script for testing now:

docker run -it --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /mnt/wslg:/mnt/wslg \
-v /usr/lib/wsl:/usr/lib/wsl \
--device=/dev/dxg \
-e LD_LIBRARY_PATH=/usr/lib/wsl/lib \
-e DISPLAY=$DISPLAY \
-e WAYLAND_DISPLAY=$WAYLAND_DISPLAY \
-e XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR \
-e PULSE_SERVER=$PULSE_SERVER \
-v $(pwd)/app:/app \
<repo/image:tag> \
bash

onomatopellan commented 2 years ago

@moracabanas

> Mesa 18.3.6

You also need to install Mesa 21.x inside the container.
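
The Mesa version a container is actually using can be read from the "OpenGL version string" line of glxinfo. A minimal sketch, assuming output in the format shown in this thread; mesa_major_version is a hypothetical helper name.

```shell
# Read glxinfo output on stdin and print the Mesa major version, e.g. "18"
# for "OpenGL version string: 3.1 Mesa 18.3.6". Prints nothing if the
# line is missing or does not mention Mesa.
mesa_major_version() {
  sed -n 's/^OpenGL version string: .*Mesa \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Example use inside a container (requires glxinfo from mesa-utils):
#   major=$(glxinfo | mesa_major_version)
#   [ "${major:-0}" -ge 21 ] || echo "Mesa is older than 21.x" >&2
```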

moracabanas commented 2 years ago

> @moracabanas
>
> > Mesa 18.3.6
>
> You also need to install Mesa 21.x inside the container.

I've been trying for hours to install Mesa on a distro other than Ubuntu, and I give up.

Do you have any advice on how to update or install Mesa, e.g. a docker image that already has it? I don't want to compile it because, in my experience, compiling software from source takes a day and mostly ends in errors. I also don't know what I am doing in the process beyond copy-pasting scripts.

Things I've tried already:

sudo add-apt-repository ppa:kisak/kisak-mesa
sudo apt update
sudo apt upgrade

This does not work because the PPA only supports Ubuntu and has no candidate for my Debian-based (buster/bullseye) docker image.

onomatopellan commented 2 years ago

@moracabanas On Debian bullseye you need to add the deb http://http.us.debian.org/debian/ testing non-free contrib main line to your /etc/apt/sources.list and run sudo apt update && sudo apt upgrade -y after that.
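
The step above can be scripted so that it is safe to run more than once. This is a sketch only: add_testing_repo is a hypothetical helper, the optional file argument exists just to make it easy to try on a scratch file, and pulling packages from Debian testing into a bullseye image upgrades far more than Mesa.

```shell
# Append the Debian testing repository line to an apt sources file,
# unless an identical line is already present.
# $1: sources file (defaults to /etc/apt/sources.list)
add_testing_repo() {
  list="${1:-/etc/apt/sources.list}"
  line='deb http://http.us.debian.org/debian/ testing non-free contrib main'
  grep -qxF "$line" "$list" 2>/dev/null || printf '%s\n' "$line" >> "$list"
}

# Example use as root inside the container or in a Dockerfile RUN step:
#   add_testing_repo && apt update && apt upgrade -y
```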

moracabanas commented 2 years ago

I updated my image with that and now I get: glxinfo | grep OpenGL

OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 2060 with Max-Q Design)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 21.2.5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 21.2.5
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 21.2.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:

Thank you so much, I am testing this now!

moracabanas commented 2 years ago

Everything is working as expected now. WebGL is working solidly in my docker image!

The odd thing now is that I get ~700 fps in glxgears with llvmpipe but only ~70 fps with Mesa 21.x.

onomatopellan commented 2 years ago

@moracabanas glxgears is somewhat outdated. Better to try es2gears from the mesa-utils-extra package.

rosiakpiotr commented 1 year ago

Any updates on this one?

kryptoniancode commented 1 year ago

How can this be solved? How do I get the GPU OpenGL renderer in a docker container?

In WSL2

$ glxinfo | grep "OpenGL"
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce GTX 1050 Ti)
OpenGL core profile version string: 4.2 (Core Profile) Mesa 23.0.2 - kisak-mesa PPA
OpenGL core profile shading language version string: 4.20
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.2 (Compatibility Profile) Mesa 23.0.2 - kisak-mesa PPA
OpenGL shading language version string: 4.20
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 23.0.2 - kisak-mesa PPA
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:

In docker container

$ glxinfo | grep "OpenGL"
OpenGL vendor string: Mesa
OpenGL renderer string: llvmpipe (LLVM 15.0.7, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 23.0.2 - kisak-mesa PPA
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5 (Compatibility Profile) Mesa 23.0.2 - kisak-mesa PPA
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 23.0.2 - kisak-mesa PPA
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

darkopetrovic commented 5 months ago

I successfully updated Mesa from 20.3.5 to 22.0.5 in the docker container, and it is now able to detect the GPU.

The DISPLAY variable is set to :0 in both WSL and the container.

WSL

$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce GTX 1660) (0xffffffff)
    Version: 23.2.1
    Accelerated: yes
    Video memory: 22321MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.2
    Max compat profile version: 4.2
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce GTX 1660)
OpenGL core profile version string: 4.2 (Core Profile) Mesa 23.2.1-1ubuntu3.1~22.04.2
OpenGL core profile shading language version string: 4.20
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.2 (Compatibility Profile) Mesa 23.2.1-1ubuntu3.1~22.04.2
OpenGL shading language version string: 4.20
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 23.2.1-1ubuntu3.1~22.04.2
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

Docker (before update)

$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Mesa/X.org (0xffffffff)
    Device: llvmpipe (LLVM 11.0.1, 256 bits) (0xffffffff)
    Version: 20.3.5
    Accelerated: no
    Video memory: 20006MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 3.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 11.0.1, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 20.3.5
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.1 Mesa 20.3.5
OpenGL shading language version string: 1.40
OpenGL context flags: (none)

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 20.3.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

I followed the guide here to update Mesa. Mesa went from 20.3.5 to 22.0.5 and is now able to detect my NVIDIA card.

Docker (after update)

$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (NVIDIA GeForce GTX 1660) (0xffffffff)
    Version: 22.0.5
    Accelerated: yes
    Video memory: 22321MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 3.3
    Max compat profile version: 3.3
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (NVIDIA GeForce GTX 1660)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 22.0.5
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.5
OpenGL shading language version string: 3.30
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10

I don't know whether this matters, but I also followed the guide here to enable WSLg in the container.

I also have the following variables in my Dockerfile:

ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all
ENV LD_LIBRARY_PATH=/usr/lib/wsl/lib
ENV LIBVA_DRIVER_NAME=d3d12
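
The container-side pieces mentioned across this thread (the /dev/dxg device, the /usr/lib/wsl mount, and LD_LIBRARY_PATH) can be sanity-checked before starting a GUI application. The following is a sketch under those assumptions, not an official diagnostic; wsl_gpu_preflight and its optional root-prefix argument are hypothetical and exist only for illustration.

```shell
# Check the WSL2 GPU prerequisites discussed in this thread.
# $1: optional path prefix to check under (empty means the real filesystem).
# Prints one line per missing item and returns non-zero if any is missing.
wsl_gpu_preflight() {
  root="${1:-}"
  status=0
  [ -e "$root/dev/dxg" ] || { echo "missing /dev/dxg"; status=1; }
  [ -d "$root/usr/lib/wsl/lib" ] || { echo "missing /usr/lib/wsl/lib"; status=1; }
  case ":${LD_LIBRARY_PATH:-}:" in
    *:/usr/lib/wsl/lib:*) ;;
    *) echo "LD_LIBRARY_PATH lacks /usr/lib/wsl/lib"; status=1 ;;
  esac
  return "$status"
}
```

Calling `wsl_gpu_preflight` with no arguments inside the container checks the live system. A clean result still leaves the Mesa version as a separate requirement, as the earlier comments show.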