accetto / headless-drawing-g3

Headless Ubuntu/Xfce containers with VNC/noVNC for diagramming, image editing and 2D/3D drawing (G3v5)
MIT License

Supporting HW accelerated WebGL content or native apps (OpenGL) #1

Closed accetto closed 3 years ago

accetto commented 3 years ago

This issue was originally filed by @talent-less as issue #5 in the Generation 1 repository accetto/ubuntu-vnc-xfce-firefox. I've moved the complete conversation here because the Generation 1 repo is approaching its retirement and will probably be removed in the future. The subject also fits this repository better.

accetto commented 3 years ago

@talent-less wrote on Apr 16, 2021:

The docker image works great, but it would be even better if it could run HW-accelerated WebGL content or native OpenGL apps, like Blender, etc.

FYI: https://medium.com/@pigiuz/hw-accelerated-gui-apps-on-docker-7fd424fe813e

accetto commented 3 years ago

@accetto wrote on Apr 20, 2021:

Thanks for the hint. I'll check it.

accetto commented 3 years ago

@accetto wrote on Apr 25, 2021:

Hello @talent-less ,

I'm working on it and expect to publish some experimental images soon. At the moment it seems like an image with VirtualGL and Blender would be sufficient for testing, wouldn't it?

However, I'll not be able to test it to the end, because I currently do not own any computer with an Nvidia GPU. Would you agree to help with testing?

If yes, then what would be your preferred scenario and which Nvidia drivers should I pre-install for you in the test images?

Regards, accetto

accetto commented 3 years ago

@talent-less wrote on Apr 26, 2021

@accetto Sure! Blender should be sufficient.

I have an Intel integrated GPU and an NVIDIA GTX 1050. Maybe I can also test it on other machines with more powerful graphics like a 2080 Ti. For the driver, I would try sudo apt install nvidia-driver-460

accetto commented 3 years ago

@accetto wrote on Apr 26, 2021:

Thanks, that will help. We'll probably need a few iterations, because it's not exactly my playground, but hopefully we'll reach something. :)

Do you maybe know what exactly the nvidia-headless-460 drivers do? Are they something exclusively for the cloud (maybe), or a kind of superset/subset of the nvidia-driver-460 drivers?

accetto commented 3 years ago

@talent-less wrote on Apr 27, 2021:

I guess nvidia-headless-460 is for Ubuntu without the X server. Maybe I can try both on AWS. But I think that shouldn't be very problematic, as the driver is only required on the host machine.

For gl libs in the docker you can maybe refer to

https://github.com/nytimes/rd-blender-docker/blob/master/dist/2.80-gpu-ubuntu18.04/Dockerfile

https://gitlab.com/nvidia/container-images/opengl/-/blob/ubuntu20.04/glvnd/runtime/Dockerfile

accetto commented 3 years ago

@accetto wrote on Apr 27, 2021:

Thanks for the hints. I've already developed a few alternatives that seem to work. However, it would help if you could describe the first use case you would like to test. How do you plan to use the container? Just locally or with a cloud? Do you plan to use it with VNC or do you want to share the X11 socket? Would you prefer TurboVNC, or would TigerVNC be OK?

BTW: There will be an unexpected delay, because there is an ad-hoc problem with the TigerVNC download hosting (see this issue). I have to configure a mirror and update all my containers, because the hosting will be gone this weekend.

accetto commented 3 years ago

@talent-less wrote on Apr 28, 2021:

I will probably use it on a cloud platform, as I don't have an NVIDIA GPU on my laptop. I think it will be very useful for people who want to perform long-running tasks like Blender rendering while retaining a basic level of interactivity with the GUI, instead of having the whole thing running in headless mode.

It would be great if everything could be streamed via noVNC, just like in your firefox project. Could it be a layer on top of the firefox project? Then I could run both firefox and other OpenGL software using that image as a starting point. If that is not possible, connecting with a VNC client via TCP would also be good.

I will not use the domain-socket approach, and I think it is already covered by other containers.

accetto commented 3 years ago

@accetto wrote on Apr 28, 2021:

Thanks, let's start with this use case. I plan to prepare a few versions, with Firefox and also with Chromium. They will include noVNC and also TigerVNC, because they go together in my containers. Then I'll include Mesa3D utilities and VirtualGL. If I've understood you correctly, I should not install NVidia drivers inside the containers.

BTW, I always forget to mention it. You've filed the issue against the image accetto/ubuntu-vnc-xfce-firefox, which is a Generation 1 image based on Ubuntu 18.04 LTS. However, I plan to release Generation 3 images, which are based on Ubuntu 20.04 LTS. They are generally slimmer and faster. I'll probably put it all into the project accetto/headless-drawing-g3. Would that be OK? Is there any show stopper for you?

accetto commented 3 years ago

@talent-less wrote on Apr 30, 2021:

Sounds great! accetto/headless-drawing-g3 should be a better place for software like blender :)

accetto commented 3 years ago

@accetto wrote on May 1, 2021:

@talent-less , FYI

In case you've noticed that I've already started to publish Blender images into the repository accetto/ubuntu-vnc-xfce-blender-g3 on Docker Hub: these are not yet the images we've discussed here.

Those Blender images do not include any support for OpenGL/WebGL/VirtualGL yet. You can use them later to compare the performance, for example.

However, because I have no experience with Blender, I would appreciate it if you could test whether something is missing there. Generally I try to keep the images as slim as possible, so at first I include only the really required packages.

For example, I've noticed that Blender uses Python, and it seems that it also installs it itself. However, it's still possible that something is missing there.

Btw, you can always install additional packages, because sudo is included. Do not forget to execute sudo apt-get update first. The default sudo password is headless.
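
For illustration, installing an extra package inside a running container could look like this (mesa-utils is just an example package, not something the image requires):

# inside the container; the default sudo password is 'headless'
sudo apt-get update
sudo apt-get install -y mesa-utils

# quick check that the new tools are available
glxinfo | grep vendor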

accetto commented 3 years ago

@accetto wrote on May 3, 2021

I've just published the images with Blender that include Mesa3D utilities and VirtualGL. Please go to the Docker Hub repository accetto/ubuntu-vnc-xfce-blender-g3 and look for the tags starting with 'vgl-'. Note that these images are not mentioned on the Readme page, because they are still experimental.

Unfortunately I cannot test it myself, because I currently do not have access to any Nvidia GPUs, and I also lack experience with Blender and VirtualGL.

A few additional remarks:

I've found that VirtualGL can be used inside a container like:

vglrun -d :1 glxgears
vglrun -d :1 blender

I'm not sure if that's all that is needed, but it seems that the display argument is required. However, I hope that you already know what is needed.

The resources for the images are on GitHub in the experimental branch exp-vgl.

Please let me know if it is working and/or something is still to improve.

accetto commented 3 years ago

@talent-less wrote on May 5, 2021:

@accetto Hi, I tried sudo docker run --name blender -p 6901:6901 --gpus all --device=/dev/dri/card0:/dev/dri/card0 accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus

https://get.webgl.org/ says the browser supports WebGL, and blender can run with or without vglrun -d :1 blender over novnc. However, I didn't feel any speed up with vglrun in the blender GUI.

apt install glmark2 and vglrun -d :1 glmark2 crashed with

=======================================================
    glmark2 2014.03+git20150611.fa71af2d
=======================================================
    OpenGL Information
    GL_VENDOR:     Mesa/X.org
    GL_RENDERER:   llvmpipe (LLVM 11.0.0, 256 bits)
    GL_VERSION:    3.1 Mesa 20.2.6
=======================================================
[build] use-vbo=false:X Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  131 (MIT-SHM)
  Minor opcode of failed request:  3 (X_ShmPutImage)
  Serial number of failed request:  44
  Current serial number in output stream:  45

glmark2 without vglrun works, but it seems to use software rendering instead of the GPU.

glxinfo | grep vendor says

server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI
OpenGL vendor string: Mesa/X.org

vglrun -d :1 glxinfo | grep vendor says

server glx vendor string: VirtualGL
client glx vendor string: VirtualGL
OpenGL vendor string: Mesa/X.org

but glxinfo crashed.

Then I tried sudo docker run --name blender -p 6901:6901 accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus without GPU, blender works fine, and the browser still says that I have WebGL.

apt install glmark2 and glmark2 works.

=======================================================
    glmark2 2014.03+git20150611.fa71af2d
=======================================================
    OpenGL Information
    GL_VENDOR:     Mesa/X.org
    GL_RENDERER:   llvmpipe (LLVM 11.0.0, 256 bits)
    GL_VERSION:    3.1 Mesa 20.2.6
=======================================================

It seems that there is a software OpenGL implementation somewhere trying to simulate the GPU. I will try installing nvidia-driver-450-server in the container (same as on the host machine) and see if it helps.

accetto commented 3 years ago

@talent-less wrote on May 5, 2021:

It's a bit strange that even before I install any driver in the container, nvidia-smi is in /usr/bin and it can detect the graphics card in the container without any problem. I assume the docker plugin mounts it for me. I also tried some Nvidia CUDA programs and they run without problems. It's just OpenGL that still falls back to software rendering.

Do I need to install X on the host and share it into the container with -v /tmp/.X11-unix/X0:/tmp/.X11-unix/X0:rw like in https://medium.com/@benjamin.botto/opengl-and-cuda-applications-in-docker-af0eece000f1 ?

accetto commented 3 years ago

@talent-less wrote on May 5, 2021:

I also tried sudo docker run --name blender --gpus all --device=/dev/dri/card0:/dev/dri/card0 -p 6901:6901 -v /tmp/.X11-unix/:/tmp/.X11-unix/:rw -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=graphics,utility,compute,video -d accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus but the container exited with code 0 immediately without producing any logs.

accetto commented 3 years ago

@accetto wrote on May 5, 2021:

Thanks for the feedback.

I'm still not quite sure about the configuration you use. Do you run the container on your local host or in a cloud? Do you try to use an NVidia card included in your host computer, or do you try to use a GPU in a cloud? I've got the impression that you've got a host with an Nvidia card and you're trying to share local devices with the container, is that right? And one more important thing - are you on Linux or on Windows?

I'm unable to reproduce your scenarios, but this is what I've found while experimenting on my Linux computer. Maybe it'll help.

The scenario is about running on a local Linux computer and sharing X11 between the host computer and the container. A few remarks first:

  1. Because we're sharing X11, we also have to share the display, so everything will run on the display :0 (not on :1).
  2. Because we're sharing the display of the host, we should not start the VNC server inside the container, so we have to use the startup option --skip-vnc. (However, if the VNC server is not running, then noVNC will also not be available.)
  3. I've found that Blender also requires the sound device if it is started with vglrun.
  4. We also have to allow access to the local X server, of course.

This is what has worked for me during testing. The best approach is to use two separate terminal windows.

Start a new container in the first terminal window:

# allow access to the local X server
xhost +local:$(whoami)

# start the container
docker run -it --rm \
-v /tmp/.X11-unix:/tmp/.X11-unix:rw \
-e DISPLAY=$DISPLAY \
--device /dev/snd:/dev/snd:rw \
--group-add audio \
--device /dev/dri/card0 \
--name devrun \
--hostname devrun \
accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc --skip-vnc

Then in the second terminal window connect to the running container and do what you need. For example:

# connect to the container
docker exec -it devrun bash

# install glmark2
sudo apt-get update
sudo apt-get install -y glmark2

# start glmark2 without vglrun
glmark2

# or with vglrun (should not crash this time)
vglrun glmark2

# or blender
blender

# however, I've got crashes with blender with vglrun (maybe some more options are required)
vglrun blender

However, it can also be that you simply need to install nvidia-docker on your host computer. Or maybe we're still missing something in the image.
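
For completeness, this is roughly what installing nvidia-docker on an Ubuntu host looked like at the time, following NVIDIA's own instructions (please verify the repository setup against NVIDIA's current documentation):

# add NVIDIA's package repository (taken from NVIDIA's setup guide)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# install the runtime and restart the Docker daemon
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

# sanity check: the GPU should be visible inside a test container
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi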

Tomorrow I'll check the link you've sent above.

Btw, sometimes it helps to start the containers with the --debug option; you could get some hints about the problem.

accetto commented 3 years ago

@talent-less wrote on May 6, 2021:

I was testing on a clean Ubuntu 20.10 installation with the latest docker and nvidia-docker plugin, on a physical machine at home without any virtualization.

I tried sharing the local X into the container with noVNC enabled, I guess that's why it crashed. I don't think sharing local X with noVNC disabled is what I would prefer, as I would like to share the container with a remote user.

accetto commented 3 years ago

@talent-less wrote on May 6, 2021:

@accetto were you able to get opengl working with novnc? I guess glmark should be showing the correct vendor (Intel, for example) instead of Mesa

accetto commented 3 years ago

@accetto wrote on May 6, 2021:

I guess glmark should be showing the correct vendor (Intel, for example) instead of Mesa

I'm not quite sure about this. I'm afraid that the nvidia-docker would recognize and work only with Nvidia hardware (see the Note to the reader in this article).

I was testing on a clean Ubuntu 20.10 installation with the latest docker and nvidia-docker plugin, on a physical machine at home without any virtualization.

We would need at least one system with an Nvidia GPU to be able to test it all. Do you have an Nvidia GPU in your host? Alternatively, do you have access to a cloud GPU (as described in this article)?

I don't think sharing local X with noVNC disabled is what I would prefer, as I would like to share the container with a remote user.

I think that it would help if you could draw a diagram describing the configuration you would like to use. Then we could concentrate on that particular scenario. If you do not have diagramming software and don't want to use an online one, then you can use the container accetto/ubuntu-vnc-xfce-drawio-g3.

accetto commented 3 years ago

@talent-less wrote on May 7, 2021:

We would need at least one system with an Nvidia GPU to be able to test it all.

Sure. The Ubuntu machine has a 1050Ti Nvidia GPU.

I also tried some Nvidia CUDA programs and they can run without problem.

And I think I have got the nvidia-docker plugin working, because I can run GPU programs (CUDA programs) in your docker image. These programs don't use OpenGL-related libraries though.

I also have access to a cloud with 1080Ti GPUs, but I have to pay for using it. So I would like to test the image first on the local ubuntu machine.

However, I would test as if the local machine were in the cloud, i.e. no physical display connected and the display output sent via noVNC to my laptop.

accetto commented 3 years ago

@talent-less wrote on May 7, 2021:

I'm afraid that the nvidia-docker would recognize and work only with Nvidia hardware (see the Note to the reader in this article).

Sure, I know.

@accetto were you able to get opengl working with novnc? I guess glmark should be showing the correct vendor (Intel, for example) instead of Mesa

I asked this because I suspect that noVNC is overriding the graphics hardware (NVIDIA in my case and Intel in yours) with the software OpenGL from Mesa.

accetto commented 3 years ago

@accetto wrote on May 7, 2021:

I also have access to a cloud with 1080Ti GPUs, but I have to pay for using it. So I would like to test the image first on the local ubuntu machine.

Exactly my view. :)

And I think I have got the nvidia-docker plugin working, because I can run GPU programs (CUDA programs) in your docker image. These programs don't use OpenGL-related libraries though.

That's really good news! What was needed to get it running? As I've already mentioned, OpenGL/Blender is not exactly my playground, unfortunately. And not having any Nvidia hardware at the moment makes me kind of blind in one eye. :) However, I'll be glad to co-develop a useful image.

Do you already have some ideas about what we should/could change in the image? I'll try to get some hints from the articles we've found, but it'll take some time.

Should I include some test apps in the images, like glmark2, so you would not need to install them each time anew?

accetto commented 3 years ago

@talent-less wrote on May 7, 2021:

glmark2 should be a handy tool :)

I went through https://medium.com/@benjamin.botto/opengl-and-cuda-applications-in-docker-af0eece000f1 again, and it seems that some of libglvnd0 libgl1 libglx0 libegl1 are missing in the container.

According to https://gitlab.com/nvidia/container-images/opengl/-/blob/ubuntu20.04/glvnd/runtime/Dockerfile, both the 64-bit and i386 versions can be installed, and libglvnd0 needs a json file

RUN apt-get update && apt-get install -y --no-install-recommends \
        libglvnd0 libglvnd0:i386 \
        libgl1 libgl1:i386 \
        libglx0 libglx0:i386 \
        libegl1 libegl1:i386 \
        libgles2 libgles2:i386 && \
    rm -rf /var/lib/apt/lists/*
COPY 10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json

Here is how glvnd works: https://www.x.org/wiki/Events/XDC2016/Program/xdc-2016-glvnd-status.pdf

I am not sure if glvnd should be installed together with Mesa. Mesa seems to ship a software OpenGL and blender/firefox seems to use it instead of the real HW at the moment.

accetto commented 3 years ago

@talent-less wrote on May 7, 2021:

I am not sure if glvnd should be installed together with Mesa. Mesa seems to ship a software OpenGL and blender/firefox seems to use it instead of the real HW at the moment.

Oh, that is outdated.

In the process of migrating to Mesa 18.0, Canonical has updated their out-of-tree Mir patches for Mesa. They are also now enabling the OpenGL Vendor Neutral Dispatch Library "GLVND" that allows multiple OpenGL drivers to happily co-exist on the same system.

https://phoronix.com/scan.php?page=news_item&px=Ubuntu-18.04-Getting-Mesa-18.0

So maybe installing libglvnd0 will make it work automagically.
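
For example, a minimal attempt inside a running container could look like this (just a sketch, nothing image-specific):

# inside the container
sudo apt-get update
sudo apt-get install -y libglvnd0

# check which GL dispatch libraries are now present
ldconfig -p | grep -E 'libGLX|libEGL|libOpenGL'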

accetto commented 3 years ago

@accetto wrote on May 7, 2021:

Thanks, I'll check it. I'll also include glmark2. Let me know if some other test apps or utilities would be useful as well. We can remove them later if the images become too big.

BTW, should I continue building all the tags I've published? For example, I'm not sure if you use the images with Chromium and the ones without noVNC for testing.

accetto commented 3 years ago

@talent-less wrote on May 7, 2021:

I will only use the ones with Firefox and noVNC. Maybe you can disable the other ones for now and re-enable them when the OpenGL issue is resolved.

accetto commented 3 years ago

@accetto wrote on May 7, 2021:

I've already included all the libraries you've listed above. You can check it by executing the following:

sudo apt-get update && \
apt search libglvnd0 ; \
apt search libgl1 ; \
apt search libglx0 ; \
apt search libegl1 ; \
apt search libgles2

Only the json file is not there, because I didn't know about it. You can easily add it yourself and test again. I'll also try to find out if something else could be missing.

Remark: Not all libraries are explicitly visible in the Dockerfile, because many of them are installed as dependencies. In this case they are mostly part of the Mesa stack.

accetto commented 3 years ago

@accetto wrote on May 8, 2021:

I've just published an updated image tag for you and removed the other ones. I can build them any time if you'll need them for testing.

You can download the image by:

docker pull accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus

I've added the glmark2 test app and also the json file you've pointed to. However, I've noticed that there already was a file 50_mesa.json with exactly the same content. Unfortunately, adding the file didn't solve the problem of glmark2 crashing when started with vglrun.

The json file points to the module libEGL_nvidia.so.0, which is not installed in the image. I tried to add it by installing the package libnvidia-gl-440, but it didn't help with the crashing problem. I assume that nvidia-docker and Nvidia hardware are still required.

One more tip for experimenting.

If you want to force the container's VNC to use the display :0, you can do it by overriding the VNC parameters at runtime, as it is described here. You can also override other VNC parameters and simplify your docker run commands. Just be careful to bind a single file only, not a folder. The file content could look something like this, including the empty VNC password to avoid typing it each time :)

export DISPLAY=:0
export VNC_PW=
# export DISPLAY=:2
# export VNC_COL_DEPTH=32
# export VNC_VIEW_ONLY=true
# export VNC_RESOLUTION=1024x768
# export VNC_PORT=5902
# export NO_VNC_PORT=6902
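
For illustration only, binding such an override file could look like this (the in-container target path is left as a placeholder here - take the real one from the description linked above):

# save the overrides above as vnc-overrides.rc and bind just that single file
docker run -it --rm -P \
    -v "$(pwd)/vnc-overrides.rc:<container-path-from-the-linked-description>" \
    accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus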

Then you can use, for example, just vglrun glxgears in the container.

However, using the display :0 still doesn't help with the crashing vglrun glmark2, at least not for me. Using the debug option like vglrun glmark2 -d provides some output, but I still don't see the reason.

accetto commented 3 years ago

@accetto wrote on May 8, 2021:

I've played a little bit with the image accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus and I may have something for you.

Try this:

xhost +local:$(whoami)
docker run -it --rm -P \
  -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
  --device /dev/dri/card0 \
  --device /dev/snd:/dev/snd:rw \
  --group-add audio \
accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus
xhost -local:$(whoami)

It seems to be promising, because glmark2 with vglrun now correctly detects my video hardware and we do not lose the VNC/noVNC access you wanted.

This is how it looks if you start glmark2 inside the container without vglrun, using just the glmark2 command:

(screenshot: glmark2_no_vglrun)

If you use vglrun glmark2, then it already looks better:

(screenshot: glmark2_with_vglrun)

Notice that there is no -d option used with vglrun. It means that glmark2 still runs on the display :0, leaving the display :1 for the VNC/noVNC access, and vglrun passes the GLX commands through.

You can check it by not allowing access to your local display :0, i.e. by leaving out the command xhost +local:$(whoami). Then you will get the following error:

(screenshot: glmark2_with_vglrun_no_xhost_permissions)

Sharing the audio device is required for Blender. You can test it the same way: vglrun blender.

Please let me know if it worked.

accetto commented 3 years ago

@accetto wrote on May 10, 2021:

Hello @talent-less , does it already work?

accetto commented 3 years ago

@talent-less wrote on May 13, 2021:

@accetto Sorry for the late reply.

It doesn't seem to work on my side. The container exits immediately after running your command.

I think the local Ubuntu gnome session is doing something strange with the X domain socket file, so the container crashed right after docker run. The container didn't put anything into stdout or stderr. Is there any way to get some logs?

However, I can start the container via SSH from my laptop when the Ubuntu gnome session is logged out locally. But there I couldn't run xhost +local:$(whoami), as xhost said "couldn't find display". I can ignore that and start the container, but it ended up like in your last screenshot. Do I need to somehow start the X server first?

accetto commented 3 years ago

@accetto wrote on May 13, 2021:

No, your X server is already running if you are on Linux and already running a GUI desktop session.

Check it like this (on your host, not inside the container):

sudo ls /tmp/.X11*

### you should get something like this
X0

### check also the DISPLAY variable
echo $DISPLAY

### you should get something like this
:0.0

If xhost +local:$(whoami) doesn't work, then try xhost +, which grants access to everybody. Be sure that you run it on your host, not in the container. If it still doesn't work, then it could be a permission problem. Try to use sudo then.

accetto commented 3 years ago

@accetto wrote on May 13, 2021:

Maybe some more tips about what you can do during troubleshooting.

You should probably check the readme file for the startup options help. You can also display it by starting a container with the option -h or --help-usage. You will probably find the options --debug, --verbose, --tail-vnc, --skip-vnc and --skip-startup most helpful.

You can bind the dockerstartup directory as an external volume and you'll get the logs directly there. You can make modifications in the startup scripts, e.g. by adding some debug reporting, then start the container with the --skip-startup option and execute the startup script from a second terminal. Something like this:

### start a container in the first terminal, skipping the startup script
docker run -it -P --name mytest <other-options> <image> --skip-startup bash

### connect from the second terminal ...
docker exec -it mytest bash

### ... and execute the startup script manually, possibly passing some startup options
headless@mytest:~$ /dockerstartup/./startup.sh --verbose

accetto commented 3 years ago

@talent-less wrote on May 13, 2021:

@accetto Thank you so much for the work and hints! I am able to get vglrun glmark2 and vglrun blender working on an NVIDIA GPU!

=======================================================
    glmark2 2014.03+git20150611.fa71af2d
=======================================================
    OpenGL Information
    GL_VENDOR:     NVIDIA Corporation
    GL_RENDERER:   GeForce GTX 1050 Ti/PCIe/SSE2
    GL_VERSION:    4.6.0 NVIDIA 450.119.03
=======================================================

Here is the configuration for NVIDIA GPUs.

sudo docker run -it --rm --gpus all --name blender \
    -p 6901:6901 -P \
    -v /tmp/.X11-unix/X1:/tmp/.X11-unix/X0:rw \
    --device /dev/dri/card0 \
    --device /dev/snd:/dev/snd:rw \
    --group-add audio \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=graphics,utility,compute,video \
    accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus

I didn't configure libglvnd so I guess it takes the default graphics card from the X server on the host.

My default DISPLAY was on X1, and it crashed the container previously. -v /tmp/.X11-unix/X1:/tmp/.X11-unix/X0:rw fixes that.

So the last missing piece would be figuring out how to run the container in the cloud. @accetto Do you know how to run an X server and xhost from ssh on a headless Ubuntu Server instead of Ubuntu Desktop?

accetto commented 3 years ago

@accetto wrote on May 13, 2021:

I am able to get vglrun glmark2 and vglrun blender working on an NVIDIA GPU

That's really great news! :+1:

Do you know how to run an X server and xhost from ssh on a headless Ubuntu Server instead of Ubuntu Desktop?

Not really, because I haven't needed it yet. Basically it should be similar to the desktop case, just without starting the Xfce4 stuff. Actually, the container is a headless server, so it should be possible to use it for testing. Check for example this article. You can always start a container skipping the startup script, as described above, and modify anything inside the container. The included nano editor does not require a GUI. However, ssh is not included, so you need to install it yourself.

You can save intermediate stages as helper images using docker commit.

You can also try PuTTY, which provides X11 forwarding.
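
As a side note, if the goal is only to display a single X application on your laptop, plain SSH X11 forwarding might already be enough (a sketch; user and host name are placeholders, and an SSH server must be installed first):

# from the laptop: enable X11 forwarding for the SSH session
ssh -X headless@remote-host

# any X client started in that session is then displayed locally
glxinfo | grep vendor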

accetto commented 3 years ago

@talent-less wrote on May 14, 2021:

I am a little bit confused now. Do I really need the X server on the host?

VirtualGL redirects an application's OpenGL/GLX commands to a separate X server (that has access to a 3D graphics card), captures the rendered images, and then streams them to the X server that actually handles the application.

On a headless server, I have no X server talking to the GPU on the host by default. Can I somehow configure the X server in the container to use the GPU?

accetto commented 3 years ago

@accetto wrote on May 15, 2021:

Do I really need the X server on the host?

I would say so, but I'm actually also not 100% sure at the moment. I would recommend starting to draw some diagrams to clarify what is running where and what is talking to what and how.

I think, that the use case we've just tested goes like this:

The Blender application is running inside the container and it draws on display :1. The container has no GPU and is actually a kind of remote host. The Nvidia GPU is available only on the local host, and it can work only with display :0, as I've understood it. So we need VirtualGL inside the container (= remote host) to pass the graphical commands and the rendered content between displays :1 and :0. In other words, Blender thinks that it's drawing on display :1 and the GPU thinks that it's drawing on display :0. VirtualGL makes them both believe that they are right. :)

From that description I would derive that both hosts (remote and local) need some kind of X server, because they both want to draw on a graphical display. There is no need for a GPU on non-graphical terminals.
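
A rough sketch of that display split, matching the vglrun examples earlier in this thread (the display numbers are those of our test setup):

# inside the container: the application draws its GUI on the VNC display ...
export DISPLAY=:1

# ... while VirtualGL redirects the GLX/OpenGL calls to the GPU-backed display :0
# (':0' is also VirtualGL's default 3D display, so '-d :0' could be omitted)
vglrun -d :0 blender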

Well, maybe I've misunderstood something, but that's actually my point. I would draw some diagrams for the use case you need.

BTW, I would like to move this conversation to the project accetto/headless-drawing-g3, because it actually belongs there and this Generation 1 repository is slowly approaching its retirement. Another alternative would be GitHub Discussions in the sibling repository accetto/ubuntu-vnc-xfce-g3. What do you think?

ghost commented 3 years ago

Hi, I have been using VirtualGL for a while, but the performance was still not ideal compared to running programs directly on the host. glmark2 suggests a 15% loss on my machine. Also, the configuration requires an X server to be installed and configured (xhost) on the host, which might be a headache for some cloud platforms (especially container-only ones) that do not provide access to the host system at all.

Luckily, I found https://github.com/ehfd/docker-nvidia-glx-desktop which uses glvnd instead of VirtualGL and therefore doesn't require an X server on the host.

@accetto Are you interested in continuing to improve the container?

accetto commented 3 years ago

Hi, I have been using VirtualGL for a while, but the performance was still not ideal compared to running programs directly on the host. glmark2 suggests a 15% loss on my machine.

This is no surprise, because it's generally true for any kind of virtualization. However, I'm not sure if you really need to use VirtualGL at all. I always saw it as an option. The example I've provided before was actually only a quick experiment, when I was testing whether I could get my Intel GPU reported inside the container. Because I do not have NVidia hardware, I'm not able to test the exact use case you're interested in, unfortunately. I also don't have any cloud subscription with GPU resources, so I'm not able to test such use cases either. I've hoped that you'll find out the best way to use it. :)

Also, the configuration requires an X server to be installed and configured (xhost) on the host, which might be a headache for some cloud platforms

Actually, I'm not sure if you need to share the X server socket and to use xhost at all. I've provided such an example only because you've said that you want to test the GPU provided by your host. I've supposed that you won't do the same in the cloud. That's why you wanted noVNC after all, isn't it?

At the very beginning you've provided this link. In that article the author describes two main scenarios - sharing the X11 socket and installing the X server stuff inside the container. The author prefers the first scenario, but my containers actually implement the second one, which is also more suitable for the cloud, I would assume.

Have you already tried to use my container the same way as the one you've mentioned above?

I haven't tried or analyzed it yet, but at first glance there should be no difference. The author himself describes it as

container supporting GLX/Vulkan for NVIDIA GPUs by spawning its own X Server and noVNC WebSocket interface instead of using the host X server.

and this is actually exactly what my containers do. The glvnd is also available as part of the package libglvnd0, which is already included. Only the noVNC port is 6901 and the NVidia drivers are not installed yet. But if only those are missing, then it would be no problem. Can you try to use my container exactly the same way? However, install the NVidia drivers first.

Are you interested in continuing to improve the container?

Sure I am. I'll be glad to make the containers more useful also for applications that could benefit from using a GPU, even if I don't have such use cases at the moment. My main use cases are more about encapsulated working environments for experimenting, testing and development. Because I don't have access to any NVidia GPU, locally or in the cloud, I would welcome any help with testing such use cases that I cannot do myself. I also see Blender as a generally useful application, but I don't have any experience with it.

ghost commented 3 years ago

Thanks for your reply.

I tried apt install kmod curl and installing the nvidia driver using https://github.com/ehfd/docker-nvidia-glx-desktop/blob/cec9907cf2ad826aac53946e40bb9226fc4ea5b1/bootstrap.sh#L11 from bash, but it failed with

ERROR: You appear to be running an X server; please exit X before installing. 
       For further details, please see the section INSTALLING THE NVIDIA DRIVER
       in the README available on the Linux driver download page at
       www.nvidia.com.

Then I tried it with docker run -it -P --name mytest <other-options> <image> --skip-startup bash. The driver installed successfully, but I still cannot get the GPU working. I also tried running a script before /dockerstartup/./startup.sh --verbose, but it didn't help.

#!/bin/bash

set -e

if [ "$NVIDIA_VISIBLE_DEVICES" == "all" ]; then
  export GPU_SELECT=$(sudo nvidia-smi --query-gpu=uuid --format=csv | sed -n 2p)
elif [ -z "$NVIDIA_VISIBLE_DEVICES" ]; then
  export GPU_SELECT=$(sudo nvidia-smi --query-gpu=uuid --format=csv | sed -n 2p)
else
  export GPU_SELECT=$(sudo nvidia-smi --id=$(echo "$NVIDIA_VISIBLE_DEVICES" | cut -d ',' -f1) --query-gpu=uuid --format=csv | sed -n 2p)
  if [ -z "$GPU_SELECT" ]; then
    export GPU_SELECT=$(sudo nvidia-smi --query-gpu=uuid --format=csv | sed -n 2p)
  fi
fi

if [ -z "$GPU_SELECT" ]; then
  echo "No NVIDIA GPUs detected. Exiting."
  exit 1
fi

if ! sudo nvidia-smi --id="$GPU_SELECT" -q | grep -q "Tesla"; then
  DISPLAYSTRING="--use-display-device=None"
fi

HEX_ID=$(sudo nvidia-smi --query-gpu=pci.bus_id --id="$GPU_SELECT" --format=csv | sed -n 2p)
IFS=":." ARR_ID=($HEX_ID)
unset IFS
BUS_ID=PCI:$((16#${ARR_ID[1]})):$((16#${ARR_ID[2]})):$((16#${ARR_ID[3]}))
sudo nvidia-xconfig --virtual="1024x768" --depth="24" --mode="1024x768" --allow-empty-initial-configuration --no-use-edid-dpi --busid="$BUS_ID" --only-one-x-screen --no-xinerama "$DISPLAYSTRING"

if [ "x$SHARED" == "xTRUE" ]; then
  export SHARESTRING="-shared"
fi

shopt -s extglob
for TTY in $(ls -1 /dev/tty+([0-9]) | sort -rV); do
  if [ -w "$TTY" ]; then
    Xorg vt"$(echo "$TTY" | grep -Eo '[0-9]+$')" :0 &
    break
  fi
done
sleep 1

export DISPLAY=:0
UUID_CUT=$(sudo nvidia-smi --query-gpu=uuid --id="$GPU_SELECT" --format=csv | sed -n 2p | cut -c 5-)
if vulkaninfo | grep "$UUID_CUT" | grep -q ^; then
  VK=0
  while true; do
    if ENABLE_DEVICE_CHOOSER_LAYER=1 VULKAN_DEVICE_INDEX=$VK vulkaninfo | grep "$UUID_CUT" | grep -q ^; then
      export ENABLE_DEVICE_CHOOSER_LAYER=1
      export VULKAN_DEVICE_INDEX="$VK"
      break
    fi
    VK=$((VK + 1))
  done
else
  echo "Vulkan is not available for the current GPU."
fi

accetto commented 3 years ago

I assume that the answer above wasn't really meant for me, was it? :)

ghost commented 3 years ago

Have you already tried to use my container the same way as the one you've mentioned above? I haven't tried or analyzed it yet, but at first glance there should be no difference.

I was trying to manually perform what docker-nvidia-glx-desktop does in your container, but I failed to get it to work.

I noticed that although docker-nvidia-glx-desktop installs the NVIDIA driver and glvnd explicitly, it also uses FROM nvidia/opengl:1.2-glvnd-devel-ubuntu20.04 instead of the official Ubuntu image. I wonder if there is some magic sauce in the nvidia image that makes x11vnc happy.

accetto commented 3 years ago

I see. :)

I didn't have time yet to test or analyze the other image, but I actually meant to use the same or a similar kind of docker run command line, without X11 socket sharing etc. I'm not sure if it would work right away, of course.

It could be that there is something important inside the base image nvidia/opengl:1.2-glvnd-devel-ubuntu20.04. I'll try to look at it as well. Maybe the best trick would be to use that image as the base. However, I would not be able to really test it at the moment.

As a tip, you can install stuff like the NVidia drivers at run-time and then export the running container as a new image, then use that image for the next experiment and so on. Something like this:

Start a new container docker run --name mycont ...

Install the NVidia drivers inside the container:

### update the apt cache first
sudo apt-get update

### install the NVidia drivers
sudo apt install nvidia-driver-460

Then export the running container as a new image (from outside the container):

docker commit mycont my/image:nvidia

Then you can create new containers from the image my/image:nvidia, install more stuff in the run-time, export important milestones as new images and so on.
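
The next iteration could then reuse the committed image, for example (names are just illustrative):

# start a fresh container from the saved milestone, with GPU access
docker run -it --rm --gpus all --name mycont2 my/image:nvidia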

It really doesn't help that I'm missing any NVidia hardware. I'm trying to find a quick and affordable solution, but no success yet.

accetto commented 3 years ago

I did some testing and I want to share the results.

I've used the glmark2 benchmark application and an image containing mesa-utils and VirtualGL.

First I've run glmark2 directly on my host and I've got the following result:

=======================================================
    glmark2 2014.03+git20150611.fa71af2d
=======================================================
    OpenGL Information
    GL_VENDOR:     Intel
    GL_RENDERER:   Mesa Intel(R) UHD Graphics 620 (KBL GT2)
    GL_VERSION:    4.6 (Compatibility Profile) Mesa 20.2.6
=======================================================
...
=======================================================
                                  glmark2 Score: 3936 
=======================================================

Then I've run it inside the container, but using the display :0 of the host. The VNC server must not be started in this case.

xhost +local:$(whoami)

docker run -it -P --rm \
    -e DISPLAY=${DISPLAY} \
    --device /dev/dri/card0 \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    --name devrun \
    <image-with-mesa-and-virtualgl> --skip-vnc

xhost -local:$(whoami)

From the second terminal:

docker exec -it devrun glmark2

The result has been:

=======================================================
    glmark2 2014.03+git20150611.fa71af2d
=======================================================
    OpenGL Information
    GL_VENDOR:     Intel
    GL_RENDERER:   Mesa Intel(R) UHD Graphics 620 (KBL GT2)
    GL_VERSION:    4.6 (Compatibility Profile) Mesa 20.2.6
=======================================================
...
=======================================================
                                  glmark2 Score: 4074 
=======================================================

It's interesting that the result was even slightly better than in the first case, but it could be just a usual deviation.

Then I've tested it inside the container, not using VirtualGL first. Software rendering was used in this case. I've accessed the container with the TigerVNC Viewer. The VNC server was running, of course.

xhost +local:$(whoami)

docker run -it -P --rm \
    --device /dev/dri/card0 \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    --name devrun \
    <image-with-mesa-and-virtualgl>

xhost -local:$(whoami)

The result has been significantly worse:

headless@devrun:~$ glmark2
** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
** Failed to set swap interval. Results may be bounded above by refresh rate.
=======================================================
    glmark2 2014.03+git20150611.fa71af2d
=======================================================
    OpenGL Information
    GL_VENDOR:     Mesa/X.org
    GL_RENDERER:   llvmpipe (LLVM 11.0.0, 256 bits)
    GL_VERSION:    3.1 Mesa 20.2.6
=======================================================
...
=======================================================
                                  glmark2 Score: 378 
=======================================================

It's just about 10% of the first result directly on the host! I've got a score of just 306 when accessing the container via noVNC. That was another test run, of course. The slightly lower value is expected, because there is additional processing by websockify.

Finally, I've run the test inside the container using vglrun glmark2. The container was started the same way as the one above. I've got the following result:

headless@devrun:~$ vglrun glmark2
=======================================================
    glmark2 2014.03+git20150611.fa71af2d
=======================================================
    OpenGL Information
    GL_VENDOR:     Intel
    GL_RENDERER:   Mesa Intel(R) UHD Graphics 620 (KBL GT2)
    GL_VERSION:    4.6 (Compatibility Profile) Mesa 20.2.6
=======================================================
...
=======================================================
                                  glmark2 Score: 549 
=======================================================

It can be seen that the Intel UHD Graphics 620 has been used for rendering and that the result is about 45% better than the previous one. So it seems that VirtualGL is really working. When accessing the container via noVNC, I've got a score of 447.

Note that I've got only about 60% of the scores while using an external 4K monitor.

I would draw the following conclusions from these tests:

  1. The best performance can be achieved by sharing the display of the host and not running the VNC server inside the container.
  2. It still makes sense to install 3D applications inside containers. They can be nicely encapsulated, and it seems that they can run without a performance penalty if they share the display with the host.
  3. It seems that the bottleneck is actually the VNC server running inside the container. According to the results above, it costs about 90% of the performance. However, it can be different with NVidia hardware and the NVIDIA Docker Toolkit installed.
  4. If you still want/need to run the VNC server inside the container, then it's better to use the vglrun command from the VirtualGL toolkit, because the performance gain seems to be about 45%. It can probably be much more with NVidia hardware and drivers.
  5. I would say that the best approach for the cloud would be to come as close to scenario (1) as possible.

accetto commented 3 years ago

@talent-less FYI

I've included the Mesa3D and VirtualGL support in the master branch and have therefore removed the experimental branch exp-vgl. I've also re-published the images with Blender, and they now all include the Mesa3D libraries and the VirtualGL toolkit. The glvnd is also included, of course. Please use those new images for experimenting.

I've also started a new discussion Supporting OpenGL/WebGL and using HW acceleration.