Thanks for the hint. I'll check it.
Hello @talent-less,
I'm working on it and expect to publish some experimental images soon. At the moment it seems that an image with VirtualGL and Blender would be sufficient for testing, wouldn't it?
However, I won't be able to test it all the way through, because I currently do not own any computer with an Nvidia GPU. Would you agree to help with testing?
If yes, what would be your preferred scenario and which Nvidia drivers should I pre-install for you in the test images?
Regards, accetto
@accetto Sure! Blender should be sufficient.
I have an Intel integrated GPU and an NVIDIA GTX 1050. Maybe I can also test it on other machines with more powerful graphics, like a 2080 Ti. For the driver, I would try sudo apt install nvidia-driver-460.
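Roughly, on the host (not in the container) that would be something like this (assuming Ubuntu with the standard repositories):
# on the host: install the proprietary driver and reboot so the kernel module loads
sudo apt-get update
sudo apt-get install -y nvidia-driver-460
sudo reboot
# after the reboot, verify that the driver sees the GPU
nvidia-smi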
Thanks, that will help. We'll probably need a few iterations, because it's not exactly my playground, but hopefully we'll reach something. :)
Do you happen to know what exactly the nvidia-headless-460 drivers do? Are they something exclusively for the cloud, or are they a kind of superset/subset of the nvidia-driver-460 drivers?
I guess nvidia-headless-460 is for Ubuntu without the X server. Maybe I can try both on AWS. But I think that shouldn't be very problematic, as the driver is only required on the host machine.
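If I understand the Ubuntu packaging correctly (an assumption on my side, not something I have verified), installing the headless variant on a server would look roughly like this:
# on a headless host: kernel driver plus management tools, without the desktop/GL bits
sudo apt-get update
sudo apt-get install -y nvidia-headless-460 nvidia-utils-460
# nvidia-utils provides nvidia-smi for checking that the GPU is visible
nvidia-smi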
For the GL libs in the Docker image you can maybe refer to:
https://github.com/nytimes/rd-blender-docker/blob/master/dist/2.80-gpu-ubuntu18.04/Dockerfile
https://gitlab.com/nvidia/container-images/opengl/-/blob/ubuntu20.04/glvnd/runtime/Dockerfile
Thanks for the hints. I've already developed a few alternatives that seem to work. However, it would help if you could describe the first use case you would like to test. How do you plan to use the container? Just locally or in a cloud? Do you plan to use it with VNC, or do you want to share the X11 socket? Would you prefer TurboVNC, or would TigerVNC be OK?
BTW: There will be an unexpected delay, because there is an ad-hoc problem with the TigerVNC download hosting (see this issue). I have to configure a mirror and update all my containers, because the current hosting will be gone this weekend.
I will probably use it on a cloud platform, as I don't have an NVIDIA GPU on my laptop. I think it will be very useful for people who want to perform long-running tasks like Blender rendering while retaining a basic level of interactivity with the GUI, instead of running the full thing in headless mode.
It would be great if everything could be streamed via noVNC, just like in your Firefox project. Could it be a layer on top of the Firefox project? Then I could run both Firefox and other OpenGL software using that image as a starting point. If that is not possible, connecting with a VNC client via TCP would also be fine.
I will not use the Unix domain socket approach, and I think that is already covered by other containers.
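As a rough sketch of what I have in mind (the image name below is just a placeholder; 6901 is the noVNC port your other images use):
# on the cloud VM: run the image with GPU access and expose the noVNC port
docker run -d --gpus all -p 6901:6901 <the-new-image>
# then open http://<vm-public-ip>:6901 in a local browser to reach the desktop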
Thanks, let's start with this use case. I plan to prepare a few versions, with Firefox and also with Chromium. They will include noVNC and also TigerVNC, because they go together in my containers. Then I'll include the Mesa3D utilities and VirtualGL. If I've understood you correctly, I should not install Nvidia drivers inside the containers.
BTW, I always forget to mention it. You've filed the issue against the image accetto/ubuntu-vnc-xfce-firefox, which is a Generation 1 image based on Ubuntu 18.04 LTS. However, I plan to release Generation 3 images, which are based on Ubuntu 20.04 LTS. They are generally slimmer and faster. I'll probably put it all into the project accetto/headless-drawing-g3. Would that be OK? Is there any show stopper for you?
Sounds great! accetto/headless-drawing-g3 should be a better place for software like blender :)
@talent-less, FYI
In case you've noticed it already: I've started to publish Blender images into the repository accetto/ubuntu-vnc-xfce-blender-g3 on Docker Hub. These are not yet the images we've discussed here.
Those Blender images do not include any support for OpenGL/WebGL/VirtualGL yet. You can use them later to compare the performance, for example.
However, because I have no experience with Blender, I would appreciate it if you could test whether something is missing there. Generally I try to keep the images as slim as possible, so at first I include only the really required packages.
But, for example, I've noticed that Blender uses Python and it seems that it also installs it itself. However, it's still possible that something is missing there.
Btw, you can always install additional packages, because sudo is included. Do not forget to execute sudo apt-get update first. The default sudo password is headless.
I've just published the images with Blender that include the Mesa3D utilities and VirtualGL. Please go to the Docker Hub repository accetto/ubuntu-vnc-xfce-blender-g3 and look for the tags starting with 'vgl-'. Note that these images are not mentioned on the readme page, because they are still experimental.
Unfortunately, I cannot test them myself, because I currently do not have access to any Nvidia GPUs and I'm also missing experience with Blender and VirtualGL.
A few additional remarks:
- The Mesa3D utilities should support OpenGL and WebGL.
- glxgears (for OpenGL) and es2gears and es3tri (for WebGL) are also included.
- I've found that VirtualGL can be used inside a container like this:
vglrun -d :1 glxgears
vglrun -d :1 blender
I'm not sure if that's all that is needed, but it seems that the display argument is required. However, I hope that you already know what is needed.
The resources for the images are on GitHub in the experimental branch exp-vgl.
Please let me know whether it works and/or whether something still needs to be improved.
@accetto Hi, I tried sudo docker run --name blender -p 6901:6901 --gpus all --device=/dev/dri/card0:/dev/dri/card0 accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus
https://get.webgl.org/ says the browser supports WebGL, and Blender runs over noVNC both with and without vglrun -d :1 blender. However, I didn't notice any speed-up with vglrun in the Blender GUI.
apt install glmark2 and vglrun -d :1 glmark2 crashed with:
=======================================================
glmark2 2014.03+git20150611.fa71af2d
=======================================================
OpenGL Information
GL_VENDOR: Mesa/X.org
GL_RENDERER: llvmpipe (LLVM 11.0.0, 256 bits)
GL_VERSION: 3.1 Mesa 20.2.6
=======================================================
[build] use-vbo=false:X Error of failed request: BadMatch (invalid parameter attributes)
Major opcode of failed request: 131 (MIT-SHM)
Minor opcode of failed request: 3 (X_ShmPutImage)
Serial number of failed request: 44
Current serial number in output stream: 45
glmark2 without vglrun works, but it seems to use software rendering instead of the GPU.
glxinfo | grep vendor says:
server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI
OpenGL vendor string: Mesa/X.org
vglrun -d :1 glxinfo | grep vendor says:
server glx vendor string: VirtualGL
client glx vendor string: VirtualGL
OpenGL vendor string: Mesa/X.org
but glxinfo crashed.
Then I tried sudo docker run --name blender -p 6901:6901 accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus without the GPU. Blender works fine, and the browser still says that I have WebGL.
apt install glmark2 and glmark2 also work:
=======================================================
glmark2 2014.03+git20150611.fa71af2d
=======================================================
OpenGL Information
GL_VENDOR: Mesa/X.org
GL_RENDERER: llvmpipe (LLVM 11.0.0, 256 bits)
GL_VERSION: 3.1 Mesa 20.2.6
=======================================================
It seems that there is a software OpenGL implementation (the llvmpipe software renderer) somewhere simulating the GPU.
I will try installing nvidia-driver-450-server in the container (same as on the host machine) and see if it helps.
It's a bit strange that, even before I install any driver in the container, nvidia-smi is in /usr/bin and it can detect the graphics card inside the container without any problem. I assume the Docker plugin mounts it for me. I also tried some Nvidia CUDA programs and they run without problems.
It's just OpenGL that still falls back to software rendering.
Do I need to install X on the host and share it into the container with -v /tmp/.X11-unix/X0:/tmp/.X11-unix/X0:rw, like in https://medium.com/@benjamin.botto/opengl-and-cuda-applications-in-docker-af0eece000f1?
I also tried sudo docker run --name blender --gpus all --device=/dev/dri/card0:/dev/dri/card0 -p 6901:6901 -v /tmp/.X11-unix/:/tmp/.X11-unix/:rw -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=graphics,utility,compute,video -d accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus
but the container exits immediately with code 0 without producing any logs.
Thanks for the feedback.
I'm still not quite sure about the configuration you use. Do you run the container on your local host or in a cloud? Do you try to use an Nvidia card included in your host computer, or do you try to use a GPU in a cloud? I've got the impression that you have a host with an Nvidia card and you're trying to share local devices with the container, is that right? And one more important thing: are you on Linux or on Windows?
I'm unable to reproduce your scenarios, but this is what I've found while experimenting on my Linux computer. Maybe it'll help.
The scenario is about running on a local Linux computer and sharing X11 between the host computer and the container. A few remarks first:
- The local X server runs on the display :0 (not on :1).
- You can skip starting the VNC server with --skip-vnc. (However, if the VNC server is not running, then noVNC will also be unavailable.)
- Blender also requires the sound device if it is started with vglrun.
This is what has worked for me during testing. The best approach is to use two separate terminal windows.
Start a new container in the first terminal window:
# allow access to the local X server
xhost +local:$(whoami)
# start the container
docker run -it --rm \
-v /tmp/.X11-unix:/tmp/.X11-unix:rw \
-e DISPLAY=$DISPLAY \
--device /dev/snd:/dev/snd:rw \
--group-add audio \
--device /dev/dri/card0 \
--name devrun \
--hostname devrun \
accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc --skip-vnc
Then in the second terminal window connect to the running container and do what you need. For example:
# connect to the container
docker exec -it devrun bash
# install glmark2
sudo apt-get update
sudo apt-get install -y glmark2
# start glmark2 without vglrun
glmark2
# or with vglrun (should not crash this time)
vglrun glmark2
# or blender
blender
# however, I've got crashes with blender under vglrun (maybe some more options are required)
vglrun blender
However, it could also be that you simply need to install nvidia-docker on your host computer. Or maybe we're still missing something in the image.
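For completeness, a minimal sketch of what installing it on an Ubuntu host could look like, following the nvidia-docker documentation (I cannot verify it myself at the moment, so treat it as an assumption; repository and package names may differ per release):
# add the nvidia-docker repository and install the runtime (run on the host)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
# quick check: the GPU table should be printed from inside a container
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi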
Tomorrow I'll check the link you've sent above.
Btw, sometimes it helps to start the containers with the --debug option; you could get some hints about the problem.
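On the host side, the standard Docker commands can also help when a container exits immediately without any output:
# show whatever the container wrote before it exited
docker logs blender
# check how it exited
docker inspect blender --format '{{.State.ExitCode}}'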
I was testing on a clean Ubuntu 20.10 installation with the latest Docker and nvidia-docker plugin, on a physical machine at home without any virtualization.
I tried sharing the local X into the container with noVNC enabled, and I guess that's why it crashed. I don't think sharing the local X with noVNC disabled is what I would prefer, as I would like to share the container with a remote user.
@accetto were you able to get opengl working with novnc? I guess glmark should be showing the correct vendor (Intel, for example) instead of Mesa
I guess glmark should be showing the correct vendor (Intel, for example) instead of Mesa
I'm not quite sure about this. I'm afraid that nvidia-docker would recognize and work only with Nvidia hardware (see the Note to the reader in this article).
I was testing on a clean Ubuntu 20.10 installation with the latest Docker and nvidia-docker plugin, on a physical machine at home without any virtualization.
We would need at least one system with an Nvidia GPU to be able to test it all. Do you have an Nvidia GPU in your host? Alternatively, do you have access to a cloud GPU (as described in this article)?
I don't think sharing the local X with noVNC disabled is what I would prefer, as I would like to share the container with a remote user.
I think that it would help if you could draw a diagram describing the configuration you would like to use. Then we could concentrate on that particular scenario. If you do not have a diagramming software and don't want to use an online one, then you can use the container accetto/ubuntu-vnc-xfce-drawio-g3.
We would need at least one system with an Nvidia GPU to be able to test it all.
Sure. The Ubuntu machine has a 1050Ti Nvidia GPU.
I also tried some Nvidia CUDA programs and they run without problems.
And I think I have got the nvidia-docker plugin working, because I can run GPU programs (CUDA programs) in your Docker image. These programs don't use OpenGL-related libraries though.
I also have access to a cloud with 1080Ti GPUs, but I have to pay for using it. So I would like to test the image first on the local Ubuntu machine.
However, I would test as if the local machine were in the cloud, i.e. no physical display connected and the display output sent via noVNC to my laptop.
I'm afraid that the nvidia-docker would recognize and work only with Nvidia hardware (see the Note to the reader in this article).
Sure, I know.
@accetto were you able to get opengl working with novnc? I guess glmark should be showing the correct vendor (Intel, for example) instead of Mesa
I asked this because I suspect that noVNC is overriding the graphics hardware (Nvidia in my case and Intel in your case) with the software OpenGL from Mesa.
I also have access to a cloud with 1080Ti GPUs, but I have to pay for using it. So I would like to test the image first on the local ubuntu machine.
Exactly my view. :)
And I think I have got the nvidia-docker plugin working, because I can run GPU programs (CUDA programs) in your Docker image. These programs don't use OpenGL-related libraries though.
That's really good news! What was needed to get it running? As I've already mentioned, OpenGL/Blender is not exactly my playground, unfortunately. And not having any Nvidia hardware at the moment makes me kind of blind in one eye. :) However, I'll be glad to co-develop a useful image.
Do you already have some ideas about what we should or could change in the image? I'll try to get some hints from the articles we've found, but it'll take some time.
Should I include some test apps in the images, like glmark2, so you would not need to install them each time anew?
glmark2 should be a handy tool :)
I went through https://medium.com/@benjamin.botto/opengl-and-cuda-applications-in-docker-af0eece000f1 again, and it seems that some of libglvnd0, libgl1, libglx0 and libegl1 are missing in the container.
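A quick way to check which of them are actually present inside the container could be:
# inside the container: list the installed GL/glvnd packages
dpkg -l | grep -E 'libglvnd|libgl1|libglx0|libegl1|libgles2'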
According to https://gitlab.com/nvidia/container-images/opengl/-/blob/ubuntu20.04/glvnd/runtime/Dockerfile, both the x64 and i386 versions can be installed, and libglvnd0 needs a JSON file:
RUN apt-get update && apt-get install -y --no-install-recommends \
libglvnd0 libglvnd0:i386 \
libgl1 libgl1:i386 \
libglx0 libglx0:i386 \
libegl1 libegl1:i386 \
libgles2 libgles2:i386 && \
rm -rf /var/lib/apt/lists/*
COPY 10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
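For reference, that 10_nvidia.json just registers the NVIDIA EGL library with glvnd. If I read the NVIDIA image correctly (an assumption on my part), creating it inside a running container would look like this:
# register the NVIDIA EGL vendor library with glvnd (content copied from the NVIDIA opengl image)
sudo mkdir -p /usr/share/glvnd/egl_vendor.d
sudo tee /usr/share/glvnd/egl_vendor.d/10_nvidia.json >/dev/null <<'EOF'
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
EOF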
Here is how glvnd works: https://www.x.org/wiki/Events/XDC2016/Program/xdc-2016-glvnd-status.pdf
I am not sure if glvnd should be installed together with Mesa. Mesa seems to ship a software OpenGL, and Blender/Firefox seem to use it instead of the real hardware at the moment.
I am not sure if glvnd should be installed together with Mesa. Mesa seems to ship a software OpenGL, and Blender/Firefox seem to use it instead of the real hardware at the moment.
Oh, that is outdated.
In the process of migrating to Mesa 18.0, Canonical has updated their out-of-tree Mir patches for Mesa. They are also now enabling the OpenGL Vendor Neutral Dispatch Library "GLVND" that allows multiple OpenGL drivers to happily co-exist on the same system.
https://phoronix.com/scan.php?page=news_item&px=Ubuntu-18.04-Getting-Mesa-18.0
So maybe installing libglvnd0 will make it work automagically.
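If glvnd is in place, one way to check whether it can dispatch to the NVIDIA driver might be to force the vendor library explicitly (this is glvnd's standard environment variable; whether the NVIDIA GLX library is actually present in the container is a separate question):
# ask glvnd to dispatch GLX calls to the NVIDIA vendor library, then inspect the result
__GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep -E 'vendor|renderer'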
Thanks, I'll check it. I'll also include glmark2. Let me know if some other test apps or utilities would also be useful. We can remove them later if the images become too big.
BTW, should I continue building all the tags I've published? For example, I'm not sure whether you use the images with Chromium and the ones without noVNC for your testing.
I will only use the ones with Firefox and noVNC. Maybe you can disable the other ones for now and re-enable them when the OpenGL issue is resolved.
I've already included all the libraries you've listed above. You can check it by executing the following:
sudo apt-get update && \
apt search libglvnd0 ; \
apt search libgl1 ; \
apt search libglx0 ; \
apt search libegl1 ; \
apt search libgles2
Only the JSON file is not there, because I didn't know about it. You can easily add it yourself and test again. I'll also try to find out if something else could be missing.
Remark: Not all libraries are explicitly visible in the Dockerfile, because many of them are installed as dependencies. In this case they are mostly part of the mesa packages.
I've just published an updated image tag for you and removed the other ones. I can build them any time if you need them for testing.
You can download the image by:
docker pull accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus
I've added the glmark2 test app and also the JSON file you've pointed to. However, I've noticed that the file 50_mesa.json with exactly the same content was already there. Unfortunately, adding the file didn't solve the problem of glmark2 crashing when started with vglrun.
The JSON file points to the module libEGL_nvidia.so.0, which is not installed in the image. I've tried to add it by installing the package libnvidia-gl-440, but it didn't help with the crashing problem. I assume that nvidia-docker and Nvidia hardware are still required.
One more tip for experimenting.
If you want to force the container's VNC to use the display :0, you can do it by overriding the VNC parameter at runtime, as described here. You can also override other VNC parameters and simplify your docker run commands. Just be careful to bind a single file only, not a folder. The file content could look something like this, including the empty VNC password to avoid typing it each time :)
export DISPLAY=:0
export VNC_PW=
# export DISPLAY=:2
# export VNC_COL_DEPTH=32
# export VNC_VIEW_ONLY=true
# export VNC_RESOLUTION=1024x768
# export VNC_PORT=5902
# export NO_VNC_PORT=6902
Then you can use, for example, just vglrun glxgears in the container.
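For illustration, binding such an override file could look roughly like this (the in-container target path below is only a placeholder; the real path is the one described in the linked documentation):
# save the overrides into a single file on the host, then bind that file (not a folder)
cat > vnc_overrides.rc <<'EOF'
export DISPLAY=:0
export VNC_PW=
EOF
docker run -it --rm -P \
  -v "$(pwd)/vnc_overrides.rc":/path/from/the/docs/vnc_overrides.rc:ro \
  accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus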
However, using the display :0 still doesn't help with the crashing vglrun glmark2, at least not for me. Using the debug option like vglrun glmark2 -d provides some output, but I still don't see the reason.
I've played a little bit with the image accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus and I may have something for you.
Try this:
xhost +local:$(whoami)
docker run -it --rm -P \
-v /tmp/.X11-unix:/tmp/.X11-unix:rw \
--device /dev/dri/card0 \
--device /dev/snd:/dev/snd:rw \
--group-add audio \
accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus
xhost -local:$(whoami)
It seems promising, because glmark2 with vglrun already correctly detects my video hardware, and we do not lose the VNC/noVNC access you wanted.
This is how it looks if you start glmark2 inside the container without vglrun, using just the glmark2 command:
If you use vglrun glmark2, then it already looks better:
Notice that there is no -d option used with vglrun. It means that glmark2 still runs on the display :0, leaving the display :1 for the VNC/noVNC access, and vglrun passes the GLX commands through.
You can check this by not allowing access to your local display :0, i.e. by leaving out the command xhost +local:$(whoami). Then you will get an error.
Sharing the audio device is required for Blender. You can test it the same way: vglrun blender.
Please let me know if it worked.
Hello @talent-less , does it already work?
@accetto Sorry for the late reply.
It doesn't seem to work on my side. The container exits immediately after running your command.
I think the local Ubuntu GNOME session is doing something strange with the X domain socket file, so the container crashes right after docker run. The container didn't put anything into stdout or stderr. Is there any way to get some logs?
However, I can start the container via SSH from my laptop when the Ubuntu GNOME session is logged out locally. But there I couldn't run xhost +local:$(whoami), as xhost said "couldn't find display". I can ignore that and start the container, but it ends up like in your last screenshot.
Do I need to somehow start the X server first?
No, your X server is already running, if you are on Linux and you are already running a GUI desktop session.
Check it like this (on your host, not inside the container):
sudo ls /tmp/.X11*
### you should get something like this
X0
### check also the DISPLAY variable
echo $DISPLAY
### you should get something like this
:0.0
If xhost +local:$(whoami) doesn't work, then try xhost +, which grants access to everybody. Be sure to run it on your host, not in the container. If it still doesn't work, then it could be a permission problem. Try using sudo then.
Here are some more tips about what you can do during troubleshooting.
You should probably check the readme file for help with the startup options. You can also display it by starting a container with the option -h or --help-usage. You will probably find the options --debug, --verbose, --tail-vnc, --skip-vnc and --skip-startup most helpful.
You can bind the dockerstartup directory as an external volume and you'll get the logs directly there. You can make modifications in the startup scripts, e.g. by adding some debug reporting, then start the container with the --skip-startup option and execute the startup script from a second terminal. Something like this:
### start a container in the first terminal, skipping the startup script
docker run -it -P --name mytest <other-options> <image> --skip-startup bash
### connect from the second terminal ...
docker exec -it mytest bash
### ... and execute the startup script manually, possibly passing some startup options
headless@mytest:~$ /dockerstartup/./startup.sh --verbose
@accetto Thank you so much for the work and the hints! I am able to get vglrun glmark2 and vglrun blender working on an NVIDIA GPU!
=======================================================
glmark2 2014.03+git20150611.fa71af2d
=======================================================
OpenGL Information
GL_VENDOR: NVIDIA Corporation
GL_RENDERER: GeForce GTX 1050 Ti/PCIe/SSE2
GL_VERSION: 4.6.0 NVIDIA 450.119.03
=======================================================
Here is the configuration for NVIDIA GPUs.
sudo docker run -it --rm --gpus all --name blender -p 6901:6901 -P -v /tmp/.X11-unix/X1:/tmp/.X11-unix/X0:rw --device /dev/dri/card0 --device /dev/snd:/dev/snd:rw -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=graphics,utility,compute,video --group-add audio accetto/ubuntu-vnc-xfce-blender-g3:vgl-vnc-novnc-firefox-plus
I didn't configure libglvnd so I guess it takes the default graphics card from the X server on the host.
My default DISPLAY was on X1, and that crashed the container previously. -v /tmp/.X11-unix/X1:/tmp/.X11-unix/X0:rw fixes that.
So the last missing piece would be figuring out how to run the container in the cloud. @accetto Do you know how to run an X server and xhost via SSH on a headless Ubuntu Server instead of Ubuntu Desktop?
I am able to get vglrun glmark2 and vglrun blender working on an NVIDIA GPU
That's really great news! :+1:
Do you know how to run an X server and xhost via SSH on a headless Ubuntu Server instead of Ubuntu Desktop?
Not really, because I haven't needed it yet. Basically it should be similar to the desktop case, just without starting the Xfce4 stuff. Actually, the container itself is a headless server, so it should be possible to use it for testing. Check, for example, this article. You can always start a container skipping the startup script as described above and modify anything inside the container. The included nano editor does not require a GUI. However, ssh is not included, so you would need to install it yourself.
You can save intermediate stages as helper images using docker commit.
You can also try PuTTY, which provides X11 forwarding.
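I have not tried it myself, but from what I've read, a bare X server on a headless host with the Nvidia driver could be started roughly like this (an untested sketch; the PCI bus ID is just an example and must match your GPU):
# on the headless host: generate an xorg.conf bound to the GPU, even without a monitor
sudo nvidia-xconfig --allow-empty-initial-configuration --busid=PCI:1:0:0
# start a bare X server on display :0 (no desktop session needed)
sudo X :0 &
# allow local clients to connect to that display
sudo DISPLAY=:0 xhost +local: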
I am a little bit confused now. Do I really need the X server on the host?
VirtualGL redirects an application's OpenGL/GLX commands to a separate X server (that has access to a 3D graphics card), captures the rendered images, and then streams them to the X server that actually handles the application.
On a headless server, by default I have no X server talking to the GPU on the host. Can I somehow configure the X server in the container to use the GPU?
Do I really need the X server on the host?
I would say so, but I'm actually also not 100% sure at the moment. I would recommend starting to draw some diagrams, to clarify what is running where, what is talking to what, and how.
I think, that the use case we've just tested goes like this:
The Blender application is running inside the container and it draws on the display :1. The container has no GPU and it's actually a kind of remote host. The Nvidia GPU is available only on the local host and it can work only with the display :0, as I've understood it. So we need VirtualGL inside the container (= remote host) to pass the graphical commands and the rendered content between the displays :1 and :0. In other words, Blender thinks that it's drawing on the display :1 and the GPU thinks that it's drawing on the display :0. VirtualGL makes them both believe that they are right. :)
From that description I would conclude that both hosts (remote and local) need some kind of X server, because they both want to draw on a graphical display. There is no need for a GPU on non-graphical terminals.
Well, maybe I've misunderstood something, but that's actually my point. I would draw some diagrams for the use case you need.
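To make the mapping between the two displays explicit, VirtualGL's own environment variables can be used. A small sketch of how I understand it for the setup above (the values are examples):
# inside the container: the application and VNC draw on :1 ...
export DISPLAY=:1
# ... while VirtualGL sends the GLX/OpenGL work to the GPU-backed X server on :0
export VGL_DISPLAY=:0        # equivalent to the vglrun -d :0 option
vglrun glxinfo | grep "OpenGL renderer"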
BTW, I would like to move this conversation to the project accetto/headless-drawing-g3, because it actually belongs there and this Generation 1 repository is slowly approaching its retirement. Another alternative would be the GitHub Discussions in the sibling repository accetto/ubuntu-vnc-xfce-g3. What do you think?
I've moved this complete conversation to the issue #1 in the Generation 3 repository accetto/headless-drawing-g3. Therefore I'm closing this issue.
@talent-less: Please continue at the new place.
The Docker image works great, but it would be even better if it could run HW-accelerated WebGL content or native OpenGL apps, like Blender, etc.
FYI: https://medium.com/@pigiuz/hw-accelerated-gui-apps-on-docker-7fd424fe813e