running docker image fails

Trophime commented 2 years ago

Hi, when trying to launch the Docker image, I run into this error:

/opt/paraview/bin/pvpython-real: error while loading shared libraries: libOpenGL.so.0: cannot open shared object file: No such file or directory
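For anyone hitting the same message, one way to see exactly which shared libraries the binary cannot resolve is to filter `ldd` output for "not found" entries. The following is only a minimal sketch: `missing_libs` is a hypothetical helper name, and it assumes `ldd` is available wherever the binary lives.

```shell
# Hypothetical helper: print the names of shared libraries that ldd reports
# as "not found". It reads ldd output on stdin, so the same filter works on
# the host or on output captured from inside a container.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Possible usage (IMAGE is a placeholder for the image being tested, and
# this assumes ldd exists in that image):
#   docker run --rm --entrypoint ldd "$IMAGE" /opt/paraview/bin/pvpython-real | missing_libs
```

If the filter prints nothing, the loader can resolve every dependency and the problem is likely elsewhere.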

jourdain commented 2 years ago

Are you starting docker with the GPU option? Which version of ParaView are you using (default, EGL, OSMesa)?

The default one won't work inside docker.

Also, does your system have an NVIDIA GPU with all the drivers properly installed? If not, you will have to use the OSMesa version of ParaView.
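The choice described above could be sketched as a small host-side check. This is an assumption-laden illustration, not part of the project's scripts: `pick_variant` is a hypothetical helper, and the strings `egl` and `osmesa` are just labels for the two ParaView flavors, not image tags. It only tests whether `nvidia-smi -L` lists a GPU on the host; it says nothing about whether Docker itself is configured for `--gpus all`.

```shell
# Hypothetical helper: suggest a ParaView flavor based on whether the host
# driver reports at least one NVIDIA GPU.
pick_variant() {
    # `nvidia-smi -L` prints one "GPU N: ..." line per detected device.
    if command -v nvidia-smi >/dev/null 2>&1 \
        && nvidia-smi -L 2>/dev/null | grep -q '^GPU '; then
        echo egl
    else
        echo osmesa
    fi
}

# Possible usage:
#   echo "Suggested ParaView flavor: $(pick_variant)"
```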

Trophime commented 2 years ago

I've just naively tried the scripts from docker/scripts: build_image.sh and run_image.sh.

jourdain commented 2 years ago

Then I believe your Docker is not properly set up to use your GPU, especially with regard to the --gpus all option. You may want to validate that you can run sudo docker run --gpus all --rm nvidia/cuda:9.0-base nvidia-smi, which is mentioned here

Trophime commented 2 years ago

Status: Downloaded newer image for nvidia/cuda:9.0-base
Mon Feb 28 16:02:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01   Driver Version: 470.103.01   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro K2200        Off  | 00000000:01:00.0 Off |                  N/A |
| 42%   35C    P8     1W /  39W |    355MiB /  4043MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

It seems OK, no?

jourdain commented 2 years ago

It seems OK, indeed... I'm wondering if we messed up the last push for that image. When I tried it locally on my Linux machine it worked, but since then the CI has pushed new images automatically. I guess I'll clear my cache and test those to be sure.

Thanks for reporting the issue.

Trophime commented 2 years ago

Just let me know if it's working for you. Maybe there is something broken on my Debian system.

jourdain commented 2 years ago

I confirm that I'm running into the same issue with the latest image as well. I guess we messed up something in our CI.

jourdain commented 2 years ago

Should be fixed. We used the runtime version of the base image from NVIDIA rather than devel, and that caused us to not find the OpenGL library.

We are going to fix the CI to pick the right base image, and then we should be all set.

Don't forget to docker pull kitware/trame:1.2-glvnd-runtime-ubuntu20.04-py39 first.

Trophime commented 2 years ago

Hi, now it almost starts, but I still get an error:

ParaView is using venv: /deploy/server/venv
Could not find domain of type: Boolean
(   2.600s) [paraview        ] vtkEGLRenderWindow.cxx:298   WARN| vtkEGLRenderWindow (0x151c9760): EGL device index: 0 is greater than the number of supported deviced in the system: 0. Using device 0 ...
(   2.600s) [paraview        ] vtkEGLRenderWindow.cxx:382    ERR| vtkEGLRenderWindow (0x151c9760): Only EGL 1.4 and greater allows OpenGL as client API. See eglBindAPI for more information.
(   2.600s) [paraview        ]vtkOpenGLRenderWindow.c:493    ERR| vtkEGLRenderWindow (0x151c9760): GLEW could not be initialized: Missing GL version
wslink: Starting factory

App running at:
 - Local:   http://0.0.0.0:9500/
 - Network: http://172.17.0.2:9500/

Note that for multi-users you need to use and configure a launcher.

(   4.464s) [paraview        ] vtkEGLRenderWindow.cxx:298   WARN| vtkEGLRenderWindow (0x15002450): EGL device index: 0 is greater than the number of supported deviced in the system: 0. Using device 0 ...
(   4.464s) [paraview        ] vtkEGLRenderWindow.cxx:382    ERR| vtkEGLRenderWindow (0x15002450): Only EGL 1.4 and greater allows OpenGL as client API. See eglBindAPI for more information.
(   4.464s) [paraview        ]vtkOpenGLRenderWindow.c:493    ERR| vtkEGLRenderWindow (0x15002450): GLEW could not be initialized: Missing GL version

Loguru caught a signal: SIGSEGV
Stack trace:
0       0x7fd23b86f210 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x46210) [0x7fd23b86f210]
(   4.464s) [paraview        ]                       :0     FATL| Signal: SIGSEGV
error: exception occurred: Segmentation fault

Trophime commented 2 years ago

@jourdain I'm afraid that either the issue should be reopened or it is related to my config. Could you confirm that it is working out of the box, please?

jourdain commented 2 years ago

It is working on my box. I'm wondering if it is a driver version issue or something else. When I'm back at the office, I'll post the nvidia-smi output using the base image along with the output log of a starting visualizer process.

jourdain commented 2 years ago

Host

nvidia-smi 
Tue Mar  1 08:45:17 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 18%   26C    P8    21W / 250W |     67MiB / 11011MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1190      G   /usr/lib/xorg/Xorg                 56MiB |
|    0   N/A  N/A      1406      G   /usr/bin/gnome-shell                8MiB |
+-----------------------------------------------------------------------------+

Within Docker

docker run --gpus all --rm --entrypoint nvidia-smi kitware/trame:1.2-glvnd-runtime-ubuntu20.04-py39 
Tue Mar  1 15:46:30 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 18%   26C    P8    21W / 250W |     67MiB / 11011MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Log from a running session

ParaView is using venv: /deploy/server/venv
Could not find domain of type: Boolean
wslink: Starting factory

App running at:
 - Local:   http://0.0.0.0:9500/
 - Network: http://172.17.0.2:9500/

Note that for multi-users you need to use and configure a launcher.

[...] some debug output [...]

jourdain commented 2 years ago

It seems that my driver is older than yours. I'm not sure what the deal is here.

Trophime commented 2 years ago

I got it to work on a Debian stable host (bullseye, nvidia 460.91.03) but still have the issue on a Debian testing host (bookworm, nvidia 470.103.01). I will check whether it works on the forthcoming Ubuntu LTS release. The trouble seems to be connected to the version of the NVIDIA driver.
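Since the comparison above hinges on driver versions, a small filter can pull the version out of the `nvidia-smi` banner so hosts and containers can be compared quickly. This is a minimal sketch: `driver_version` is a hypothetical helper name, and it assumes the banner keeps the "Driver Version: X" layout shown in the outputs in this thread.

```shell
# Hypothetical helper: extract the driver version from plain `nvidia-smi`
# output read on stdin. Only the first matching line is reported.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Possible usage, on the host and then inside the container:
#   nvidia-smi | driver_version
#   docker run --gpus all --rm --entrypoint nvidia-smi \
#       kitware/trame:1.2-glvnd-runtime-ubuntu20.04-py39 | driver_version
```

`nvidia-smi --query-gpu=driver_version --format=csv,noheader` should report the same value without any parsing, which may be preferable when only the version is needed.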

jourdain commented 2 years ago

Thanks for reporting back. If you feel adventurous, you could build the trame images [1, 2] using a different base image from NVIDIA to see if that fixes the driver compatibility issue.

Kitware / paraview-visualizer

running docker image fails #2