Closed: Mirocos closed this issue 2 years ago
It looks like the installed driver version does not match the one that is running?
dpkg -l | grep -i opengl
ii libepoxy0:amd64 1.4.3-1 amd64 OpenGL function pointer management library
ii libgl1-mesa-dev:amd64 20.0.8-0ubuntu1~18.04.1 amd64 free implementation of the OpenGL API -- GLX development files
ii libgl1-mesa-dri:amd64 20.0.8-0ubuntu1~18.04.1 amd64 free implementation of the OpenGL API -- DRI modules
ii libgl1-mesa-dri:i386 20.0.8-0ubuntu1~18.04.1 i386 free implementation of the OpenGL API -- DRI modules
ii libgles2-mesa-dev:amd64 20.0.8-0ubuntu1~18.04.1 amd64 free implementation of the OpenGL|ES 2.x API -- development files
ii libglfw3:amd64 3.2.1-1 amd64 portable library for OpenGL, window and input (x11 libraries)
ii libglfw3-dev:amd64 3.2.1-1 amd64 portable library for OpenGL, window and input (development files)
ii libglu1-mesa:amd64 9.0.0-2.1build1 amd64 Mesa OpenGL utility library (GLU)
ii libglu1-mesa-dev:amd64 9.0.0-2.1build1 amd64 Mesa OpenGL utility library -- development files
ii libglx-mesa0:amd64 20.0.8-0ubuntu1~18.04.1 amd64 free implementation of the OpenGL API -- GLX vendor library
ii libglx-mesa0:i386 20.0.8-0ubuntu1~18.04.1 i386 free implementation of the OpenGL API -- GLX vendor library
ii libnvidia-cfg1-470:amd64 470.63.01-0ubuntu0.18.04.2 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-fbc1-470:amd64 470.63.01-0ubuntu0.18.04.2 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-fbc1-470:i386 470.63.01-0ubuntu0.18.04.2 i386 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-470:amd64 470.63.01-0ubuntu0.18.04.2 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-gl-470:i386 470.63.01-0ubuntu0.18.04.2 i386 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-ifr1-470:amd64 470.63.01-0ubuntu0.18.04.2 amd64 NVIDIA OpenGL-based Inband Frame Readback runtime library
ii libnvidia-ifr1-470:i386 470.63.01-0ubuntu0.18.04.2 i386 NVIDIA OpenGL-based Inband Frame Readback runtime library
ii libopengl0:amd64 1.0.0-2ubuntu2.3 amd64 Vendor neutral GL dispatch library -- OpenGL support
ii libqt4-opengl:amd64 4:4.8.7+dfsg-7ubuntu1 amd64 Qt 4 OpenGL module
ii libqt5opengl5:amd64 5.9.5+dfsg-0ubuntu2.6 amd64 Qt 5 OpenGL module
ii libqt5opengl5-dev:amd64 5.9.5+dfsg-0ubuntu2.6 amd64 Qt 5 OpenGL library development files
nvidia-smi
Thu Sep 23 12:00:02 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:05:00.0 On | N/A |
| 0% 49C P8 13W / 125W | 368MiB / 5941MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1817 G /usr/lib/xorg/Xorg 18MiB |
| 0 N/A N/A 1947 G /usr/bin/gnome-shell 68MiB |
| 0 N/A N/A 2929 G /usr/lib/xorg/Xorg 150MiB |
| 0 N/A N/A 3053 G /usr/bin/gnome-shell 38MiB |
| 0 N/A N/A 3748 G ...AAAAAAAAA= --shared-files 87MiB |
+-----------------------------------------------------------------------------+
Setting up EGL is somewhat tricky, so we recommend trying the provided Docker container first and seeing if that works. If you then want to run with a local installation, you can use the container as a reference. Or was this crash with Docker?
I don't know what might explain the difference in reported and installed driver version. Perhaps the system hasn't been rebooted since the update was performed?
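The mismatch discussed above can be checked mechanically. The following is only a minimal sketch and not part of nvdiffrast: the helper names are hypothetical, and the two sample strings are copied from the `dpkg -l` and `nvidia-smi` output earlier in this thread. It compares version strings and says nothing about why they differ.

```python
import re

# Sample lines copied from the output posted earlier in this thread.
DPKG_SAMPLE = (
    "ii libnvidia-gl-470:amd64 470.63.01-0ubuntu0.18.04.2 amd64 "
    "NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD"
)
SMI_SAMPLE = "| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |"


def installed_driver_version(dpkg_output):
    """Return the version of the first libnvidia-gl-* package, or None."""
    for line in dpkg_output.splitlines():
        fields = line.split()
        # dpkg -l rows look like: "ii <package> <version> <arch> <description...>"
        if len(fields) >= 3 and fields[1].startswith("libnvidia-gl-"):
            # Drop the packaging suffix: "470.63.01-0ubuntu0.18.04.2" -> "470.63.01"
            return fields[2].split("-")[0]
    return None


def loaded_driver_version(smi_output):
    """Return the driver version reported in the nvidia-smi header, or None."""
    match = re.search(r"Driver Version:\s*([0-9.]+)", smi_output)
    return match.group(1) if match else None


def versions_match(dpkg_output, smi_output):
    """True only if both versions were found and are identical."""
    installed = installed_driver_version(dpkg_output)
    loaded = loaded_driver_version(smi_output)
    return installed is not None and installed == loaded
```

For the output posted above, `versions_match(DPKG_SAMPLE, SMI_SAMPLE)` is false, because the installed libraries are 470.63.01 while the running driver reports 470.57.02.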
I have now fixed the version mismatch, but the sample still fails with the same error.
However, the sample command succeeds without the "--display-interval" option.
The reason I didn't use Docker is that the process crashed with the following message:
./run_sample.sh ./samples/torch/cube.py --resolution 32
Using container image: gltorch:latest
Running command: ./samples/torch/cube.py --resolution 32
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
This is the output when building the container:
./run_sample.sh --build-container
Sending build context to Docker daemon 56.73MB
Step 1/14 : ARG BASE_IMAGE=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel
Step 2/14 : FROM $BASE_IMAGE
---> 7554ac65eba5
Step 3/14 : RUN apt-get update && apt-get install -y --no-install-recommends pkg-config libglvnd0 libgl1 libglx0 libegl1 libgles2 libglvnd-dev libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev cmake curl
---> Using cache
---> e4a09e440d68
Step 4/14 : ENV PYTHONDONTWRITEBYTECODE=1
---> Using cache
---> 631c224d81ed
Step 5/14 : ENV PYTHONUNBUFFERED=1
---> Using cache
---> 1ba7bd67688d
Step 6/14 : ENV LD_LIBRARY_PATH /usr/lib64:$LD_LIBRARY_PATH
---> Using cache
---> f952c0b196b8
Step 7/14 : ENV NVIDIA_VISIBLE_DEVICES all
---> Using cache
---> f12db9f73bc8
Step 8/14 : ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics
---> Using cache
---> aa3eff00fe58
Step 9/14 : ENV PYOPENGL_PLATFORM egl
---> Using cache
---> 19e5de1e15e0
Step 10/14 : COPY docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
---> Using cache
---> 376259c0df7d
Step 11/14 : RUN pip install imageio imageio-ffmpeg
---> Using cache
---> dc0e3348ca79
Step 12/14 : COPY nvdiffrast /tmp/pip/nvdiffrast/
---> Using cache
---> 2d1212d00ab5
Step 13/14 : COPY README.md setup.py /tmp/pip/
---> Using cache
---> 4146b33d2378
Step 14/14 : RUN cd /tmp/pip && pip install .
---> Using cache
---> 69a6c149e11c
Successfully built 69a6c149e11c
Successfully tagged gltorch:latest
Sending build context to Docker daemon 56.73MB
Step 1/14 : ARG BASE_IMAGE=pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel
Step 2/14 : FROM $BASE_IMAGE
---> e544497892a3
Step 3/14 : RUN apt-get update && apt-get install -y --no-install-recommends pkg-config libglvnd0 libgl1 libglx0 libegl1 libgles2 libglvnd-dev libgl1-mesa-dev libegl1-mesa-dev libgles2-mesa-dev cmake curl
---> Using cache
---> 12a6fec8eaca
Step 4/14 : ENV PYTHONDONTWRITEBYTECODE=1
---> Using cache
---> 4ce4de9c99d6
Step 5/14 : ENV PYTHONUNBUFFERED=1
---> Using cache
---> 3ad3686f14a5
Step 6/14 : ENV LD_LIBRARY_PATH /usr/lib64:$LD_LIBRARY_PATH
---> Using cache
---> 2a3647186c1b
Step 7/14 : ENV NVIDIA_VISIBLE_DEVICES all
---> Using cache
---> 75f2e984a64a
Step 8/14 : ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics
---> Using cache
---> 73fb77c464e0
Step 9/14 : ENV PYOPENGL_PLATFORM egl
---> Using cache
---> 12542217941c
Step 10/14 : COPY docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
---> Using cache
---> d3ee66f5a806
Step 11/14 : RUN pip install imageio imageio-ffmpeg
---> Using cache
---> 10e1baa2bf04
Step 12/14 : COPY nvdiffrast /tmp/pip/nvdiffrast/
---> Using cache
---> 3c6c52c781ac
Step 13/14 : COPY README.md setup.py /tmp/pip/
---> Using cache
---> e4ec44f0c452
Step 14/14 : RUN cd /tmp/pip && pip install .
---> Using cache
---> 3d7f37505e25
Successfully built 3d7f37505e25
Successfully tagged gltensorflow:latest
No python sample given or file '' not found. Exiting.
Thank you for the clarification. The interactive display on Linux is apparently very problematic, and I'm not sure if anyone has found a setup where it works so far. As of now, disabling the interactive display seems to be the only possibility.
I'll add a note about this in the documentation in the next update.
So does the Docker container build correctly? And how can I resolve a message like "docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]."?
Can you run anything else in Docker that uses GPU? Like some basic pytorch samples using official PyTorch containers? This would be a useful test to check that your Docker/drivers are working. And make sure you have the nvidia runtime correctly installed for Docker.
One thing that sometimes happens to me personally, is that under Ubuntu, the graphics drivers get updated under the hood (without anyone asking or me approving such an update) and that anything running CUDA or graphics fails to run until I reboot my Ubuntu box. (The problem being mismatched drivers and Linux kernel.) Not saying this will fix your problem but it may be worth trying out.
Other than that, it's hard to say what could be going wrong.
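One way to act on the suggestion about the nvidia runtime is to look at what Docker itself reports. The sketch below assumes the `docker` CLI is on the PATH and uses `docker info --format '{{json .Runtimes}}'` to list the registered runtimes; the function names are hypothetical. Treat the result as a hint only, since a missing `nvidia` entry does not by itself prove that GPU support is broken.

```python
import json
import subprocess


def has_nvidia_runtime(runtimes_json):
    """Return True if the JSON object of Docker runtimes has an 'nvidia' entry."""
    runtimes = json.loads(runtimes_json)
    return isinstance(runtimes, dict) and "nvidia" in runtimes


def query_docker_runtimes():
    """Ask the local Docker daemon for its registered runtimes (as JSON text)."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if docker info itself fails
    )
    return result.stdout
```

Calling `has_nvidia_runtime(query_docker_runtimes())` on the affected machine would then show whether an `nvidia` runtime is registered.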
Hi, I found that this message can be resolved with the following setup:
Set up the stable repository and the GPG key for the NVIDIA Container Toolkit:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
  && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt update
sudo apt -y install nvidia-docker2
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
After that, running the example in Docker succeeded on my machine.
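The final verification step above can also be scripted. This is a rough sketch under a few assumptions: it calls `docker` without `sudo` (so the user must be allowed to talk to the Docker daemon), the function names and result labels are hypothetical, and the error marker is simply the text of the message quoted earlier in this thread.

```python
import subprocess

# Same command as the manual check above, minus sudo (an assumption).
SMOKE_TEST_CMD = [
    "docker", "run", "--rm", "--gpus", "all",
    "nvidia/cuda:11.0-base", "nvidia-smi",
]

# Text of the error reported in this thread when GPU support is not set up.
MISSING_GPU_SUPPORT_MARKER = "could not select device driver"


def interpret_result(returncode, stderr):
    """Map a finished `docker run` to a short, hypothetical status label."""
    if returncode == 0:
        return "ok"
    if MISSING_GPU_SUPPORT_MARKER in stderr:
        return "gpu-support-missing"
    return "other-failure"


def run_smoke_test(runner=subprocess.run):
    """Run the GPU smoke test; `runner` is injectable so it can be faked."""
    result = runner(SMOKE_TEST_CMD, capture_output=True, text=True)
    return interpret_result(result.returncode, result.stderr)
```

A result of `"gpu-support-missing"` corresponds to the situation described earlier, where installing `nvidia-docker2` and restarting Docker resolved the error.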
Sounds like the problem was solved, closing.
Hello, I encountered the same error here. Have you finally solved it?
Did you run this in a Linux environment? If so, the --display-interval option does not seem to work there.
I tried many times to solve this problem, so I read the project source code along with some references and came to my own conclusion:
The context passed to EGL or GLFW cannot be shared in a Linux environment, which makes it crash when --display-interval is used to open a window for real-time visualization. I guess the only fix would be to rewrite the source code to use a GLFW context instead of the EGL context.
The main functionality still works, though; you can dump the tensors to local image files if you need to visualize something.
Or why not consider using nvdiffrast on Windows?
------------------ Original message ------------------ From: "NVlabs/nvdiffrast" @.>; Sent: Monday, July 25, 2022, 6:51 PM @.>; @.**@.>; Subject: Re: [NVlabs/nvdiffrast] [E glutil.cpp:248] eglMakeCurrent() failed when setting GL context (#45)
Hello, I ran into the same error locally. Was it ever resolved?
Thank you, it runs normally on Linux without "--display-interval". Specify outdir to view the results.
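The workaround above amounts to dropping the display option and writing results to a directory. A small sketch of that argument rewrite follows; the helper is hypothetical, and the exact flag spelling `--outdir` and the fact that `--display-interval` takes a value are assumptions, since the thread only mentions "outdir" by name.

```python
def headless_args(argv, outdir):
    """Return argv without --display-interval and with an output directory set."""
    cleaned = []
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False  # this was the value following --display-interval
            continue
        if arg == "--display-interval":
            skip_next = True  # drop the flag and, next iteration, its value
            continue
        if arg.startswith("--display-interval="):
            continue
        cleaned.append(arg)
    # Add an output directory only if the caller did not already give one.
    has_outdir = any(a == "--outdir" or a.startswith("--outdir=") for a in cleaned)
    if not has_outdir:
        cleaned += ["--outdir", outdir]
    return cleaned
```

For example, `headless_args(["./samples/torch/cube.py", "--resolution", "32", "--display-interval", "10"], "out")` keeps the script path and resolution, removes the display option, and appends `--outdir out`.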
Hi, I followed the documentation and am using nvdiffrast on Ubuntu 18.04 LTS with CUDA 11.4. I executed the following command, which throws an exception.
command:
exception:
Any advice? Thanks!