Closed fyviezhao closed 2 years ago
OpenGL version 1.4 is roughly 20 years old, and it's indeed running an ATI driver for some reason. Do you have an AMD board in the machine? My guess is that you don't have any OpenGL drivers installed in the system and you're seeing some sort of system default driver. You may be missing the `libnvidia-gl` package or something else required for OpenGL support on the NVIDIA board.
To enable a bit more debug output, you can try adding `dr.set_log_level(0)` at the start of the program. That shows whether nvdiffrast is able to initialize the EGL context on the current CUDA device, and which OpenGL version it sees.
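As a minimal sketch of that suggestion, the snippet below turns on verbose logging before creating the GL context, the same call that appears in the traceback further down. The helper name `try_create_gl_context` is hypothetical, and the code assumes an nvdiffrast version that still provides `RasterizeGLContext` (0.2.7 is the one used in this thread).

```python
def try_create_gl_context(log_level=0):
    """Enable nvdiffrast logging, then try to create a GL rasterizer context.

    Returns the context, or None if nvdiffrast cannot be imported or the
    context cannot be created. The helper name is hypothetical.
    """
    try:
        import nvdiffrast.torch as dr
    except ImportError as exc:
        print(f"nvdiffrast is not importable: {exc}")
        return None

    # Set the log level first so the EGL/OpenGL initialization messages
    # are printed while the context is being created.
    dr.set_log_level(log_level)
    try:
        return dr.RasterizeGLContext()
    except RuntimeError as exc:
        # For example: "OpenGL 4.4 or later is required"
        print(f"GL context creation failed: {exc}")
        return None


if __name__ == "__main__":
    ctx = try_create_gl_context()
    print("context created" if ctx is not None else "no context")
```

Catching `RuntimeError` here only keeps the script from exiting before the log has been read; it does not work around a driver problem.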
@s-laine Thanks for your quick reply! Yes, I agree that the container is running an ATI driver instead of the NVIDIA driver. I googled it and found that the reason may be `direct rendering: No`. However, I haven't found a good way to turn direct rendering on. I have sudo privileges but cannot reboot the Docker host.
Here is the output after setting `dr.set_log_level(0)`:
[I glutil.cpp:322] Creating GL context for Cuda device 0
[I glutil.cpp:325] Failed, falling back to default display
[I glutil.cpp:370] EGL 5.1 OpenGL context created (disp: 0x0000000005cd9eb0, ctx: 0x000000009d6863d0)
[I rasterize.cpp:103] OpenGL version reported as 3.1
Traceback (most recent call last):
  File "triangle.py", line 24, in <module>
    glctx = dr.RasterizeGLContext()
  File "/home/tiger/.local/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 160, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
RuntimeError: OpenGL 4.4 or later is required
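To read a log like the one above programmatically, a small sketch along these lines may help. It pulls the reported OpenGL version out of the log text and compares it with the 4.4 minimum named in the error message. The function names are hypothetical, and the regular expression simply assumes the message wording shown in this thread, which may differ in other nvdiffrast versions.

```python
import re

# Minimum taken from the error message "OpenGL 4.4 or later is required".
MIN_GL_VERSION = (4, 4)

# Log lines copied from the output above.
LOG = """\
[I glutil.cpp:322] Creating GL context for Cuda device 0
[I glutil.cpp:325] Failed, falling back to default display
[I rasterize.cpp:103] OpenGL version reported as 3.1
"""


def reported_gl_version(log_text):
    """Return (major, minor) from the 'OpenGL version reported as' line, or None."""
    match = re.search(r"OpenGL version reported as (\d+)\.(\d+)", log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def used_fallback_display(log_text):
    """True if the per-device GL context failed and the default display was used."""
    return "falling back to default display" in log_text


if __name__ == "__main__":
    version = reported_gl_version(LOG)
    print("reported version:", version)
    print("meets minimum:", version is not None and version >= MIN_GL_VERSION)
    print("fell back to default display:", used_fallback_display(LOG))
```

Both signals point the same way here: the context was not created on the CUDA device, and the version that was found is below the minimum.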
It is a bit weird: `glxinfo` shows the OpenGL version as 1.4, while the log above reports it as 3.1. Besides, the Ubuntu package `libnvidia-gl` is unavailable on Debian, so I searched for and installed a similar Debian package, `libnvidia-glcore`, but it didn't work.
It seems that the problem could be solved by switching from the ATI driver to the NVIDIA driver. Do you have any experience with this? Thanks for your help!
This certainly looks like a driver issue. It looks like EGL fails to enumerate the (virtual) displays that are used to force the OpenGL context onto a specific CUDA GPU, and that is a bad sign too.
If you don't have an AMD board in the machine, maybe it is possible to remove the ATI driver altogether. But installing/uninstalling GPU drivers is something that most likely requires a reboot, as previously reported on Ubuntu.
Yes, I now need to reach out to the cluster manager to solve this problem. Thanks!
How to solve this problem?
Hi, I'm trying to set up nvdiffrast in a Docker container running on a Debian host. Since I'm not familiar with Docker-in-Docker setups, I tried to follow the Dockerfile provided in this repo to install the required apt and Python packages. I eventually installed `nvdiffrast-0.2.7` successfully, but running the test demo `triangle.py` raises `RuntimeError: OpenGL 4.4 or later is required`.
I'm not sure whether the OpenGL version is bundled with the NVIDIA driver version, which may cause the above error since the NVIDIA driver on the Docker host looks a bit old. I have no permission to upgrade the host's NVIDIA driver, so can the OpenGL version be upgraded without changing the NVIDIA driver? Any help is appreciated!
Edited: More information from `glxinfo`: it looks like it uses `AMD Radeon Pro 555X OpenGL Engine` instead of the host's NVIDIA Tesla V100 card. Would this lead to any problem?
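For anyone checking the same thing, here is a rough sketch that extracts the relevant fields from `glxinfo` output and checks whether an NVIDIA driver is the one answering. The function names and the `SAMPLE` text are illustrative: the sample contains only values mentioned in this thread, not a real capture. Note also that `glxinfo` goes through GLX, while the nvdiffrast log above shows an EGL context being created, so the two may not report the same driver.

```python
import shutil
import subprocess

FIELDS = (
    "direct rendering",
    "OpenGL vendor string",
    "OpenGL renderer string",
    "OpenGL version string",
)

# Illustrative sample built from values quoted in this thread (not a real capture).
SAMPLE = """\
direct rendering: No
OpenGL renderer string: AMD Radeon Pro 555X OpenGL Engine
OpenGL version string: 1.4
"""


def parse_glxinfo(text):
    """Collect the driver-related 'key: value' lines from glxinfo output."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in FIELDS:
            info[key] = value.strip()
    return info


def looks_like_nvidia(info):
    """True if the vendor or renderer string mentions NVIDIA."""
    combined = " ".join(
        info.get(k, "") for k in ("OpenGL vendor string", "OpenGL renderer string")
    )
    return "nvidia" in combined.lower()


def run_glxinfo():
    """Return glxinfo's stdout, or None if the tool is not installed."""
    if shutil.which("glxinfo") is None:
        return None
    result = subprocess.run(["glxinfo"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    text = run_glxinfo() or SAMPLE
    info = parse_glxinfo(text)
    for key, value in info.items():
        print(f"{key}: {value}")
    print("NVIDIA driver in use:", looks_like_nvidia(info))
```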