NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

OpenGL 4.4 or later is required #53

Closed: fyviezhao closed this issue 2 years ago

fyviezhao commented 2 years ago

Hi, I'm trying to set up nvdiffrast in a Docker container running on a Debian host. Since I'm not familiar with Docker-in-Docker setups, I followed the Dockerfile provided in this repo to install the required apt and Python packages. I successfully installed nvdiffrast-0.2.7, but running the test demo triangle.py raises RuntimeError: OpenGL 4.4 or later is required. I'm not sure whether the OpenGL version is tied to the NVIDIA driver version; if it is, that may be the cause, since the Docker host's NVIDIA driver looks a bit old. I don't have permission to upgrade the host's NVIDIA driver, so can the OpenGL version be upgraded without changing the driver? Any help is appreciated!

Edited: More information from glxinfo (attached as a screenshot): it looks like the container uses the AMD Radeon Pro 555X OpenGL Engine instead of the host's NVIDIA Tesla V100 card. Would this lead to any problem?
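The glxinfo check described above can be automated. The following is a minimal sketch, assuming glxinfo prints its usual `key: value` summary lines (`direct rendering`, `OpenGL renderer string`, and so on). The helper names are hypothetical, and the sample text is an illustration built from the values mentioned in this thread, not the poster's actual output.

```python
import shutil
import subprocess


def parse_glxinfo(text):
    """Collect 'key: value' lines from glxinfo output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def uses_nvidia_direct_rendering(fields):
    """True only if rendering is direct and the renderer or vendor names NVIDIA."""
    direct = fields.get("direct rendering", "").lower().startswith("yes")
    renderer = fields.get("OpenGL renderer string", "")
    vendor = fields.get("OpenGL vendor string", "")
    return direct and "nvidia" in (renderer + " " + vendor).lower()


def run_glxinfo():
    """Return the brief glxinfo summary, or None if glxinfo cannot be run."""
    if shutil.which("glxinfo") is None:
        return None
    result = subprocess.run(["glxinfo", "-B"], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


# Illustrative sample only (assumed formatting), NOT the poster's real output.
sample = """\
direct rendering: No
OpenGL renderer string: AMD Radeon Pro 555X OpenGL Engine
OpenGL version string: 1.4
"""
fields = parse_glxinfo(sample)
print(fields["OpenGL renderer string"], uses_nvidia_direct_rendering(fields))
```

On a real system, `parse_glxinfo(run_glxinfo() or "")` would apply the same check to live output. Note that glxinfo goes through GLX, whereas nvdiffrast creates its context through EGL, so the two can report different versions.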

s-laine commented 2 years ago

OpenGL version 1.4 is roughly 20 years old, and it's indeed running an ATI driver for some reason. Do you have an AMD board in the machine? My guess is that you don't have any OpenGL drivers installed in the system and you're seeing some sort of system default driver. You may be missing the libnvidia-gl package or something else required for OpenGL support on the NVIDIA board.

To enable a bit more debug output, you can try adding dr.set_log_level(0) at the start of the program. That shows whether nvdiffrast is able to initialize the EGL context on the current Cuda device, and which OpenGL version it sees.
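As a rough sketch of that suggestion, the snippet below raises the log verbosity before creating the rasterizer context, so the context-creation messages are printed. The wrapper function `try_create_gl_context` is hypothetical; `set_log_level` and `RasterizeGLContext` are the calls that appear in this thread. The import guard is only there so the snippet degrades gracefully on a machine without nvdiffrast.

```python
def try_create_gl_context(log_level=0):
    """Return (context, message); context is None if creation failed."""
    try:
        import nvdiffrast.torch as dr
    except ImportError as err:
        return None, f"nvdiffrast is not importable: {err}"

    # Level 0 enables info-level messages, which include the EGL/OpenGL
    # context-creation lines. Set it before the context is created.
    dr.set_log_level(log_level)
    try:
        return dr.RasterizeGLContext(), "OpenGL context created"
    except RuntimeError as err:
        # For example: "OpenGL 4.4 or later is required"
        return None, f"context creation failed: {err}"


if __name__ == "__main__":
    ctx, message = try_create_gl_context()
    print(message)
```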

fyviezhao commented 2 years ago

@s-laine Thanks for your quick reply! Yes, I agree that the container is running an ATI driver instead of the NVIDIA driver. I googled it and found that the cause may be the "direct rendering: No" line in the glxinfo output. However, I haven't found a good way to turn direct rendering on. I have sudo privileges but cannot reboot the Docker host.

Here is the output after setting dr.set_log_level(0):

[I glutil.cpp:322] Creating GL context for Cuda device 0
[I glutil.cpp:325] Failed, falling back to default display
[I glutil.cpp:370] EGL 5.1 OpenGL context created (disp: 0x0000000005cd9eb0, ctx: 0x000000009d6863d0)
[I rasterize.cpp:103] OpenGL version reported as 3.1
Traceback (most recent call last):
  File "triangle.py", line 24, in <module>
    glctx = dr.RasterizeGLContext()
  File "/home/tiger/.local/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 160, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
RuntimeError: OpenGL 4.4 or later is required
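The log above already carries the two facts that matter: the context fell back to the default display, and the reported version is below the 4.4 minimum named in the error. A small sketch of reading those facts out of the log text follows; the function names are hypothetical, and the regular expression assumes the message wording shown in this log.

```python
import re

REQUIRED_GL = (4, 4)  # minimum version named in the RuntimeError


def reported_gl_version(log_text):
    """Return (major, minor) from the 'OpenGL version reported as' line, or None."""
    match = re.search(r"OpenGL version reported as (\d+)\.(\d+)", log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def fell_back_to_default_display(log_text):
    """True if creating the context on the Cuda device failed."""
    return "falling back to default display" in log_text


log = """\
[I glutil.cpp:322] Creating GL context for Cuda device 0
[I glutil.cpp:325] Failed, falling back to default display
[I glutil.cpp:370] EGL 5.1 OpenGL context created (disp: 0x0000000005cd9eb0, ctx: 0x000000009d6863d0)
[I rasterize.cpp:103] OpenGL version reported as 3.1
"""

version = reported_gl_version(log)
print("reported:", version)
print("meets requirement:", version is not None and version >= REQUIRED_GL)
print("fell back to default display:", fell_back_to_default_display(log))
```

Comparing versions as integer tuples avoids the string-comparison trap where "3.10" would sort before "3.9".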

It is a bit weird: glxinfo shows the OpenGL version as 1.4, while the log above reports it as 3.1. Besides, the Ubuntu package libnvidia-gl is unavailable on Debian, so I searched for and installed a similar Debian package, libnvidia-glcore, but it didn't work.

It seems that the problem could be solved by switching from the ATI driver to the NVIDIA driver. Do you have any experience with this? Thanks for your help!

s-laine commented 2 years ago

This certainly looks like a driver issue. It looks like EGL fails to enumerate the (virtual) displays, used for forcing the OpenGL context onto a specific Cuda GPU, and that is a bad sign too.

If you don't have an AMD board in the machine, maybe it is possible to remove the ATI driver altogether. But installing/uninstalling GPU drivers is something that most likely requires a reboot, as previously reported on Ubuntu.
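Before touching drivers, it may be worth checking whether the container exposes the NVIDIA OpenGL/EGL libraries at all. The thread does not say how the container was started, so the following is only a sketch under assumptions: that the container uses the NVIDIA container runtime, where the `NVIDIA_DRIVER_CAPABILITIES` environment variable must include `graphics` (or `all`) for the GL libraries to be mounted, and that glvnd looks for EGL vendor files under `/usr/share/glvnd/egl_vendor.d`. The function names are hypothetical.

```python
import os

# Assumed location of the glvnd EGL vendor files; it may differ per distro.
EGL_VENDOR_DIR = "/usr/share/glvnd/egl_vendor.d"


def graphics_capability_enabled(env):
    """True if NVIDIA_DRIVER_CAPABILITIES includes 'graphics' or 'all'."""
    caps = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    names = {c.strip() for c in caps.split(",") if c.strip()}
    return "all" in names or "graphics" in names


def nvidia_egl_vendor_files(vendor_dir=EGL_VENDOR_DIR):
    """List vendor files in vendor_dir whose names mention NVIDIA."""
    if not os.path.isdir(vendor_dir):
        return []
    return sorted(n for n in os.listdir(vendor_dir) if "nvidia" in n.lower())


print("graphics capability:", graphics_capability_enabled(os.environ))
print("NVIDIA EGL vendor files:", nvidia_egl_vendor_files())
```

If both checks come back empty, the EGL loader has no NVIDIA implementation to pick, which would be consistent with the fallback seen in the log. This is a diagnostic hint, not a confirmed cause for this particular setup.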

fyviezhao commented 2 years ago

Yes, I now need to reach out to the cluster manager to solve this problem. Thanks!

pfeducode commented 1 year ago

How can this problem be solved?