NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

glewInit() failed #18

Closed cosw0t closed 3 years ago

cosw0t commented 3 years ago

I'm on Ubuntu-18.04 and I've installed all dependencies as in the docker file including

I'm using torch 1.7.1, cuda 10.2

I keep getting this error on any sample in the torch folder:

[F glutil.inl:188] glewInit() failed, return value = 4

Any idea why?

nurpax commented 3 years ago
  1. Can you get it working if you run inside docker with the provided sample code?
  2. What is the output of nvidia-smi on your system?

It's tricky to get the installation exactly right on the Linux host. I've always run this stuff only in Docker on Linux.

cosw0t commented 3 years ago

  1. Yes, I can. However, I would also like to run on native Linux if possible, for development, testing, etc. Also, in case I need to set up another Docker image, I would like to know how to set it up.
  2. nvidia-smi reports:

Wed Feb 24 10:40:00 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:01:00.0 Off |                  N/A |
| 41%   34C    P8    24W / 280W |      3MiB / 24219MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

cosw0t commented 3 years ago

Error 4 seems to be GLEW_ERROR_NO_GLX_DISPLAY.

Could it be related to this? https://github.com/nigels-com/glew/issues/172
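
For anyone decoding the number in the log message, here is a minimal sketch of a lookup from the glewInit() return value to its symbolic name. The numeric values are the ones defined in GLEW 2.1.0's GL/glew.h; the helper name glewInitResultName is hypothetical (it is not part of GLEW or nvdiffrast), and the codes are hard-coded so the snippet compiles without GLEW installed.

```cpp
#include <string>

// Hypothetical helper, not part of GLEW or nvdiffrast.
// Maps a glewInit() return value to the macro name used in GLEW 2.1.0's
// <GL/glew.h>. The values are hard-coded so this builds without GLEW.
std::string glewInitResultName(unsigned int code) {
    switch (code) {
        case 0: return "GLEW_OK";                         // also GLEW_NO_ERROR
        case 1: return "GLEW_ERROR_NO_GL_VERSION";        // missing GL version
        case 2: return "GLEW_ERROR_GL_VERSION_10_ONLY";   // need at least OpenGL 1.1
        case 3: return "GLEW_ERROR_GLX_VERSION_11_ONLY";  // need at least GLX 1.2
        case 4: return "GLEW_ERROR_NO_GLX_DISPLAY";       // need a GLX display
        default: return "unknown";
    }
}
```

With this, glewInitResultName(4) yields "GLEW_ERROR_NO_GLX_DISPLAY", matching the value in the log line above. In code that already links against GLEW, glewGetErrorString(result) is the library's own way to get a human-readable message.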

nurpax commented 3 years ago

> Also in case I need to setup another docker image I would like to know how I need to set it up.

I don't understand the question. You can build an image from our Dockerfile with "docker build -t your-base-image:latest .", and if you need to set up your own image, you start your own Dockerfile with "FROM your-base-image:latest" and build from that. Alternatively, you can edit our Dockerfile.

There are a number of ways the GL/EGL init can fail. If you want to debug this further, I'd recommend adding debug prints to the GLEW source code (see the Dockerfile for how it's built) or to our EGL init code in nvdiffrast.

cosw0t commented 3 years ago

Mea culpa, it must be that I compiled GLEW without SYSTEM=linux-egl; it got dropped from the install command. So it does work fine after all.

Another way it works is with this change:

diff --git a/nvdiffrast/common/glutil.inl b/nvdiffrast/common/glutil.inl
index 4df00a5..1ef9c59 100644
--- a/nvdiffrast/common/glutil.inl
+++ b/nvdiffrast/common/glutil.inl
@@ -184,7 +184,7 @@ static void setGLContext(GLContext& glctx)
         return;

     GLenum result = glewInit();
-    if (result != GLEW_OK)
+    if (result != GLEW_OK && result != GLEW_ERROR_NO_GLX_DISPLAY)
         LOG(FATAL) << "glewInit() failed, return value = " << result;
     glctx.glewInitialized = 1;
 }

If you are open to suggestions, the latter method would allow users to stick to using glew installed from apt - but that's just personal preference.

And, while I have your attention, I would suggest this other change in the Dockerfile:

diff --git a/docker/Dockerfile b/docker/Dockerfile
index 5b35a93..cc4de98 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -50,7 +50,7 @@ RUN mkdir -p /tmp && \
     cd /tmp && tar zxf /tmp/glew-2.1.0.tgz && cd glew-2.1.0 && \
     SYSTEM=linux-egl make && \
     SYSTEM=linux-egl make install && \
-    rm -rf /tmp/glew-2.1.0.zip /tmp/glew-2.1.0
+    rm -rf /tmp/glew-2.1.0.tgz /tmp/glew-2.1.0

 RUN pip install imageio imageio-ffmpeg

and this for the run_sample.sh file

diff --git a/run_sample.sh b/run_sample.sh
old mode 100644
new mode 100755

Thanks, and just wanted to say: this library is really amazing!

nurpax commented 3 years ago

Thanks @michele-arrival for reporting your findings -- it's very useful!

Re: GLEW - I guess an alternative would be to vendor GLEW into the library and compile the required parts as part of the nvdiffrast build. In my experience, at least, using system libraries often leads to new types of problems on Linux, so vendoring might be a more stable solution.

-    rm -rf /tmp/glew-2.1.0.zip /tmp/glew-2.1.0
+    rm -rf /tmp/glew-2.1.0.tgz /tmp/glew-2.1.0

Whoops, indeed! Thanks for letting me know. I'll put these suggestions on my todo list.

hiyyg commented 3 years ago

What's wrong with sudo apt-get install libglew-dev? It works for me.

nurpax commented 3 years ago

FYI - as of https://github.com/NVlabs/nvdiffrast/commit/a4e7a4db7e09695b4efc7641cc6b044ef706f953, GLEW init should not be an issue anymore. The GLEW dependency was removed from nvdiffrast.

cosw0t commented 3 years ago

Thank you very much! That will certainly make my (and many others') life easier!