NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

I have problem about nvdiffrast_plugin_gl.so #92

Closed JiyouSeo closed 2 years ago

JiyouSeo commented 2 years ago

Hello, thank you for your great research!

I have a problem running code that imports nvdiffrast.

Traceback (most recent call last):
  File "/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1809, in _run_ninja_build
    subprocess.run(
  File "/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jiyouseo/nvdiffrec/train.py", line 556, in <module>
    glctx = dr.RasterizeGLContext()
  File "/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/site-packages/nvdiffrast/torch/ops.py", line 221, in __init__
    self.cpp_wrapper = _get_plugin(gl=True).RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
  File "/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/site-packages/nvdiffrast/torch/ops.py", line 118, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1825, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'nvdiffrast_plugin_gl': [1/1] c++ common.o glutil.o rasterize_gl.o torch_bindings_gl.o torch_rasterize_gl.o -shared -lGL -lEGL -L/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda-11.3/lib64 -lcudart -o nvdiffrast_plugin_gl.so
FAILED: nvdiffrast_plugin_gl.so 
c++ common.o glutil.o rasterize_gl.o torch_bindings_gl.o torch_rasterize_gl.o -shared -lGL -lEGL -L/home/jiyouseo/anaconda3/envs/dmodel/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda-11.3/lib64 -lcudart -o nvdiffrast_plugin_gl.so
/usr/bin/ld: cannot find -lGL
/usr/bin/ld: cannot find -lEGL
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

So I changed ['ninja', '-v'] to ['ninja', '--version'] in cpp_extension.py. Then it returns:

ImportError:/home/jiyouseo/.cache/torch_extensions/py36_cu113/nvdiffrast_plugin_gl/nvdiffrast_plugin_gl.so: cannot open shared object file: No such file or directory

I guess this is because nvdiffrast_plugin_gl.so could not be built. How can I build nvdiffrast_plugin_gl.so correctly? Here is my environment:

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.6 LTS
Release:    18.04
Codename:   bionic
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:3B:00.0 Off |                  Off |
| 58%   82C    P2   232W / 300W |  34285MiB / 48685MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000    On   | 00000000:5E:00.0 Off |                  Off |
| 67%   85C    P2   234W / 300W |  33903MiB / 48685MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000    On   | 00000000:AF:00.0 Off |                  Off |
| 67%   85C    P2   235W / 300W |  34135MiB / 48685MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    On   | 00000000:D8:00.0 Off |                  Off |
| 61%   84C    P2   232W / 300W |  34101MiB / 48685MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1733      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A     16354      C   ...onda/envs/uvtr/bin/python    34277MiB |
|    1   N/A  N/A      1733      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A     16355      C   ...onda/envs/uvtr/bin/python    33895MiB |
|    2   N/A  N/A      1733      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A     16356      C   ...onda/envs/uvtr/bin/python    34127MiB |
|    3   N/A  N/A      1733      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A     16357      C   ...onda/envs/uvtr/bin/python    34093MiB |
+-----------------------------------------------------------------------------+

Thank you.

s-laine commented 2 years ago

You seem to have no graphics libraries installed on your system (OpenGL and EGL) that are required by the OpenGL-based rasterizer. I suggest you take a look at our Dockerfile to see the required libraries and environment variables, or even better, use the Docker environment directly. Note that you will also need the OS-level graphics drivers installed in the system.

Since v0.3.0, nvdiffrast includes a CUDA-based rasterizer that doesn't require the graphics libraries or drivers. It has some restrictions compared to OpenGL (see documentation), but it could be a working solution in your use case.
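For readers who want a quick check before attempting the OpenGL build, here is a minimal sketch using only the Python standard library. The function name `find_graphics_libs` is hypothetical. Note the limitation: `ctypes.util.find_library` reports what the runtime loader can see, while the `-lGL` / `-lEGL` link step additionally needs the development symlinks (`libGL.so`, `libEGL.so`), so a hit here is a hint, not a guarantee that the build will link.

```python
import ctypes.util


def find_graphics_libs(names=("GL", "EGL")):
    """Map each library name to what the system reports for it, or None if not found."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in find_graphics_libs().items():
        # None means the loader could not locate the library at all.
        print(f"lib{name}: {found or 'NOT FOUND'}")
```

If either entry is reported as not found, the linker errors shown in the original post are the expected outcome.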

JiyouSeo commented 2 years ago

Thank you for the kind reply. I will follow your suggestions.

L-Aidan commented 2 years ago

Hello, I have the same problem as you, have you solved it?

L-Aidan commented 2 years ago

I have replaced "RasterizeGLContext" with "RasterizeCudaContext", so I guess I don't need to install graphics libraries like OpenGL, but the problem is still there.

harlanhong commented 2 years ago

I have replaced "RasterizeGLContext" with "RasterizeCudaContext", so I guess I don't need to install graphics libraries like OpenGL, but the problem is still there.

Which line did you modify? I ran into the same problem.

L-Aidan commented 2 years ago

I did the same thing as JiyouSeo: I changed ['ninja', '-v'] to ['ninja', '--version'] in cpp_extension.py, and then I got "nvdiffrast_plugin_gl.so: cannot open shared object file: No such file or directory".

L-Aidan commented 2 years ago

It seems that I have solved this problem: don't modify ['ninja', '-v'], and replace "RasterizeGLContext" with "RasterizeCudaContext" in your project.
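As an illustration of that replacement, here is a small sketch of a helper that picks the context in one place. The helper name `make_raster_context` and its `prefer_gl` parameter are hypothetical; only `RasterizeGLContext` and `RasterizeCudaContext` come from this thread. The module is passed in as an argument so the helper itself does not trigger any plugin build on import. The caught exception types are the two that appear in the tracebacks above; other failures may need different handling.

```python
def make_raster_context(dr, prefer_gl=False):
    """Return a rasterization context from the given nvdiffrast.torch module.

    With prefer_gl=True the OpenGL context is tried first, and the CUDA
    context is used if building or loading the GL plugin fails.
    """
    if prefer_gl:
        try:
            return dr.RasterizeGLContext()
        except (RuntimeError, ImportError) as err:
            # RuntimeError: extension build failed (e.g. "cannot find -lGL").
            # ImportError: the compiled .so could not be opened.
            print(f"OpenGL context unavailable ({type(err).__name__}); using CUDA instead")
    return dr.RasterizeCudaContext()


# Intended usage (requires nvdiffrast v0.3.0 or later, not run here):
#   import nvdiffrast.torch as dr
#   glctx = make_raster_context(dr)
```

The returned object would then be used wherever the project previously used the result of `dr.RasterizeGLContext()`.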

harlanhong commented 2 years ago

It seems that I have solved this problem: don't modify ['ninja', '-v'], and replace "RasterizeGLContext" with "RasterizeCudaContext" in your project.

In util/nvdiffrast.py? It does not work for me.

L-Aidan commented 2 years ago

In util/nvdiffrast.py? It does not work for me.

It should be in the files that import nvdiffrast; I think you will find it by searching for "RasterizeGLContext" in your own project. By the way, if you want to use "RasterizeGLContext", you must install the related libraries following the Dockerfile.
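The search described above can be sketched in a few lines of Python. The function name `find_gl_context_uses` is hypothetical, and the sketch assumes the relevant code lives in `*.py` files under the project root.

```python
from pathlib import Path


def find_gl_context_uses(root, needle="RasterizeGLContext"):
    """Return (path, line_number, stripped_line) for each matching line in *.py files under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_gl_context_uses("."):
        print(f"{path}:{lineno}: {line}")
```

Each reported line is a place where the context is created, for example `glctx = dr.RasterizeGLContext()` in train.py from the traceback at the top of this thread.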

RyanbowZ commented 7 months ago

Here's a new update on this issue: if you have already installed ninja and tried all the methods mentioned above but still hit this problem, check whether ninja is installed both in your Python environment and at the Ubuntu system level. Having both installed causes a conflict! I tried almost every other approach on the Internet until I finally resolved this with:

pip uninstall ninja

I also suggest not changing anything under system directories. The suggested change of ['ninja', '-v'] to ['ninja', '--version'] in cpp_extension.py might just be a way of switching between the Ubuntu and Python versions of ninja.
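To see whether more than one ninja is visible, a rough sketch like the following lists every executable with that name on PATH, in lookup order. The function name `find_executables` is hypothetical, and the sketch only inspects PATH; it does not look at how pip or apt installed the binaries.

```python
import os


def find_executables(name="ninja", search_path=None):
    """Return every executable file called `name` on the search path, in lookup order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    found = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # ignore empty PATH entries
        candidate = os.path.join(directory, name)
        is_exec = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_exec and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    for path in find_executables():
        print(path)
```

More than one line of output suggests two installations; the first one listed is the one a plain `ninja` command resolves to.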

iliagrigorevdev commented 4 months ago

For me, the cause was missing EGL and GL headers for the conda C++ compiler.

Installing these two packages helped:

conda install conda-forge::mesa-libegl-devel-cos6-x86_64
conda install conda-forge::mesa-libgl-devel-cos6-x86_64

Now I have another problem, which appears even when replacing RasterizeGLContext with RasterizeCudaContext: cannot find -lcudart

iliagrigorevdev commented 4 months ago

The problem with "cannot find -lcudart" is solved too.

It was a broken symlink libcudart.so -> libcudart.so.12.1.55.

Because of conflicts, I couldn't install the toolkit with:

conda install nvidia/label/cuda-12.1.1::cuda-toolkit

so I installed another version, which had the broken link to libcudart.so:

conda install nvidia/label/cuda-12.1.0::cuda-toolkit

To fix this, I upgraded only cudart to 12.1.1:

conda install nvidia/label/cuda-12.1.1::cuda-cudart-dev
conda install nvidia/label/cuda-12.1.1::cuda-cudart-static
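A broken symlink like this can be detected with a short check. This is only a sketch: the function name `broken_symlinks` is hypothetical, and the directory to inspect (for example the `lib` folder of the conda environment) is an assumption about where libcudart was installed.

```python
import os


def broken_symlinks(directory, prefix="libcudart"):
    """Return (link_path, target) for each symlink in directory whose target does not exist."""
    broken = []
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        # islink() is true for the link itself; exists() follows it and
        # returns False when the target is missing.
        if entry.startswith(prefix) and os.path.islink(full) and not os.path.exists(full):
            broken.append((full, os.readlink(full)))
    return broken
```

An empty result means every libcudart symlink in that directory resolves to a real file.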

iliagrigorevdev commented 4 months ago

The last problem I encountered while compiling ComfyUI-3D-Pack:

ctime:80:11: error: 'timespec_get' has not been declared in '::'
80 |   using ::timespec_get;
   |           ^~~~~~~~~~~~
ninja: build stopped: subcommand failed.

And the fix was to use conda's gcc:

conda install conda-forge::gcc

draconidsmxz commented 4 months ago

The problem with "cannot find -lcudart" is solved too.

It was a broken symlink libcudart.so -> libcudart.so.12.1.55.

Because of conflicts, I couldn't install the toolkit with:

conda install nvidia/label/cuda-12.1.1::cuda-toolkit

so I installed another version, which had the broken link to libcudart.so:

conda install nvidia/label/cuda-12.1.0::cuda-toolkit

To fix this, I upgraded only cudart to 12.1.1:

conda install nvidia/label/cuda-12.1.1::cuda-cudart-dev
conda install nvidia/label/cuda-12.1.1::cuda-cudart-static

If you still have the "cannot find -lcudart" problem after running conda install nvidia/label/cuda-12.1.1::cuda-toolkit, here is what worked for me.

I found "... -L/usr/local/cuda-11.3/lib64 .." in my error message: the linker (invoked by ninja) is looking for libcudart.so in the lib64 folder, but in my installation libcudart.* is in the lib folder.

So my solution was to create a lib64 folder in my conda env and copy libcudart.* from lib to lib64, which solved the issue.
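To confirm which folder actually holds the library before copying anything, a sketch along these lines may help. The function name `locate_cudart` is hypothetical, and using the `CONDA_PREFIX` environment variable as the prefix is an assumption about the setup.

```python
from pathlib import Path


def locate_cudart(prefix, subdirs=("lib", "lib64")):
    """Map each subdirectory of prefix to the sorted libcudart* file names found there."""
    result = {}
    for sub in subdirs:
        folder = Path(prefix) / sub
        if folder.is_dir():
            result[sub] = sorted(p.name for p in folder.glob("libcudart*"))
        else:
            result[sub] = []  # folder missing entirely
    return result


# Intended usage inside an activated conda environment (not run here):
#   import os
#   print(locate_cudart(os.environ["CONDA_PREFIX"]))
```

If the `lib` entry lists files while `lib64` is empty, that matches the situation described above.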

dawei03896 commented 2 months ago

It seems that I have solved this problem: don't modify ['ninja', '-v'], and replace "RasterizeGLContext" with "RasterizeCudaContext" in your project.

Thank you very much, I also used this method to solve it.

JimWang151 commented 2 months ago

It seems that I have solved this problem: don't modify ['ninja', '-v'], and replace "RasterizeGLContext" with "RasterizeCudaContext" in your project.

Thank you very much, I also used this method to solve it.

Bro, in which file do I make the replacement?