NVlabs / nvdiffrast

Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

Runtime Error: glLinkProgram() failed #50

Closed Mirocos closed 2 years ago

Mirocos commented 2 years ago

Hi, I tried to use nvdiffrast on Windows 10 by following the documentation. When I ran the cube sample, the following runtime error occurred:

D:\development\anaconda3\envs\dmodel\lib\site-packages\torch\utils\cpp_extension.py:304: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
Traceback (most recent call last):
  File ".\samples\torch\cube.py", line 200, in <module>
    main()
  File ".\samples\torch\cube.py", line 191, in main
    mp4save_fn='progress.mp4'
  File ".\samples\torch\cube.py", line 76, in fit_cube
    glctx = dr.RasterizeGLContext()
  File "D:\development\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\torch\ops.py", line 151, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
RuntimeError: glLinkProgram() failed:
Fragment info
-------------
0(2) : error C7528: OpenGL reserves names starting with 'gl_'
(0) : error C2003: incompatible options for link

My PyOpenGL version is 3.1.5 and my glfw version is 2.3.0.

s-laine commented 2 years ago

Hi @Mirocos - it looks like the OpenGL driver refuses to compile the shader program, and this is something I haven't seen before. Can you try adding dr.set_log_level(0) at the beginning of the main function? This should give a bit more log output showing the OpenGL version and possibly further info.

Mirocos commented 2 years ago

Hi, with dr.set_log_level(0), here is the log output:

[I D:\development\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\common\glutil.cpp:127] Selecting device with PCI bus id 0000:2B:00.0 - failed, expect crash or major slowdown
[I D:\development\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\common\glutil.cpp:164] WGL OpenGL context created (hdc: 0x6a0111f7, hglrc: 0x00010000)
[I D:\development\anaconda3\envs\dmodel\lib\site-packages\nvdiffrast\common\rasterize.cpp:103] OpenGL version reported as 4.6

s-laine commented 2 years ago

The inability to select GPU device is somewhat suspicious but should not matter if you have only one GPU. Are you perhaps on a laptop with both an integrated and a discrete GPU? Do you have up-to-date NVIDIA GPU drivers? There is something unusual about your setup as I've never heard of this problem before.

This is a long shot, but you could try modifying nvdiffrast/common/rasterize.cpp by removing all lines that contain in int gl_... declarations. This may remove the shader compilation error C7528, but I have a feeling that something else will then fail instead.
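The edit suggested above can be scripted. The following is a minimal sketch, not part of nvdiffrast: the function name is hypothetical, and it assumes each offending declaration sits on its own line in the form in int gl_Name; (the exact formatting inside rasterize.cpp may differ, so back up the file and review the removed lines before rebuilding).

```python
import re

# Matches a whole line that redeclares a built-in, e.g. "in int gl_PrimitiveID;".
# Assumption: one declaration per line, nothing else on that line.
_GL_DECL = re.compile(r"^\s*in\s+int\s+gl_\w+\s*;\s*$")


def strip_gl_declarations(source):
    """Return (cleaned_source, removed_lines) for the given shader/C++ text."""
    kept, removed = [], []
    for line in source.splitlines(keepends=True):
        if _GL_DECL.match(line):
            removed.append(line.strip())  # remember what was dropped
        else:
            kept.append(line)             # keep every other line untouched
    return "".join(kept), removed
```

To apply it, read rasterize.cpp into a string, pass it through strip_gl_declarations, check the returned list of removed lines, and write the cleaned text back.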

Mirocos commented 2 years ago

No, I am on a desktop with an RTX 3070 Ti, and my NVIDIA GPU driver version is 496.13 (possibly the latest version).

Mirocos commented 2 years ago

Hi @s-laine. After removing all lines that declare in int gl_xxx variables, it works well on my machine. Thanks for your help!

By the way, is there any plan to produce more detailed documentation? I am really new to differentiable rendering. Do you have any advice on a good tutorial?

s-laine commented 2 years ago

Great to hear that it works with the modifications! I should probably make that change in the repo too, now that I know there are drivers that refuse to compile the shaders otherwise. I'll leave this issue open until the fix is in.

There was a recent tutorial on differentiable rendering at CVPR 2021. I haven't watched it myself, but it apparently focuses heavily on physically accurate, path-tracing-based differentiable rendering, making it somewhat inapplicable to the rasterization-based rendering that nvdiffrast supports. Rasterization-based rendering is much faster but cannot simulate light transport properly, so you need to consider your application when deciding which way to go.

In addition to the nvdiffrast documentation, you can check out our paper Modular Primitives for High-Performance Differentiable Rendering. The paper focuses mostly on the design and benchmarking of our library, but you will probably find the introductory parts useful, and possibly some of the explanations of the sample applications as well.

For an advanced application, see our follow-up paper Appearance-Driven Automatic 3D Model Simplification that infers mesh geometry, textures, etc., from rendered images. There is also a related blog post.

ladzin commented 2 years ago

I stumbled upon the same issue and Google brought me here :) What I'm getting is:

[I C:\Users\lkavan\Anaconda3\envs\tst\lib\site-packages\nvdiffrast\common\glutil.cpp:119] Creating GL context for Cuda device 0
[I C:\Users\lkavan\Anaconda3\envs\tst\lib\site-packages\nvdiffrast\common\glutil.cpp:127] Selecting device with PCI bus id 0000:61:00.0 - success
[I C:\Users\lkavan\Anaconda3\envs\tst\lib\site-packages\nvdiffrast\common\glutil.cpp:164] WGL OpenGL context created (hdc: 0x460126a3, hglrc: 0x00010000)
[I C:\Users\lkavan\Anaconda3\envs\tst\lib\site-packages\nvdiffrast\common\rasterize.cpp:103] OpenGL version reported as 4.6
Traceback (most recent call last):
  File "samples\torch\triangle.py", line 23, in <module>
    glctx = dr.RasterizeGLContext()
  File "C:\Users\lkavan\Anaconda3\envs\tst\lib\site-packages\nvdiffrast\torch\ops.py", line 151, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
RuntimeError: glLinkProgram() failed:
Fragment info
-------------
0(2) : error C7528: OpenGL reserves names starting with 'gl_'
(0) : error C2003: incompatible options for link

ladzin commented 2 years ago

And yes, commenting out all lines that contain in int gl_xxx fixed the problem.

Mirocos commented 2 years ago

@ladzin Hi, you can try deleting the in int gl_xxx built-in variable declarations in rasterize.cpp. It worked for me.

s-laine commented 2 years ago

Hi @ladzin, could you let me know which graphics driver version you have installed on your system? Based on the report from @Mirocos, 496.13 has this issue, but there is a more recent one too (496.49), and it'd be nice to know whether the bug/feature still exists in that one.
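For anyone who wants to report their driver version from a script, here is a minimal sketch. It assumes the nvidia-smi tool that ships with the NVIDIA driver is on the PATH; both function names are hypothetical helpers, not nvdiffrast API.

```python
import shutil
import subprocess


def parse_driver_version(text):
    """Turn '496.13' or '515.43.04' into a tuple of ints for easy comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def query_driver_versions():
    """Return one driver version string per GPU, or [] if nvidia-smi is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Calling query_driver_versions() and pasting the result into a report makes it easier to compare against the versions mentioned in this thread.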

ladzin commented 2 years ago

Hi @s-laine, my driver version is 496.13. I will try the newer one.

ladzin commented 2 years ago

Aha, that was it! I upgraded my A6000 driver to the latest (472.39) and the original rasterize.cpp works just fine. So I guess this must have been some bug with the 496.13 driver.

s-laine commented 2 years ago

Big thanks for testing this! However, given that the 496.13 driver is out there, I'll try to push a fix into the shaders tomorrow or early next week, just in case.

Mirocos commented 2 years ago

Hi @s-laine, it seems that with the 496.71 driver we don't need to comment out the in int gl_xxx variables in the shaders.

s-laine commented 2 years ago

Thanks again @Mirocos and @ladzin for the help in testing. The offending declarations are now removed in v0.2.7, so hopefully that's it for this bug. Closing.

geek0075 commented 2 years ago

Hi,

I got the following exception trying to run:

$ python test.py --name=model1 --epoch=20 --img_folder=./datasets/mydata

Driver Provider: NVIDIA
Driver Version: 30.0.15.1179
Driver Date: 2/10/2022
Digital Signer: Microsoft Windows Hardware Compatibility Publisher

This returns an error:

Error C7528: OpenGL reserves names starting with 'gl_': gl_PrimitiveID
Error C2003: Incompatible options for Link

Has anyone else experienced this? I am running on Windows with an NVIDIA GeForce RTX 3090.

Kind Regards.

geek0075 commented 2 years ago

@s-laine

Xiaoming-Zhao commented 2 years ago

Hi @s-laine, just a heads-up. I encountered a similar issue with GPU driver 515.43.04. Commenting out all in int gl_ declarations, as you suggested, fixed it.

Traceback (most recent call last):
  File "test_gmpi.py", line 154, in <module>
    main(0, opt, opt.gmpi_img_root, opt.gmpi_depth_root, opt.gmpi_detect_root)
  File "test_gmpi.py", line 101, in main
    model.test()           # run inference
  File "/data1/xz23/gmpi/code/private_Deep3DFaceRecon_pytorch/models/base_model.py", line 162, in test
    self.forward()
  File "/data1/xz23/gmpi/code/private_Deep3DFaceRecon_pytorch/models/facerecon_model.py", line 139, in forward
    self.pred_vertex, self.facemodel.face_buf, feat=self.pred_color)
  File "/home/xz23/Python/miniconda3/envs/deep3d_pytorch_new/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data1/xz23/gmpi/code/private_Deep3DFaceRecon_pytorch/util/nvdiffrast.py", line 58, in forward
    self.glctx = dr.RasterizeGLContext(device=device)
  File "/home/xz23/Python/miniconda3/envs/deep3d_pytorch_new/lib/python3.6/site-packages/nvdiffrast/torch/ops.py", line 151, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic', cuda_device_idx)
RuntimeError: glLinkProgram() failed:
Fragment info
-------------
0(2) : error C7528: OpenGL reserves names starting with 'gl_': gl_PrimitiveID
(0) : error C2003: incompatible options for link

s-laine commented 2 years ago

Hi @Xiaoming-Zhao, the current version shouldn't have any int gl_ variables declared anymore. However, I apparently made the same mistake again with the A100 workaround having in vec4 gl_FragCoord and out float gl_FragDepth declarations. Was it these lines you had to remove? In any case, I'll delete them in the next release.

PoopBear1 commented 1 year ago

The same problem happens with driver version 515.65.01, even though I deleted the in int gl_ declarations.

GLSL Shader compilation failed:
0(8) : error C7528: OpenGL reserves names starting with 'gl_': gl_PrimitiveID
0(11) : warning C7555: 'varying' is deprecated, use 'in/out' instead

s-laine commented 1 year ago

Currently nvdiffrast does not have any 'gl_' declarations in the shader code. Are you using an older version of nvdiffrast? First of all, check here for a case where a deprecated version was bundled in another package, causing similar issues.

If that is not the case, the GLSL compiler error doesn't make sense if there are no declarations for gl_ variables. Maybe the C++ plugin was not rebuilt for one reason or another? To be certain, see what torch.utils.cpp_extension._get_build_directory('nvdiffrast_plugin', False) says and delete that directory.
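The cleanup step described above could look roughly like the sketch below. The two function names are hypothetical; _get_build_directory is the private PyTorch helper quoted in this comment, so its behavior may change between PyTorch versions. The torch import is kept inside the lookup function so that the deletion helper can be used on its own.

```python
import shutil
from pathlib import Path


def find_nvdiffrast_build_dir():
    """Ask PyTorch where it caches the compiled nvdiffrast plugin."""
    # Private helper (leading underscore): treat its presence as an assumption.
    from torch.utils import cpp_extension
    return cpp_extension._get_build_directory("nvdiffrast_plugin", False)


def clear_plugin_build_dir(build_dir):
    """Delete a cached build directory; return True if something was removed."""
    path = Path(build_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)  # removes the directory and all cached build outputs
    return True
```

Running clear_plugin_build_dir(find_nvdiffrast_build_dir()) and then starting the application again should force the plugin to be rebuilt from the currently installed sources.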

As a side note, the shader code has never included the keyword 'varying', so I don't know what's up with the warning.