NVIDIAGameWorks / kaolin-wisp

NVIDIA Kaolin Wisp is a PyTorch library powered by NVIDIA Kaolin Core to work with neural fields (including NeRFs, NGLOD, instant-ngp and VQAD).

Interactive Training Crashes Immediately #101

Open saltwick opened 1 year ago

saltwick commented 1 year ago

This might be more of an OpenGL setup issue, but it's only occurring for the interactive rendering. I can use the nerf app fine in headless mode, but when I try to use the GUI I get the following error.

X Error of failed request:  BadWindow (invalid Window parameter)
  Major opcode of failed request:  150 (GLX)
  Minor opcode of failed request:  16 (X_GLXVendorPrivate)
  Resource id in failed request:  0x2c00009
  Serial number of failed request:  0
  Current serial number in output stream:  152

I followed the solution in #66 for modifying the window config to request the correct OpenGL version, and that got me a step further. I added

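from glumpy import app

# Request an OpenGL 3.2 core-profile context from glumpy
config = app.configuration.Configuration()
config.major_version = 3
config.minor_version = 2
config.profile = 'core'
window = app.Window(..., config=config)  # other Window arguments elided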

to wisp/cuda_guard.py. I then hit another issue, where make_default_context() wasn't able to create a context on any of the 1 detected devices, and solved that by setting __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia before running the Python script.
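
The two variables mentioned above can also be set for a single child process from Python instead of on the command line. The following is only a sketch: the helper names `prime_offload_env` and `launch` are hypothetical and not part of Wisp, and the only assumption is that the app is started as an ordinary Python script.

```python
import os
import subprocess
import sys


def prime_offload_env(base_env=None):
    """Return a copy of the environment with the two offload variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["__NV_PRIME_RENDER_OFFLOAD"] = "1"
    env["__GLX_VENDOR_LIBRARY_NAME"] = "nvidia"
    return env


def launch(script_args, base_env=None):
    """Run a Python script with the offload variables set for that process only."""
    cmd = [sys.executable, *script_args]
    return subprocess.run(cmd, env=prime_offload_env(base_env))


# Hypothetical usage; append whatever arguments the app normally takes:
# launch(["app/nerf/main_nerf.py"])
```

Setting the variables per process keeps the rest of the desktop session on its default GPU, which makes it easier to compare behaviour with and without the offload.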

Now when I run the script, a transparent window pops up, the data is loaded, and training starts but immediately crashes with the following error:

[i] Using PYGLFW_IMGUI (GL 2.1)
2023-01-11 19:37:33,079|    INFO| [i] Using PYGLFW_IMGUI (GL 2.1)
[i] Running at 60 frames/second
2023-01-11 19:37:33,111|    INFO| [i] Running at 60 frames/second
Traceback (most recent call last):
  File "app/nerf/main_nerf.py", line 490, in <module>
    app.run()  # Run in interactive mode
  File "/home/ubuntu/nr/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 248, in run
    app.run()   # App clock should always run as frequently as possible (background tasks should not be limited)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/app/__init__.py", line 362, in run
    run(duration, framecount)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/app/__init__.py", line 344, in run
    count = __backend__.process(dt)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/app/window/backends/backend_glfw_imgui.py", line 448, in process
    window.dispatch_event('on_draw', dt)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/app/window/event.py", line 396, in dispatch_event
    if getattr(self, event_type)(*args):
  File "/home/ubuntu/nr/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 527, in on_draw
    self.render()     # Render objects uploaded to GPU
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/nr/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 499, in render
    self._blit_to_gl_renderbuffer(img, depth_img, self.canvas_program, self.cuda_buffer,
  File "/home/ubuntu/nr/kaolin-wisp/wisp/renderer/app/wisp_app.py", line 414, in _blit_to_gl_renderbuffer
    canvas_program.draw(gl.GL_TRIANGLE_STRIP)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/program.py", line 603, in draw
    self.activate()
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/globject.py", line 95, in activate
    self._activate()
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/program.py", line 393, in _activate
    attribute.activate()
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/globject.py", line 95, in activate
    self._activate()
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/glumpy/gloo/variable.py", line 383, in _activate
    gl.glVertexAttribPointer(self.handle, size, gtype, gl.GL_FALSE, stride, offset)
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/latebind.py", line 63, in __call__
    return self.wrapperFunction( self.baseFunction, *args, **named )
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/GL/VERSION/GL_2_0.py", line 470, in glVertexAttribPointer
    return baseOperation(
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/latebind.py", line 43, in __call__
    return self._finalCall( *args, **named )
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/wrapper.py", line 1392, in wrapperCall
    raise err
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/wrapper.py", line 1385, in wrapperCall
    result = wrappedOperation( *cArguments )
  File "/opt/conda/envs/wisp/lib/python3.8/site-packages/OpenGL/error.py", line 230, in glCheckError
    raise self._errorClass(
OpenGL.error.GLError: GLError(
    err = 1282,
    description = b'invalid operation',
    baseOperation = glVertexAttribPointer,
    pyArgs = (
        0,
        2,
        GL_FLOAT,
        GL_FALSE,
        16,
        c_void_p(None),
    ),
    cArgs = (
        0,
        2,
        GL_FLOAT,
        GL_FALSE,
        16,
        c_void_p(None),
    ),
    cArguments = (
        0,
        2,
        GL_FLOAT,
        GL_FALSE,
        16,
        c_void_p(None),
    )
)
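
For reference, err = 1282 in the traceback above is the numeric value of GL_INVALID_OPERATION, which agrees with the description field. A small lookup table, sketched here purely as a debugging aid (the helper name `describe_gl_error` is hypothetical), makes such codes readable when only the number is logged:

```python
# Error codes returned by glGetError(), as defined by the OpenGL specification.
GL_ERROR_NAMES = {
    0x0000: "GL_NO_ERROR",
    0x0500: "GL_INVALID_ENUM",
    0x0501: "GL_INVALID_VALUE",
    0x0502: "GL_INVALID_OPERATION",
    0x0503: "GL_STACK_OVERFLOW",
    0x0504: "GL_STACK_UNDERFLOW",
    0x0505: "GL_OUT_OF_MEMORY",
    0x0506: "GL_INVALID_FRAMEBUFFER_OPERATION",
}


def describe_gl_error(code):
    """Format a numeric GL error code together with its symbolic name."""
    name = GL_ERROR_NAMES.get(code, "unknown error code")
    return f"{code} (0x{code:04X}) = {name}"


print(describe_gl_error(1282))  # prints "1282 (0x0502) = GL_INVALID_OPERATION"
```

The code alone does not say why the call was rejected; it only narrows the failure to a call that was not allowed in the current GL state.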

Has anyone else encountered this?

Setup:

OpenGL version string: 3.1 Mesa 21.2.6
OpenGL shading language version string: 1.40
OpenGL context flags: (none)

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.2.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
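
The version strings above can be checked mechanically against the context version requested in the window config (3.2). This is a minimal sketch with hypothetical helper names; it assumes the values are copied from `glxinfo` output. Note that the plain "OpenGL version string" describes the compatibility context, so a core-profile context on the same driver may report a higher version.

```python
import re


def parse_gl_version(version_string):
    """Extract (major, minor) from an OpenGL version string value."""
    match = re.match(r"\s*(?:OpenGL ES\s+)?(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def uses_mesa(version_string):
    """True if the string was reported by a Mesa driver."""
    return "Mesa" in version_string


def meets_requirement(version_string, required=(3, 2)):
    """True if the reported version is at least the required (major, minor)."""
    return parse_gl_version(version_string) >= required


reported = "3.1 Mesa 21.2.6"
print(parse_gl_version(reported), uses_mesa(reported), meets_requirement(reported))
# prints "(3, 1) True False"
```

A Mesa string here suggests the query was answered by the Mesa driver rather than the NVIDIA one, so it may be worth repeating the query with the offload variables set and comparing the two results.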

`glxgears` works completely fine. I can also get the instant-ngp GUI up, but I'm unable to interactively train a model there for other reasons.

orperel commented 1 year ago

Hi @saltwick, this one does indeed sound like an OpenGL setup issue (the "make_default_context() wasn't able to create a context on any of the 1 detected devices" error is strong evidence).

If the answer to both is YES, my next suggestion is to reinstall the conda env carefully. We have a pending PR that simplifies the installation; could you give it a try and see if it helps? (You no longer have to build pycuda manually, and Wisp is pip-installable now): https://github.com/NVIDIAGameWorks/kaolin-wisp/pull/105

EDIT: this PR has been merged into main now.

orperel commented 1 year ago

Hi again @saltwick! Looking again at #66, it just dawned on me that simply changing the major/minor version in glumpy's app_config.py doesn't actually fix the issue, as the backend ignores the requested versions.

I've issued a new fix with #117: wisp now sets the default GL version to 3.3. If needed, it's also configurable via WispState's renderer.gl_version field (normally you shouldn't need to worry about that).