Issues running the example code (Cuda error: 304[cudaGraphicsGLRegisterBuffer...)

rpapallas commented 11 months ago

Hello,

Thank you for sharing this work!

I am trying to get this to work on my machine and I get the following error:

Loading extension module renderutils_plugin...
  0%|                                                                                                    | 0/61 [00:06<?, ?it/s]
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/rafael/diff-dope/examples/simple_scene.py", line 17, in main
    ddope.run_optimization()
  File "/home/rafael/diff-dope/diffdope/diffdope.py", line 1693, in run_optimization
    self.renders = render_texture_batch(
  File "/home/rafael/diff-dope/diffdope/diffdope.py", line 198, in render_texture_batch
    rast_out, rast_out_db = dr.rasterize(
  File "/home/rafael/.local/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 310, in rasterize
    return _rasterize_func.apply(glctx, pos, tri, resolution, ranges, grad_db, -1)
  File "/home/rafael/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/rafael/.local/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 246, in forward
    out, out_db = _get_plugin(gl=True).rasterize_fwd_gl(raster_ctx.cpp_wrapper, pos, tri, resolution, ranges, peeling_idx)
RuntimeError: Cuda error: 304[cudaGraphicsGLRegisterBuffer(&s.cudaPosBuffer, s.glPosBuffer, cudaGraphicsRegisterFlagsWriteDiscard);]

I have the following set up:

Ubuntu 20.04
NVidia 535.129.03
CUDA 12.2
NVIDIA GeForce RTX 4090

I had a look around, and it seems to be a conflict between CUDA and OpenGL. I feel that RTX 4090 is a good one, and it should have worked. Which Ubuntu version are you working with?

TontonTremblay commented 11 months ago

I used a 4090 but with 1804 and a 3090 with 2004 without any issues. I think your problem is more related to nvdiffrast.

https://github.com/NVlabs/nvdiffrast

I am currently traveling right now so I cannot play around with the code on other configuration. https://github.com/NVlabs/nvdiffrast/issues/131 I found this issue, I dont think I used cuda 12.2 but 11. something. Let me know if you try cuda 11. something and it solved your problem. I remember reading that nvdiffrast can run without opengl backend, but I did not explored it, maybe this could be an other solution.

https://github.com/NVlabs/nvdiffrast/blob/main/samples/torch/pose.py#L164

This could be a simple fix when the context is created. Hope this helps, once I get home I will push a opengl config.

rpapallas commented 10 months ago

Hi @TontonTremblay,

Thank you for getting back to me. Interestingly, when installing CUDA 11.8 I get the following error:

Traceback (most recent call last):
  File "/home/rafael/diff-dope/examples/simple_scene.py", line 14, in main
    ddope = dd.DiffDope(cfg=cfg)
  File "<string>", line 9, in __init__
  File "/home/rafael/diff-dope/diffdope/diffdope.py", line 1312, in __post_init__
    self.object3d = Object3D(**self.cfg.object3d)
  File "/home/rafael/diff-dope/diffdope/diffdope.py", line 980, in __init__
    self.set_pose(
  File "/home/rafael/diff-dope/diffdope/diffdope.py", line 1036, in set_pose
    self.mesh.cuda()
  File "/home/rafael/diff-dope/diffdope/diffdope.py", line 913, in cuda
    vars(self)[key] = vars(self)[key].cuda()
  File "/home/rafael/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

It seems the PyTorch code used requires a more recent version of CUDA?

rpapallas commented 10 months ago

Hi @TontonTremblay,

I apologize for the follow-up messages, I think I managed to get closer but still face run time issues. Specifically:

rafael@server:~/diff-dope/home/diff-dope$ python3 examples/simple_scene.py

[2024-01-09 14:19:07,568][diffdope.diffdope][INFO] - loaded mesh @data/example/mesh/AlphabetSoup.ply. Does it have texture map? True
[2024-01-09 14:19:07,570][diffdope.diffdope][INFO] - translation loaded: [-1.6116878  -2.0622094  -7.47151334]
[2024-01-09 14:19:07,571][diffdope.diffdope][INFO] - rotation loaded as quaternion: [ 0.28427788 -0.34248786  0.88225564 -0.15333994]
[2024-01-09 14:19:07,705][diffdope.diffdope][INFO] - Loaded image data/example/scene/rgb.png, shape: torch.Size([540, 960, 3])
[2024-01-09 14:19:07,727][diffdope.diffdope][INFO] - Loaded image data/example/scene/depth.png, shape: torch.Size([540, 960])
[2024-01-09 14:19:07,749][diffdope.diffdope][INFO] - Loaded image data/example/scene/seg.png, shape: torch.Size([540, 960, 3])
[2024-01-09 14:19:07,994][diffdope.diffdope][INFO] - batchsize is 8
[2024-01-09 14:19:07,994][diffdope.diffdope][INFO] - Object3D(
 (pos): torch.Size([8]) ,[0]:[(-1.6116877794265747, -2.062209367752075, -7.471513271331787)] on cuda:0
 (mesh): mesh @data/example/mesh/AlphabetSoup.ply. vtx:torch.Size([8, 8240, 3]) on cuda:0 on cuda:0
)
[2024-01-09 14:19:07,995][diffdope.diffdope][INFO] - Scene(path_img='data/example/scene/rgb.png', path_depth='data/example/scene/depth.png', path_segmentation='data/example/scene/seg.png', image_resize=0.5, tensor_rgb=torch.Size([8, 540, 960, 3]) @ data/example/scene/rgb.png on cuda:0, tensor_depth=torch.Size([8, 540, 960]) @ data/example/scene/depth.png on cuda:0, tensor_segmentation=torch.Size([8, 540, 960, 3]) @ data/example/scene/seg.png on cuda:0)
  0%|                                                                                                                                                                               | 0/61 [00:00<?, ?it/s]Using /home/rafael/.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/rafael/.cache/torch_extensions/py38_cu121/renderutils_plugin/build.ninja...
Building extension module renderutils_plugin...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/bin/nvcc  -DTORCH_EXTENSION_NAME=renderutils_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/rafael/.local/lib/python3.8/site-packages/torch/include -isystem /home/rafael/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/rafael/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/rafael/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++17 -c /home/rafael/diff-dope/home/diff-dope/diffdope/c_src/mesh.cu -o mesh.cuda.o
FAILED: mesh.cuda.o
/usr/bin/nvcc  -DTORCH_EXTENSION_NAME=renderutils_plugin -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/rafael/.local/lib/python3.8/site-packages/torch/include -isystem /home/rafael/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/rafael/.local/lib/python3.8/site-packages/torch/include/TH -isystem /home/rafael/.local/lib/python3.8/site-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -DNVDR_TORCH -std=c++17 -c /home/rafael/diff-dope/home/diff-dope/diffdope/c_src/mesh.cu -o mesh.cuda.o
nvcc fatal   : Value 'c++17' is not defined for option 'std'
ninja: build stopped: subcommand failed.
  0%|                                                                                                                                                                               | 0/61 [00:00<?, ?it/s]
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/rafael/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "examples/simple_scene.py", line 17, in main
    ddope.run_optimization()
  File "/home/rafael/diff-dope/home/diff-dope/diffdope/diffdope.py", line 1693, in run_optimization
    self.renders = render_texture_batch(
  File "/home/rafael/diff-dope/home/diff-dope/diffdope/diffdope.py", line 196, in render_texture_batch
    pos_clip_ja = dd.xfm_points(pos.contiguous(), final_mtx_proj)
  File "/home/rafael/diff-dope/home/diff-dope/diffdope/ops.py", line 143, in xfm_points
    out = _xfm_func.apply(points, matrix, True)
  File "/home/rafael/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/rafael/diff-dope/home/diff-dope/diffdope/ops.py", line 109, in forward
    return _get_plugin().xfm_fwd(points, matrix, isPoints, False)
  File "/home/rafael/diff-dope/home/diff-dope/diffdope/ops.py", line 83, in _get_plugin
    torch.utils.cpp_extension.load(
  File "/home/rafael/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1308, in load
    return _jit_compile(
  File "/home/rafael/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/rafael/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1823, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/rafael/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'renderutils_plugin'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

This is one is on a server with a 4090 card and driver 545.23.08, CUDA 12.3.

We have another machine with a 3090 card, and I get the same error now. That machine is also 20.04 with a driver 525.147.05, CUDA 12.0.

TontonTremblay commented 10 months ago

Sorry for not replying to the previous message, I was traveling.

TontonTremblay commented 10 months ago

I think the problem comes from the code we write in cuda to run torch.matmul faster.

https://github.com/NVlabs/diff-dope/blob/main/diffdope/diffdope.py#L196

Can you try replacing the dd.xfm_points(pos.contiguous(), final_mtx_proj) to torch.matmul(pos, final_mtx_proj)?

It is also call for the depth, https://github.com/NVlabs/diff-dope/blob/main/diffdope/diffdope.py#L208

rpapallas commented 10 months ago

Thanks, this gives an error about the size of the tensor saying "Expected size for first two dimensions of batch2 tensor to be [8, 3] but got [8, 4].". I tried to use the pos.contiguous() in matmul call but gives the same error.

TontonTremblay commented 10 months ago

Yeah this is just the wrong transpose, sorry I am not in front of a machine that I can debug easily.

I think you would want something like that

posw = torch.cat([pos, torch.ones([pos.shape[0],pos.shape[1], 1]).cuda()], axis=2)
pos_clip_ja = torch.matmul(posw,final_mtx_proj.transpose(1,2))

For depth I am not sure, when I get into the office today, I will take some time to add in the config removing using the optimization we did, so it is all pure torch. Sorry about that.

TontonTremblay commented 10 months ago

I think the problem after discussing with a colleague is that you do not have the right version of cuda.

Looking at the github issue, the first error is a standard nvdiffrast error. Switching to a RasterizeCudaContext or fix the installation (for example, looking at the nvdiffrast dockerfile for the required setup) should solve it. Looking at https://github.com/NVlabs/diff-dope/blob/main/diffdope/ops.py it is a minimal PyTorch cuda extension with nothing fancy in it, so as long as the user has the same cuda toolkit as the torch installation is using, that should be fine. You may want to add a blurb about that to the readme, like we do in nvdiffrec https://github.com/NVlabs/nvdiffrec?tab=readme-ov-file#one-time-setup-windows

this was his answer I hope it helps.

rpapallas commented 10 months ago

Hi Jonathan,

This helped a lot; thank you both for your time. Everything is running now. Looking forward to play with diff-dope! Thank you very much for this work and for taking the time to help the community.

Here are some notes that may help someone else in the future:

The initial problem with Cuda error and OpenGL seems to be related to the container I was trying to get diff-dope installed in. We use Singularity / Apptainer containers and for some reason the CUDA / OpenGL conflict happens only in the container. As a workaround I tried to run diff-dope on the host OS and that initial error went away (I will look in to what needs to be done to get diff-dope running in a Singularity container).
For the latter error, I was installing / uninstalling different NVIDIA drivers and CUDA toolkits to overcome some of the problems. It seems after your colleague comment (although not confident if this statement is true) that I had to reinstall torch after each driver / cuda installation. I just re-installed torch this morning and everything seems to work.
I got some errors related to EGL, which I had to install using sudo apt install libglfw3-dev libgles2-mesa-dev.

Hope this helps someone else too.

rpapallas commented 10 months ago

I also reverted the changes we did for the matrix multiplication, it seems that it wasn't the problem so the code that runs now is the original one without those modifications.

TontonTremblay commented 10 months ago

thank you for the notes, I will add them to the readme. I really appreciate that you did not give up on the issues, I really try to make my work as accessible as possible, so this is a bummer for me that you had these issues, but I also feel like that navigating CUDA + nvidia drivers is a mess that I have little impact on.

If you could share with me which version of cuda, drivers and pytorch you ended up using that would be helpful.

rpapallas commented 10 months ago

To be honest, I think this wasn't issue of the code but rather of NVIDIA drivers, CUDA, and PyTorch configuration on my side.

Here are the details:

NVIDIA driver 525.147.05
CUDA version: 12.0 (prebuilt with the driver?)
PyTorch 2.1.2
NVIDIA GeForce RTX 4090
Ubuntu 20.04

Thanks again.

NVlabs / diff-dope

Issues running the example code (Cuda error: 304[cudaGraphicsGLRegisterBuffer...) #2