Build issue - Githubissues

KarimJedda commented 1 year ago

Any ideas what could cause this?

/workspace/gaussian-splatting-cuda/src/gaussian.cu(70): error: too many arguments in function call
      _scaling = torch::log(torch::sqrt(dist2)).unsqueeze(-1).repeat({1, 3}, 0);
                                                                             ^

/workspace/gaussian-splatting-cuda/src/gaussian.cu(223): error: too many arguments in function call
      torch::Tensor stds = Get_scaling().index_select(0, indices).repeat({N, 1}, 0);
                                                                                 ^

/workspace/gaussian-splatting-cuda/src/gaussian.cu(226): error: too many arguments in function call
      torch::Tensor rots = build_rotation(_rotation.index_select(0, indices)).repeat({N, 1, 1}, 0);
                                                                                                ^

/workspace/gaussian-splatting-cuda/src/gaussian.cu(228): error: too many arguments in function call
      torch::Tensor new_xyz = torch::bmm(rots, samples.unsqueeze(-1)).squeeze(-1) + _xyz.index_select(0, indices).repeat({N, 1}, 0);
                                                                                                                                 ^

/workspace/gaussian-splatting-cuda/src/gaussian.cu(229): error: too many arguments in function call
      torch::Tensor new_scaling = torch::log(Get_scaling().index_select(0, indices).repeat({N, 1}, 0) / (0.8 * N));
                                                                                                   ^

/workspace/gaussian-splatting-cuda/src/gaussian.cu(230): error: too many arguments in function call
      torch::Tensor new_rotation = _rotation.index_select(0, indices).repeat({N, 1}, 0);
                                                                                     ^

/workspace/gaussian-splatting-cuda/src/gaussian.cu(231): error: too many arguments in function call
      torch::Tensor new_features_dc = _features_dc.index_select(0, indices).repeat({N, 1, 1}, 0);
                                                                                              ^

/workspace/gaussian-splatting-cuda/src/gaussian.cu(232): error: too many arguments in function call
      torch::Tensor new_features_rest = _features_rest.index_select(0, indices).repeat({N, 1, 1}, 0);
                                                                                                  ^

/workspace/gaussian-splatting-cuda/src/gaussian.cu(233): error: too many arguments in function call
      torch::Tensor new_opacity = _opacity.index_select(0, indices).repeat({N, 1}, 0);

MrNeRF commented 1 year ago

Did you download the libtorch version as specified in the Readme.md? Seems to be a libtorch issue.

KarimJedda commented 1 year ago

I tried the version specified in the Readme and got the same errors. I'm now trying the nightly build. The .repeat() method seems to take only one parameter in the C++ implementation of libtorch, so I can't really pinpoint the issue.

I'm tracking my progress here: https://github.com/KarimJedda/gaussian-splatting-cuda#install-from-scratch. I'm trying to start from a "vanilla" CUDA container.

MrNeRF commented 1 year ago

I have access to my computer now and could reproduce it. I'm wondering why it worked for me; it might have something to do with my cache. I'm on it. Thanks for pointing it out.

KarimJedda commented 1 year ago

No problem at all! I managed to fix it from my side and I'm able to build everything.

[Screenshot from 2023-08-12 15-16-32]

If you'd like me to open a PR, please let me know. I'm happy to contribute.

KarimJedda commented 1 year ago

However, I must have fumbled something in the process (I'm on an RTX 4090):

root@b58b9f4bd399:~/gaussian-splatting-cuda# ./build/gaussian_splatting_cuda dataset/tandt/truck/
Output folder: /root/gaussian-splatting-cuda/output
tinyply exception: the following property keys were not found in the header: nx, ny, nz, 
    Read 136029 total vertices 
    Read 136029 total vertex colors 
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: forward compatibility was attempted on non supported HW
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f6f05544a9b in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xbf (0x7f6f0553f64f in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x58f (0x7f6f04f66cdf in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDAKernelLaunchRegistry::CUDAKernelLaunchRegistry() + 0xd6 (0x7f6f04f65846 in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #4: c10::cuda::CUDAKernelLaunchRegistry::get_singleton_ref() + 0x44 (0x7f6f04f65a54 in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #5: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x75 (0x7f6f04f667c5 in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #6: <unknown function> + 0x2388e (0x7f6f04f3888e in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #7: at::native::to(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>) + 0x255 (0x7f6ef03d6205 in /root/gaussian-splatting-cuda/external/libtorch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x288c8a9 (0x7f6ef13378a9 in /root/gaussian-splatting-cuda/external/libtorch/lib/libtorch_cpu.so)
frame #9: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>) + 0x215 (0x7f6ef0ade595 in /root/gaussian-splatting-cuda/external/libtorch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x4c83b (0x560a33b0183b in ./build/gaussian_splatting_cuda)
frame #11: <unknown function> + 0x4cb29 (0x560a33b01b29 in ./build/gaussian_splatting_cuda)
frame #12: <unknown function> + 0x4d7aa (0x560a33b027aa in ./build/gaussian_splatting_cuda)
frame #13: <unknown function> + 0x4e488 (0x560a33b03488 in ./build/gaussian_splatting_cuda)
frame #14: <unknown function> + 0x1d776 (0x560a33ad2776 in ./build/gaussian_splatting_cuda)
frame #15: __libc_start_main + 0xf3 (0x7f6e9700c083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #16: <unknown function> + 0x2246e (0x560a33ad746e in ./build/gaussian_splatting_cuda)

Aborted (core dumped)

So I'd rather wait for your assessment.

MrNeRF commented 1 year ago

I believe I've upgraded the libtorch version, and I might not have deleted the build folder afterward. It's possible that CMake cached something, which obscured the fact that the latest version isn't compatible with my current implementation. I think I've resolved the issue. I'll make a clean checkout and test it again. If you'd like, you can also try. Just pull the latest changes.

Btw, contributions are very welcome. The Readme also needs some polishing. It's good that someone is testing my implementation. Thank you!

MrNeRF commented 1 year ago

A clean checkout now builds properly for me and the training runs as expected! Can you confirm as well?

KarimJedda commented 1 year ago

I'll try it right now and let you know.

KarimJedda commented 1 year ago

Amazing! Thank you very much for the fix.

I confirm that it's building properly now.
It's currently training on an RTX A5000:

Iteration: 6993 Loss: 0.0541091 gaussian splats: 1649245
Iteration: 6994 Loss: 0.072882 gaussian splats: 1649245
Iteration: 6995 Loss: 0.065998 gaussian splats: 1649245
Iteration: 6996 Loss: 0.0602186 gaussian splats: 1649245
Iteration: 6997 Loss: 0.0601465 gaussian splats: 1649245
Iteration: 6998 Loss: 0.0837042 gaussian splats: 1649245
Iteration: 6999 Loss: 0.0722159 gaussian splats: 1649245
Iteration: 7000 Loss: 0.0594181 gaussian splats: 1649245

I'll submit a proposal for the Readme in the coming days, once I've tested it a little more.

MrNeRF / gaussian-splatting-cuda

Build issue #1