Closed KarimJedda closed 1 year ago
Did you download the libtorch version as specified in the Readme.md? Seems to be a libtorch issue.
Tried with the one specified in the Readme and got the same issue. Now trying with the nightly one. The .repeat() method seems to take only one parameter in the C++ implementation of libtorch, so I can't really pinpoint the issue.
I'm tracking my progress over here: https://github.com/KarimJedda/gaussian-splatting-cuda#install-from-scratch, where I'm trying to start from a "vanilla" CUDA container.
Now I have access to my computer and could reproduce it. I'm not sure yet why it happens. This might have something to do with my cache. I am on it... Thanks for pointing it out.
No problem at all! I managed to fix it on my side and I'm able to build everything.
If you'd like me to make a PR, please let me know; I'm happy to contribute.
However, I must have fumbled something in the process (I'm on an RTX 4090)
root@b58b9f4bd399:~/gaussian-splatting-cuda# ./build/gaussian_splatting_cuda dataset/tandt/truck/
Output folder: /root/gaussian-splatting-cuda/output
tinyply exception: the following property keys were not found in the header: nx, ny, nz,
Read 136029 total vertices
Read 136029 total vertex colors
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: forward compatibility was attempted on non supported HW
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Device-side assertions were explicitly omitted for this error check; the error probably arose while initializing the DSA handlers.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f6f05544a9b in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xbf (0x7f6f0553f64f in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x58f (0x7f6f04f66cdf in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDAKernelLaunchRegistry::CUDAKernelLaunchRegistry() + 0xd6 (0x7f6f04f65846 in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #4: c10::cuda::CUDAKernelLaunchRegistry::get_singleton_ref() + 0x44 (0x7f6f04f65a54 in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #5: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x75 (0x7f6f04f667c5 in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #6: <unknown function> + 0x2388e (0x7f6f04f3888e in /root/gaussian-splatting-cuda/external/libtorch/lib/libc10_cuda.so)
frame #7: at::native::to(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>) + 0x255 (0x7f6ef03d6205 in /root/gaussian-splatting-cuda/external/libtorch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x288c8a9 (0x7f6ef13378a9 in /root/gaussian-splatting-cuda/external/libtorch/lib/libtorch_cpu.so)
frame #9: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, bool, bool, c10::optional<c10::MemoryFormat>) + 0x215 (0x7f6ef0ade595 in /root/gaussian-splatting-cuda/external/libtorch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x4c83b (0x560a33b0183b in ./build/gaussian_splatting_cuda)
frame #11: <unknown function> + 0x4cb29 (0x560a33b01b29 in ./build/gaussian_splatting_cuda)
frame #12: <unknown function> + 0x4d7aa (0x560a33b027aa in ./build/gaussian_splatting_cuda)
frame #13: <unknown function> + 0x4e488 (0x560a33b03488 in ./build/gaussian_splatting_cuda)
frame #14: <unknown function> + 0x1d776 (0x560a33ad2776 in ./build/gaussian_splatting_cuda)
frame #15: __libc_start_main + 0xf3 (0x7f6e9700c083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #16: <unknown function> + 0x2246e (0x560a33ad746e in ./build/gaussian_splatting_cuda)
Aborted (core dumped)
so I'd rather wait for your assessment.
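For anyone hitting the same message: the error text says CUDA forward compatibility was attempted on hardware that does not support that mode. In containers this can happen when a forward-compatibility `compat` directory ends up on the library path. The location `/usr/local/cuda/compat` used in the example is an assumption about the image layout, and nothing in this thread confirms that this was the cause here. The sketch below is only a diagnostic aid; the helper names `find_compat_entries` and `report_driver` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list entries of a colon-separated library path
# that point at a CUDA "compat" directory.
# Usage: find_compat_entries "$LD_LIBRARY_PATH"
find_compat_entries() {
    # Split on ':' and keep entries containing "/compat"; print nothing if none match.
    printf '%s\n' "$1" | tr ':' '\n' | grep '/compat' || true
}

# Hypothetical helper: print GPU name and driver version when nvidia-smi is available.
report_driver() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
    else
        echo "nvidia-smi not found"
    fi
}
```

Running `find_compat_entries "$LD_LIBRARY_PATH"` inside the container prints any matching entries. Empty output only rules out this one mechanism, since libraries can also be picked up through other loader configuration.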
I believe I've upgraded the libtorch version, and I might not have deleted the build folder afterward. It's possible that CMake cached something, which obscured the fact that the latest version isn't compatible with my current implementation. I think I've resolved the issue. I'll make a clean checkout and test it again. If you'd like, you can also try. Just pull the latest changes.
Btw, contributions are very welcome. The Readme also needs some polishing. It is good that someone is testing my implementation. Thank you.
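The stale-cache scenario described above (libtorch upgraded, build folder left in place) can be avoided by wiping the build directory before reconfiguring. A minimal sketch, assuming the build directory is named `build` as in the run command earlier in the thread; the helper name `clean_build_dir` is hypothetical, and the CMake commands in the comments are the generic out-of-source form rather than the project's documented steps.

```shell
#!/bin/sh
# Hypothetical helper: remove a CMake build directory so the next
# configure step cannot reuse cached paths from an older libtorch.
# Usage: clean_build_dir <build-dir>
clean_build_dir() {
    build_dir="$1"
    # Refuse empty or root paths so a typo cannot wipe the wrong directory.
    if [ -z "$build_dir" ] || [ "$build_dir" = "/" ]; then
        echo "refusing to remove '$build_dir'" >&2
        return 1
    fi
    if [ -f "$build_dir/CMakeCache.txt" ]; then
        echo "removing stale cache: $build_dir/CMakeCache.txt"
    fi
    rm -rf "$build_dir"
}

# Example (generic CMake invocation, assumed rather than taken from the Readme):
#   clean_build_dir build
#   cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
#   cmake --build build -j
```

Deleting only `CMakeCache.txt` is sometimes enough, but removing the whole directory also discards object files compiled against the old headers.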
A clean checkout now builds properly for me and the training runs as expected! Can you confirm as well?
I'll try it right now and let you know.
Amazing! Thank you very much for the fix.
Iteration: 6993 Loss: 0.0541091 gaussian splats: 1649245
Iteration: 6994 Loss: 0.072882 gaussian splats: 1649245
Iteration: 6995 Loss: 0.065998 gaussian splats: 1649245
Iteration: 6996 Loss: 0.0602186 gaussian splats: 1649245
Iteration: 6997 Loss: 0.0601465 gaussian splats: 1649245
Iteration: 6998 Loss: 0.0837042 gaussian splats: 1649245
Iteration: 6999 Loss: 0.0722159 gaussian splats: 1649245
Iteration: 7000 Loss: 0.0594181 gaussian splats: 1649245
I'll submit a proposal for the Readme in the coming days, once I've tested it a little more.
Any ideas what could cause this?