Closed shaune0000 closed 3 months ago
Hi,
Could you please provide the screenshot of output/log? Because in my machine, directly running cache_warmup.ipynb
without any modification looks good to me (the cache_F_4.json and point_cloud.ply are already in this repo).
Thanks for your reply.
I am running on Windows with this environment: Python 3.11, PyTorch 2.3.0+cu118, and the latest version of tinycudann.
I got this while running.
My env: OS: Ubuntu 22.04, GPU: RTX A6000, Driver: 535.86.05, CUDA: 11.8, Python: 3.8, PyTorch: 2.0.1+cu118, tinycudann: 1.7
I apologize for the inconvenience, but this issue might stem from the implementation of tinycudann or an environment mismatch. FYI, some researchers have successfully reproduced the experiments with the pre-release code; others have not encountered the NaN issue, though they have faced other problems (for instance, see: https://github.com/SJoJoK/3DGStream/issues/7).
For debugging purposes, I recommend printing the inputs, outputs, and losses (loss_xyz, loss_rot, and loss_dummy). The appearance of NaN outputs is often linked to NaN values in losses or inputs, which can disrupt gradient-based optimization. Identifying the source of the NaN values can significantly simplify debugging. My personal practice is to print the sum of each tensor (e.g., masked_d_xyz.sum(), masked_d_rot.sum()) so that you can quickly identify any NaN values present.
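The sum-printing trick above works because NaN propagates through addition: if any element of a tensor is NaN, its sum is NaN too. A minimal sketch of that idea in plain Python (the `nan_report` helper and the stand-in value lists are hypothetical; in the actual code you would pass `masked_d_xyz.sum().item()` etc. from the PyTorch tensors):

```python
import math

def nan_report(name, values):
    """Print the sum of a sequence of floats and flag NaN.

    NaN propagates through addition, so a NaN sum means at least one
    element is NaN -- the same trick as tensor.sum() in PyTorch.
    """
    total = sum(values)
    is_nan = math.isnan(total)
    flag = " <-- NaN detected" if is_nan else ""
    print(f"{name}: sum={total}{flag}")
    return is_nan

# Hypothetical stand-ins for masked_d_xyz / masked_d_rot values:
nan_report("masked_d_xyz", [0.1, 0.2, 0.3])
nan_report("masked_d_rot", [0.1, float("nan"), 0.3])
```

In PyTorch itself, `torch.isnan(t).any()` is a more direct check, and `torch.autograd.set_detect_anomaly(True)` can help locate the first operation that produces NaN during the backward pass (at the cost of slower training).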
Thank you for your reply. I think the problem has become clear after conducting some tests.
On Windows with PyTorch 2.0.1, it is possible to train the NTC, but an issue arises with Gaussian rasterization, resulting in a DLL load failure when trying to train the Gaussian model. That problem is resolved in PyTorch 2.3.0, but there the tinycudann network somehow fails when training the NTC.
Therefore, sticking to the environment you mentioned should work fine.
Glad to help:)
Hi, there,
Thank you for the clear code and detailed steps provided. However, while running the test data through the NTC warmup section, I encountered an issue where all values returned by the tcnn model become NaN starting from the second iteration. I am currently testing with the flame steak dataset. Could I have missed something in the process?