This implementation has slightly different outputs than the original code

MrNeRF / gaussian-splatting-cuda

3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!

This implementation has slightly different outputs than the original code #53

Closed kwea123 closed 10 months ago

kwea123 commented 10 months ago

Some other issues (#51, #43) pointed out differences in performance (PSNR) and count relative to the original repo. Since this repo exposes the same CUDA interface for forward/backward, I fed the same tensors to both implementations and inspected the outputs (no training involved, just one forward call and one backward call).

It turns out that there is some numerical difference (a relative error of ~0.01). I didn't spot any obvious difference in the code. I suspect it might be due to intrinsics like __fmaf, which I referred to in another issue (#36), but I'm really not sure.
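To illustrate why a fused multiply-add can change results at all, here is a minimal sketch that emulates float32 arithmetic in plain Python. The helper names (`to_f32`, `mul_add_unfused`, `mul_add_fused`) are made up for this example and the fused variant is only an emulation, not the CUDA intrinsic. Whether this effect explains the ~0.01 relative error reported above is not confirmed.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mul_add_unfused(a: float, b: float, c: float) -> float:
    """a * b + c with two roundings: the product is rounded, then the sum."""
    return to_f32(to_f32(a * b) + c)


def mul_add_fused(a: float, b: float, c: float) -> float:
    """a * b + c with a single rounding at the end (emulated FMA).

    The product of two float32 values fits exactly in a double, so only the
    final conversion rounds for the inputs used below. This is an emulation
    for illustration, not a bit-exact model of hardware FMA in every case.
    """
    return to_f32(a * b + c)


# Inputs chosen so that the product a * b is not representable in float32.
a = to_f32(1.0 + 2.0 ** -12)
b = a
c = to_f32(-(1.0 + 2.0 ** -11))

unfused = mul_add_unfused(a, b, c)
fused = mul_add_fused(a, b, c)
print(unfused, fused)
```

With these inputs the unfused version loses the low-order bit of the product before the subtraction, while the fused version keeps it. A single operation differs only at the level of float32 rounding, so any larger discrepancy would have to come from how such differences propagate, or from something else entirely.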

As a result, people who want to use this repo to match the official repo need to manually tune thresholds such as the gradient magnitude. Otherwise you could end up with a smaller count (e.g. #43), which results in inferior quality.

As this is only an implementation difference, not a code bug, I'm leaving this here just for reference.

MrNeRF commented 10 months ago

I think the differences are mostly due to differences in the pytorch/libtorch interface (libtorch is less powerful and supports far fewer operations). I also suspect there are some implementation differences between the two (lol, look at the optimizer state concatenations). You can plug in the original rasterizer and the difference will persist (and vice versa).

Keep in mind that you cannot compare anything that was processed by the backward pass, as the results are non-deterministic. You will always see numerical differences in the computations, even if you run the same input through the backward pass twice.
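One common source of this kind of non-determinism on GPUs is that floating-point addition is not associative, so accumulating the same contributions in a different order can give a different sum. The sketch below shows the effect with plain Python floats. It assumes, without confirmation from this thread, that order-dependent accumulation is what makes the backward pass non-deterministic here, and the function name `accumulate` is made up for the example.

```python
from typing import Iterable


def accumulate(values: Iterable[float]) -> float:
    """Sum values strictly left to right, one rounded addition at a time.

    An explicit loop is used instead of the built-in sum(), because recent
    Python versions use compensated summation for floats in sum(), which
    would hide the effect being demonstrated.
    """
    total = 0.0
    for v in values:
        total += v
    return total


# The same four contributions, accumulated in two different orders.
order_one = [1e16, 1.0, -1e16, 1.0]
order_two = [1e16, -1e16, 1.0, 1.0]

sum_one = accumulate(order_one)
sum_two = accumulate(order_two)
print(sum_one, sum_two)
```

The two sums differ even though the inputs are identical as a set. The values are deliberately extreme to make the difference obvious; with realistic gradient contributions the order-dependent differences would normally be far smaller.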

kwea123 commented 10 months ago

> differences in the pytorch/libtorch

No, I don't use libtorch at all. What I did was replace the original cuda_rasterizer directory with your cuda_rasterizer directory and build it using the original repo's setup. Then I ran one forward pass and one backward pass with the same tensors by calling the C interface (e.g. https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/59f5f77e3ddbac3ed9db93ec2cfe99ed6c5d121d/diff_gaussian_rasterization/__init__.py#L86). There is no training involved, so no optimizer state or anything else.

> results are non-deterministic

Yes, it is not deterministic, but such a big relative error still suggests there is some difference, imo. If you run either implementation's backward twice, you get only a very small difference, around 1e-6.
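The argument above amounts to comparing the cross-implementation error against the run-to-run noise floor of each implementation. Here is a minimal sketch of that check. It is not the script used in this thread: the relative error is assumed to be an L2-norm ratio, `run_a` and `run_b` are hypothetical callables standing in for one backward call of each rasterizer on identical inputs, and the `factor` of 10 is an arbitrary choice.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def relative_error(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """L2 norm of (a - b) divided by the L2 norm of b (assumed definition)."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    ref = math.sqrt(sum(y * y for y in b))
    return diff / max(ref, eps)


def compare_implementations(
    run_a: Callable[[], Vector],
    run_b: Callable[[], Vector],
    factor: float = 10.0,
) -> Dict[str, float]:
    """Run each implementation twice and compare cross error to the noise floor."""
    a1, a2 = run_a(), run_a()
    b1, b2 = run_b(), run_b()
    # Noise floor: how much one implementation differs from itself between runs.
    noise_floor = max(relative_error(a1, a2), relative_error(b1, b2))
    # Cross error: how much the two implementations differ from each other.
    cross_error = relative_error(a1, b1)
    return {
        "noise_floor": noise_floor,
        "cross_error": cross_error,
        "exceeds_noise": float(cross_error > factor * noise_floor),
    }


# Toy stand-ins that return fixed gradients; real use would call the rasterizers.
result = compare_implementations(
    run_a=lambda: [1.0, 2.0, 3.0],
    run_b=lambda: [1.01, 2.02, 3.03],
)
print(result)
```

If `cross_error` is orders of magnitude above `noise_floor`, as with the ~0.01 versus ~1e-6 figures quoted in this thread, non-determinism alone does not account for the gap.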