SJoJoK / 3DGStream

[CVPR 2024 Highlight] Official repository for the paper "3DGStream: On-the-fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos".
https://sjojok.github.io/3dgstream
MIT License

Questions about the blurry render results #7

Closed HyoKong closed 4 months ago

HyoKong commented 4 months ago

Hi, thank you so much for the wonderful work!

I followed your guidance to train the flame_steak scene. When warming up the NTC, the final loss is around 2e-4, which suggests that the initial NTC is well trained. I then trained the NTC for each frame, following the README. Although the PSNR of both stages is fairly high (for instance, 28.4 for stage 1 and 29.7 for stage 2 at frame 5), the rendered images are blurry. The results look similar to those in https://github.com/SJoJoK/3DGStream/issues/4#issue-2214597013.

Could you please help me figure out whether something is wrong with my implementation? Thank you so much for the help!

SJoJoK commented 4 months ago

Hello. If you use the data we provided with the code (init_3dgs, cameras, pre-trained NTC), the PSNR should be around 34; this was validated by researchers during the pre-release. If you train from scratch, please make sure that the AABB you set covers the whole scene (the room). The warm-up process only ensures that, when points inside the AABB are used as input, the NTC produces a translation close to 0 and a rotation close to the identity quaternion. It's OK if the AABB is not very accurate, but please make sure that it encloses the scene (apart from outliers and far-away landscape). In other words, it can be too large, but it must not be too small.
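One quick way to test whether an AABB "covers the whole scene" is to measure what fraction of the initial Gaussian centers falls inside it. The following is a minimal sketch, not code from the repository: the function name, the toy points, and the box corners are all hypothetical, and in practice the points would be loaded from your own init_3dgs.

```python
def fraction_inside_aabb(points, aabb_min, aabb_max):
    """Return the fraction of 3D points lying inside the axis-aligned box."""
    if not points:
        raise ValueError("points must not be empty")
    inside = sum(
        all(lo <= c <= hi for c, lo, hi in zip(p, aabb_min, aabb_max))
        for p in points
    )
    return inside / len(points)


# Toy data (hypothetical): three points in the "room" and one far-away outlier.
points = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (-4.0, 0.5, 0.5), (50.0, 50.0, 50.0)]
coverage = fraction_inside_aabb(points, (-5.0, -5.0, -5.0), (5.0, 5.0, 5.0))
print(f"{coverage:.0%} of points fall inside the AABB")
```

A coverage well below 100% that cannot be explained by outliers or distant background would suggest the box is too small.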

SJoJoK commented 4 months ago

In addition, a stage-1 PSNR of 28.4 and a stage-2 PSNR of 29.7 is a very unusual result: in our experiments, the second stage does not bring such a large improvement. This suggests that the NTC is not well trained. There are several possible causes: the coordinates of the AABB corner points are wrong, the number of iterations is insufficient, or the NTC configuration is wrong. Since the latter two are already set in the config and only the AABB corners depend on the scene, I think the cause is a wrong AABB. Please note that even for the same multi-view video dataset, the cameras estimated by COLMAP may differ, and so will the AABB of the init_3dgs. The coordinates we provided in the script therefore cannot be reused as-is; you need to modify them if you train a scene from scratch.
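When training from scratch, one way to obtain scene-specific AABB corners is to derive them from the initial point cloud while discarding the most extreme points on each axis. This is only a sketch under stated assumptions: `robust_aabb`, `trim`, and `padding` are hypothetical names and defaults, not part of the 3DGStream scripts, and the padding errs on the side of a larger box, in line with the advice above.

```python
def robust_aabb(points, trim=0.01, padding=0.1):
    """Estimate AABB corners from 3D points.

    Drops the `trim` fraction of points at each end of every axis (to ignore
    outliers), then enlarges the box by `padding` times its extent per axis.
    """
    if not points:
        raise ValueError("points must not be empty")
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    n = len(points)
    k = int(n * trim)  # number of points ignored at each end of an axis
    lo_corner, hi_corner = [], []
    for axis in range(3):
        values = sorted(p[axis] for p in points)
        lo, hi = values[k], values[n - 1 - k]
        margin = (hi - lo) * padding
        lo_corner.append(lo - margin)
        hi_corner.append(hi + margin)
    return tuple(lo_corner), tuple(hi_corner)


# Toy data (hypothetical): a line of points plus one distant outlier.
cloud = [(float(i), 2.0 * i, 0.0) for i in range(100)] + [(1000.0, 1000.0, 1000.0)]
aabb_min, aabb_max = robust_aabb(cloud)
print("AABB min:", aabb_min)
print("AABB max:", aabb_max)
```

The resulting corners would then replace the coordinates hard-coded for the provided data.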

HyoKong commented 4 months ago

Thank you so much for the detailed explanation! I checked the log in detail and noticed that the V100 GPU doesn't support FullyFusedMLP. The warning reads: "tiny-cuda-nn warning: FullyFusedMLP is not supported for the selected architecture 70. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+." Could CutlassMLP cause the degradation in quality?

I enlarged the AABB just now to test this, and the results are even slightly worse. So I'm not sure whether CutlassMLP is working correctly.

Thank you so much for the help!

SJoJoK commented 4 months ago

In theory, different implementations of MLP should only vary in terms of speed and not result in any differences in outcomes. However, we only tested the code on the RTX 3090, RTX 4090, and A6000 GPUs. Therefore, we cannot guarantee that it will operate without errors on the V100 due to potential discrepancies in GPU hardware.
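To see in advance which MLP implementation a given GPU is likely to get, the rule stated in the quoted warning (architecture 75 or higher for FullyFusedMLP) can be written down directly. This is a simplified sketch based only on that warning text: `mlp_backend_for` is a hypothetical helper, and the actual choice also depends on how tiny-cuda-nn was compiled.

```python
def mlp_backend_for(capability):
    """Map a CUDA compute capability (major, minor) to the MLP implementation
    implied by the warning quoted in this thread (FullyFusedMLP needs 75+)."""
    major, minor = capability
    arch = major * 10 + minor
    return "FullyFusedMLP" if arch >= 75 else "CutlassMLP"


# With PyTorch installed, torch.cuda.get_device_capability() returns such a tuple.
for name, capability in [("V100", (7, 0)), ("RTX 3090", (8, 6)), ("RTX 4090", (8, 9))]:
    print(name, "->", mlp_backend_for(capability))
```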

I would like to offer three progressively complex suggestions, which you can try one by one:

  1. Increase the number of training iterations for stage 1 (s1) to make sure that the NTC has converged. Modify test_iterations to see how the PSNR evolves; when converged, the stage-1 PSNR is commonly close to that of the previous timestep.
  2. Visualize your AABB and the init_3dgs to ensure they align with your expectations (this can be done with Open3D).
  3. Strictly follow the guidance: use the init_3dgs, cameras, and NTC (flame_steak_ntc_params_F_4.pth) from test/flame_steak_suite for training.
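The convergence check in the first suggestion can be sketched as a comparison between the stage-1 PSNR of the current frame and the final PSNR of the previous frame. This is a heuristic illustration, not code from the repository: the helper names and the 1 dB tolerance are assumptions, and the PSNR formula assumes images normalized to [0, 1].

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def stage1_looks_converged(stage1_psnr, previous_frame_psnr, tolerance_db=1.0):
    """Heuristic: stage-1 PSNR should not fall more than `tolerance_db`
    below the previous frame's final PSNR (the tolerance is an assumption)."""
    return previous_frame_psnr - stage1_psnr <= tolerance_db


# Values from this thread: 28.4 dB at stage 1 versus an expected level near 34 dB.
print(stage1_looks_converged(28.4, 34.0))
print(stage1_looks_converged(33.6, 34.0))
```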

The third suggestion is particularly crucial. If followed correctly, you should achieve an average PSNR of about 34 for the FVV, a result that has been reproduced by many users. If not, I suspect the issue may lie with the GPU.

I apologize for any inconvenience this may have caused. If you are unable to achieve satisfactory results on the RTX 3090, RTX 4090, or A6000 as well, please do not hesitate to contact us.

HyoKong commented 4 months ago

Thank you for the explanation! I will check in detail. Thank you!