aipixel / GPS-Gaussian

[CVPR 2024 Highlight] The official repo for “GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis”
https://shunyuanzheng.github.io/GPS-Gaussian
MIT License
509 stars, 30 forks

Some questions about the paper #58

Closed. zhoutianyang2002 closed this issue 2 months ago

zhoutianyang2002 commented 2 months ago

Hi!

Thank you for your excellent work!

I am new to 3D vision. May I ask some questions about the paper?

  1. I notice that, unlike other 3DGS avatar papers, the loss function has no perceptual term (e.g. a VGG perceptual loss). Only the L1 term and the SSIM term are used. Is that because this is empirically sufficient, or for other reasons?
  2. In this paper, all pixels of the two views are unprojected into 3D space to form 3D Gaussians. Won't this place many Gaussians very close to each other in 3D space (because they come from corresponding points in the 2D images), leading to duplication and reduced efficiency?
  3. Formula (6) does not look like a matrix multiplication to me. Could the indices be wrong? In other words, should the correct form be $$C_{i j k}=\sum_{h}\left(\mathbf{f}_{l}^{S}\right)_{i j h} \cdot\left(\mathbf{f}_{r}^{S}\right)_{i h k}$$ or $$C_{i j k}=\sum_{h}\left(\mathbf{f}_{l}^{S}\right)_{i h k} \cdot\left(\mathbf{f}_{r}^{S}\right)_{h j k}$$ rather than $$C_{i j k}=\sum_{h}\left(\mathbf{f}_{l}^{S}\right)_{i j h} \cdot\left(\mathbf{f}_{r}^{S}\right)_{i k h}$$ as in the paper?
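
To make the concern in question 2 concrete, here is a minimal NumPy sketch of per-pixel unprojection. It is not taken from the GPS-Gaussian code: the function name `unproject_depth`, the pinhole intrinsics `K`, the camera-to-world pose `(R, t)`, and the toy scene (a flat plane seen by two parallel cameras) are all assumptions made for illustration.

```python
import numpy as np

def unproject_depth(depth, K, R, t):
    """Lift every pixel of a depth map to a 3D point in world space.

    depth: (H, W) depth along the camera z-axis
    K:     (3, 3) pinhole intrinsics (assumed camera model)
    R, t:  camera-to-world rotation (3, 3) and translation (3,)
    Returns an (H*W, 3) array, one candidate Gaussian centre per pixel.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))            # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    pix = pix.reshape(-1, 3).astype(np.float64)               # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                           # rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)                     # scale by depth
    return pts_cam @ R.T + t                                  # camera -> world

# Toy setup (hypothetical): two parallel cameras looking at the plane z = 2.
H, W, f = 8, 8, 100.0
K = np.array([[f, 0.0, 4.0], [0.0, f, 4.0], [0.0, 0.0, 1.0]])
depth = np.full((H, W), 2.0)
R = np.eye(3)

left_pts = unproject_depth(depth, K, R, np.zeros(3))
right_pts = unproject_depth(depth, K, R, np.array([0.04, 0.0, 0.0]))

# For each right-view point, distance to the nearest left-view point.
dists = np.linalg.norm(right_pts[:, None, :] - left_pts[None, :, :], axis=-1)
n_dup = int((dists.min(axis=1) < 1e-6).sum())
print(f"{n_dup} of {len(right_pts)} right-view points coincide with a left-view point")
```

In this toy setup the baseline equals two pixel footprints on the plane, so every right-view pixel that is also visible in the left view lands exactly on a left-view point (48 of 64 here). With real, estimated depth the points would be close rather than identical.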

Sorry to bother you. Thank you very much!

ShunyuanZheng commented 2 months ago

Hi, thanks for your interest!

  1. We tried using an LPIPS loss when training GPS-Gaussian but saw no significant improvement. Considering the additional memory usage, we do not use it in our pipeline. The L1 + SSIM loss, including its weights, follows the setup in 3DGS.
  2. Yes, the Gaussians are very close together and small in size compared to the original 3DGS. However, the number of Gaussian points does not significantly degrade efficiency. As reported in our supplementary material, rendering around 300 thousand Gaussians takes around 0.8 ms. The compression of GPS-Gaussian, as discussed in https://github.com/aipixel/GPS-Gaussian/issues/54#issuecomment-2244117642, is worth in-depth research.
  3. Eq. 6 is borrowed from RAFT-Stereo.
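
For readers puzzled by the indices in Eq. 6, the following NumPy sketch shows one way to read the formula exactly as quoted above. It assumes the feature maps are stored as (height, width, channels); the function name `correlation_volume` and the toy sizes are illustrative and not taken from the GPS-Gaussian or RAFT-Stereo code. Under that assumption the sum runs over the channel index `h` of both tensors, so for each image row `i` the result is the left features times the transposed right features.

```python
import numpy as np

def correlation_volume(f_l, f_r):
    """C[i, j, k] = sum_h f_l[i, j, h] * f_r[i, k, h]

    f_l, f_r: (H, W, D) feature maps of the left / right view
              (assumed layout: row, column, channel).
    Returns (H, W, W): for each row i, the dot product between the
    feature at left column j and the feature at right column k.
    """
    return np.einsum("ijh,ikh->ijk", f_l, f_r)

rng = np.random.default_rng(0)
H, W, D = 4, 6, 8                      # toy sizes, chosen arbitrarily
f_l = rng.standard_normal((H, W, D))
f_r = rng.standard_normal((H, W, D))

C = correlation_volume(f_l, f_r)

# The same volume as a batched matrix product over rows: f_l[i] @ f_r[i].T
C_matmul = f_l @ np.transpose(f_r, (0, 2, 1))

# A single entry is the dot product of two per-pixel feature vectors.
entry = f_l[1, 2] @ f_r[1, 3]

print(C.shape, np.allclose(C, C_matmul), np.isclose(C[1, 2, 3], entry))
```

Read this way, the index pattern `ijh, ikh` is still a matrix multiplication per row, just with the second factor transposed, because both feature maps keep the channel as their last axis.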

zhoutianyang2002 commented 2 months ago

> 2. Yes, the Gaussians are very close together and small in size compared to the original 3DGS. However, the number of Gaussian points does not significantly degrade efficiency. As reported in our supplementary material, rendering around 300 thousand Gaussians takes around 0.8 ms. The compression of GPS-Gaussian, as discussed in "about per-pixel gaussian allocation" #54 (comment), is worth in-depth research.

Thank you for your reply! Best wishes!