aipixel / GPS-Gaussian

[CVPR 2024 Highlight] The official repo for “GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis”
https://shunyuanzheng.github.io/GPS-Gaussian
MIT License

Failure cases in avatar data #32

Closed Yifehuang97 closed 6 months ago

Yifehuang97 commented 6 months ago

Hi, thanks for your work! When using the default settings on the avatar data, I see failure cases like the ones below:

[failure-case screenshots: Fail2, Fail1]

It seems the Gaussian regression network fails to learn how to combine the Gaussians from the left/right views. Also, the Stage 2 loss does not seem to converge well. [image]

I'm also not sure whether the Stage 1 result is good enough; the final validation EPE is around 1.767. [image]

Thanks for your time!

ShunyuanZheng commented 6 months ago

I think the depth estimation in Stage 1 is well-trained, since an EPE around 1.767 is reasonable. However, the failure case in the second figure is confusing: there is an obvious mismatch between the two partial Gaussian point sets, which I think is mainly caused by the camera parameters. I suggest saving the point clouds of both views in either Stage 1 or Stage 2. Also, what is the EPE metric in Stage 2?

Yifehuang97 commented 6 months ago

Thanks for your reply!

This is the EPE metric on the validation set in Stage 2: [image]

For saving the point clouds, do you mean calling depth2pc to get the 3D positions and checking whether the left/right results are consistent?

Thank you for your help!

ShunyuanZheng commented 6 months ago

The EPE also seems reasonable : )

To save the point cloud, just save the 'xyz' and 'img' values in the valid region as the vertices and colors of the point cloud using [trimesh](https://trimesh.org/). Take the following code as an example.

```python
import trimesh

for view in ['lmain', 'rmain']:
    valid_i = data[view]['pts_valid'][0, :]                             # [S*S] boolean mask of valid pixels
    xyz_i = data[view]['xyz'][0, :, :]                                  # [S*S, 3] predicted 3D positions
    rgb_i = data[view]['img'][0, :, :, :].permute(1, 2, 0).view(-1, 3)  # [S*S, 3] per-pixel colors
    xyz_i = xyz_i[valid_i].view(-1, 3)
    rgb_i = rgb_i[valid_i].view(-1, 3)
    rgb_i = (rgb_i + 1.0) * 0.5 * 255                                   # map colors from [-1, 1] to [0, 255]
    ply_out = trimesh.points.PointCloud(vertices=xyz_i.detach().cpu().numpy(),
                                        colors=rgb_i.detach().cpu().numpy())
    ply_out.export(OUTPUT_PATH + '/%s_%s.ply' % (data['name'], view))
```
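
Opening the two exported .ply files together in a point-cloud viewer (for instance MeshLab or CloudCompare) should show whether the left/right partial point clouds are aligned in world space.
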
Yifehuang97 commented 6 months ago

Thank you so much!

Yifehuang97 commented 6 months ago

Sorry to reach out once more. I've fixed a bug in my previous code, and it now produces reasonable results. However, both the L1 loss and the SSIM loss are not converging as expected, and the quality of the rendered images does not seem to improve with additional training iterations. Could you suggest any potential reasons for this?

Results from the first iteration on the training set: [image]

Results after 100,000 iterations on the training set: [image]

Here is the training loss: [image]

Thank you so much, and sorry for the inconvenience.

ShunyuanZheng commented 6 months ago

Sorry for the late reply. I got a similar result when training under a half-body setup, where severe self-occlusion caused many holes in the novel views, and large-scale Gaussians were predicted to compensate for the missing areas. However, that should not be the cause for head NVS, since there is no occlusion. You could manually set the scale of the Gaussians to zero in the Gaussian rasterization and compare the rendered novel view with the ground truth. I suspect there is still a mismatch between them, potentially caused by incorrect camera parameters.
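
For reference, a minimal sketch of that check, assuming the fused Gaussian parameters are available as tensors named xyz, rot, scale, opacity, and rgb, and that render_gaussians stands in for whichever rasterization call your pipeline uses (these names are placeholders, not the exact GPS-Gaussian API):

```python
import torch

# Collapse every Gaussian to a point by zeroing its scale, then re-render the
# novel view and compare it with the ground truth. If the silhouettes still do
# not line up, the mismatch comes from the geometry / camera parameters rather
# than from oversized predicted Gaussians.
scale_zero = torch.zeros_like(scale)                          # zero-scale Gaussians
rendered = render_gaussians(xyz, rot, scale_zero, opacity,    # placeholder render call
                            rgb, novel_view_cam)
l1_gap = (rendered - gt_image).abs().mean()                   # pixel-wise L1 gap to GT
print('L1 vs. ground truth with zero-scale Gaussians:', l1_gap.item())
```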

Yifehuang97 commented 6 months ago

Thank you!