Tianci-Wen closed this issue 3 months ago.
Hi Tianci, thanks for pointing out this issue. The quantitative results for Photo-SLAM and the other baselines reported in the paper are the averages of PSNR, SSIM, and LPIPS over all frames, as you found in this evaluation code. Since different systems use different strategies for creating keyframes, the number of keyframes varies between them. Evaluating rendering quality on keyframes only may therefore not be fair, especially considering the overfitting issue: the keyframes are exactly the views the map was optimized on.
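Written out, assuming the usual PSNR definition for a rendered image $\hat I$ and ground-truth image $I$ with pixel values in $[0, 1]$, the all-frame protocol averages over every one of the $N$ frames, whereas a keyframe-only protocol averages over a system-dependent subset $\mathcal{K}$ of frame indices (the notation here is illustrative, not taken from the paper):

$$
\mathrm{PSNR}(\hat I, I) = 10 \log_{10} \frac{1}{\mathrm{MSE}(\hat I, I)}, \qquad
\overline{\mathrm{PSNR}}_{\text{all}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{PSNR}(\hat I_i, I_i), \qquad
\overline{\mathrm{PSNR}}_{\mathcal{K}} = \frac{1}{|\mathcal{K}|} \sum_{i \in \mathcal{K}} \mathrm{PSNR}(\hat I_i, I_i)
$$

Because $\mathcal{K}$ differs from system to system, $\overline{\mathrm{PSNR}}_{\mathcal{K}}$ is computed over different frame sets for different methods, while $\overline{\mathrm{PSNR}}_{\text{all}}$ uses the same $N$ frames for every method.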
Understood, thank you for your explanation! I have been uncertain about this because the metrics in 3DGS papers are all computed on a held-out test set, which is the standard practice. However, when metrics such as PSNR are computed that way, almost none of the methods that combine 3DGS and SLAM reach the values reported in their papers. Because no paper explicitly explains how these results are obtained, I have been worried about this issue. I think your explanation makes sense. Thank you again!
After reviewing your evaluation code, I noticed that when calculating PSNR, SSIM, and LPIPS, you render images from all viewpoints estimated by SLAM, compare them with the ground-truth images, and average the results. However, during training the model is supervised only by the ground-truth images of the keyframes. The other viewpoints are never supervised, so they should be treated as a test set and have PSNR, SSIM, and LPIPS computed separately, while the keyframes should be treated as the training set and evaluated on their own (as done in the original 3DGS `metrics.py`). I read the evaluation code for MonoGS, and it evaluates PSNR, SSIM, and LPIPS only on the keyframes. I would like to know whether the rendering metrics in your paper were evaluated using only the keyframes or using all frames, as in your evaluation code. It would be helpful if you could provide references for your approach. Thank you very much!
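A minimal sketch of the split evaluation described above, assuming each image is a flat list of pixel values in [0, 1] and keyframes are identified by frame index. Both the data layout and the function names (`psnr`, `split_psnr`) are hypothetical; this is not the evaluation code of Photo-SLAM, MonoGS, or 3DGS. Only PSNR is shown; SSIM and LPIPS would be averaged over the same three index sets.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical layout: an image is a flat sequence of pixel values in [0, 1].
Image = Sequence[float]


def psnr(rendered: Image, gt: Image, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(rendered) != len(gt):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - g) ** 2 for r, g in zip(rendered, gt)) / len(gt)
    if mse == 0:
        return math.inf  # identical images; an inf score dominates any mean
    return 10.0 * math.log10(max_val ** 2 / mse)


def split_psnr(
    rendered: List[Image], gts: List[Image], keyframe_ids: Set[int]
) -> Dict[str, float]:
    """Average PSNR over all frames, keyframes only, and non-keyframes only.

    Keyframes play the role of the training set (they supervised the map);
    non-keyframes play the role of a held-out test set.
    """
    if len(rendered) != len(gts):
        raise ValueError("need one ground-truth image per rendered frame")
    scores = [psnr(r, g) for r, g in zip(rendered, gts)]
    train = [s for i, s in enumerate(scores) if i in keyframe_ids]
    test = [s for i, s in enumerate(scores) if i not in keyframe_ids]

    def mean(xs: List[float]) -> float:
        # NaN signals an empty split rather than silently reporting 0.
        return sum(xs) / len(xs) if xs else float("nan")

    return {"all": mean(scores), "keyframes": mean(train), "non_keyframes": mean(test)}


if __name__ == "__main__":
    gt = [[0.5] * 4] * 3
    # Frames 0 and 2 are keyframes (error 0.01, about 40 dB);
    # frame 1 is a non-keyframe (error 0.1, about 20 dB).
    rendered = [[0.51] * 4, [0.6] * 4, [0.49] * 4]
    print(split_psnr(rendered, gt, {0, 2}))
```

Reporting the three averages side by side makes it explicit which protocol a number comes from: in the toy run, the all-frame mean (about 33.3 dB) sits between the keyframe mean (about 40 dB) and the non-keyframe mean (about 20 dB), so the protocol alone can shift the reported value substantially.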