POSTECH-CVLab / PeRFception

[NeurIPS2022] Official implementation of PeRFception: Perception using Radiance Fields.
Apache License 2.0

Minor error in the code #16

Closed Wuziyi616 closed 1 year ago

Wuziyi616 commented 2 years ago

The demo command here

python3 -m run --ginc configs/co3d.gin

should be changed to

python3 -m run --ginc configs/co3d_v1.gin

since you have both CO3D V1 and V2 in the codebase now.

Another issue is here, where you have two identical lines of lambda_tv_background_color. Do you want to set lambda_tv_background_sigma here (since the default value is 1e-2) or is it just a typo?
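For clarity, a hypothetical sketch of what the corrected config lines might look like (the `run.` binding prefix and exact values are assumptions, not the actual file contents):

```gin
# One of the two duplicated color lines was presumably meant
# to set the sigma term instead (its default is 1e-2):
run.lambda_tv_background_color = 1e-2
run.lambda_tv_background_sigma = 1e-2
```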

Wuziyi616 commented 2 years ago

Btw I have a few questions:

Sorry for asking so many questions. I really appreciate your work, and I believe it will be very helpful to the entire community. Thanks in advance!

Wuziyi616 commented 1 year ago

@jeongyw12382 any updates here?

jeongyw12382 commented 1 year ago

Hi. Sorry for the late reply. All our team members were busy with the CVPR submission. Here are the responses to your questions.

  1. Thanks for the suggestion. We will fix this issue when updating the second version. We are currently generating the dataset.
  2. Thanks for pointing out the typo. We have just adjusted the lambda_tv_color.
  3. It really depends on the scene's condition. All the teaser images are picked from our generated dataset.
  4. One of the things we've observed while generating PeRFception is that PSNR is not an almighty metric. We first picked the top 200 rendered scenes ranked by each of PSNR, SSIM, and LPIPS. Then, we picked the scenes that overlapped across those rankings.
  5. This relates to our future extension. First, as you've mentioned, Plenoxel is a great tool for generating high-quality videos if two conditions hold: the image quality must be sufficiently high, and the estimated camera poses must be accurate. One tip for the former is to extend the training schedule. Because we rendered more than 10K scenes, we could not train each one for many iterations; as we observed in many scenes, training had not fully converged, i.e., the validation curve was still improving. For the latter, you could try better SfM tools with stronger recent SOTA matchers. Our dataset was generated with the camera poses provided by the official CO3D release, but we are confident that acquiring poses with stronger camera-calibration tools, such as SuperGlue, would yield much better results.
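The selection procedure in point 4 can be sketched as follows (the data layout and function name are illustrative, not the actual PeRFception pipeline):

```python
def pick_overlapping_scenes(metrics, k=200):
    """Rank scenes by each metric independently and keep the intersection.

    metrics: dict mapping metric name -> {scene_id: score}.
    Higher is better for PSNR/SSIM; lower is better for LPIPS.
    """
    lower_is_better = {"lpips"}
    top_sets = []
    for name, scores in metrics.items():
        reverse = name.lower() not in lower_is_better
        ranked = sorted(scores, key=scores.get, reverse=reverse)
        top_sets.append(set(ranked[:k]))
    return set.intersection(*top_sets)

# Toy example with 4 scenes and k=2:
metrics = {
    "psnr":  {"a": 30.0, "b": 28.0, "c": 25.0, "d": 20.0},
    "ssim":  {"a": 0.95, "b": 0.94, "c": 0.90, "d": 0.80},
    "lpips": {"a": 0.10, "b": 0.12, "c": 0.30, "d": 0.40},
}
print(sorted(pick_overlapping_scenes(metrics, k=2)))  # → ['a', 'b']
```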

Thanks :)

jeongyw12382 commented 1 year ago

Feel free to reopen this issue if you have any further questions or need help with it. We'll reflect your comments immediately in the upcoming update.

Wuziyi616 commented 1 year ago

@jeongyw12382 Thanks a lot for your reply! That answers most of my questions. Just a minor one: you are using depth maps to initialize ScanNet NeRF training. Have you tried something similar for CO3D, since it also provides sparse point clouds (though reconstructed by COLMAP)? Also, have you tried, e.g., a depth-supervision loss to improve NeRF performance on CO3D?
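For concreteness, a minimal sketch of the kind of depth-supervision term I mean (in the spirit of DS-NeRF), assuming per-ray rendered depths and sparse COLMAP depths where `None` marks rays without a matched 3D point; all names are illustrative:

```python
def depth_supervision_loss(rendered_depth, sparse_depth, weight=0.1):
    """L1 penalty between rendered depth and sparse COLMAP depth.

    rendered_depth: list of expected depths along each ray (from volume rendering).
    sparse_depth:   list of COLMAP point depths for the same rays; None = no point.
    Only rays with a sparse depth contribute to the loss.
    """
    errs = [abs(r - s) for r, s in zip(rendered_depth, sparse_depth) if s is not None]
    if not errs:
        return 0.0
    return weight * sum(errs) / len(errs)

rendered = [1.0, 2.0, 3.0, 4.0]
sparse = [1.2, None, 2.5, None]
print(depth_supervision_loss(rendered, sparse))  # ≈ 0.1 * (0.2 + 0.5) / 2 = 0.035
```

This would be added to the usual photometric loss; in practice one would also down-weight it by the COLMAP reprojection error, since sparse depths are noisy.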

Also, I tried training on CO3D-V2; although the numerical results (e.g. PSNR) clearly improve, the floaters don't seem to improve at all. That's very weird... I'm attaching the two rendered videos of the same data from V1 and V2, and I really cannot tell the difference:

https://user-images.githubusercontent.com/37072215/202307451-1f9c9945-0294-41cd-bb51-bf9fd2110db1.mp4

https://user-images.githubusercontent.com/37072215/202307475-c0dd5cdb-541a-4170-8001-4b7ad29254ed.mp4