Totoro97 / f2-nerf

Fast neural radiance field training with free camera trajectories
https://totoro97.github.io/projects/f2-nerf/
Apache License 2.0

bad result in kitti360 #65

Closed: h8c2 closed this issue 1 year ago

h8c2 commented 1 year ago

Thanks for your great work. I was training f2-nerf on KITTI-360, but the result was really bad.

[image: a training view of seq 2013_05_28_drive_0018_sync]

I first converted cam0_to_world.txt from KITTI-360 to cam_meta.npy and used wanjinyou_big.yaml to train my model. Even after selecting a small part of the dataset and training again, the result was not successful.

[image: rendered result, with test-image PSNR around 10]

I was wondering if I made some mistake. It would be appreciated if you could give me some advice.
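For reference, a minimal sketch of this kind of conversion might look like the following. The [N, 27] cams_meta.npy layout (12 pose values, 9 intrinsics, 4 distortion parameters, 2 bounds), the row format of cam0_to_world.txt, and the intrinsics/bounds values are assumptions here rather than details confirmed in this thread, so they should be checked against the repository's data documentation and the KITTI-360 calibration files:

```python
# Rough sketch of converting KITTI-360 cam0_to_world.txt to an f2-nerf style
# cams_meta.npy. Assumptions (not confirmed in this thread): cam0_to_world.txt
# rows are [frame_id, 16 values of a row-major 4x4 cam-to-world matrix], and
# cams_meta.npy rows are [12 pose | 9 intrinsics | 4 distortion | 2 bounds].
import numpy as np

def kitti360_to_cams_meta(cam0_to_world_path, fx, fy, cx, cy,
                          near=0.1, far=100.0, out_path="cams_meta.npy"):
    raw = np.loadtxt(cam0_to_world_path)             # [N, 17]; column 0 is the frame id
    poses = raw[:, 1:].reshape(-1, 4, 4)             # 4x4 camera-to-world per frame
    n = poses.shape[0]

    K = np.array([[fx, 0.0, cx],
                  [0.0, fy, cy],
                  [0.0, 0.0, 1.0]])

    rows = np.zeros((n, 27), dtype=np.float64)
    rows[:, 0:12] = poses[:, :3, :4].reshape(n, 12)  # 3x4 camera-to-world, flattened
    rows[:, 12:21] = K.reshape(-1)                   # broadcast the same intrinsics to every frame
    rows[:, 21:25] = 0.0                             # k1, k2, p1, p2 (rectified images assumed)
    rows[:, 25] = near
    rows[:, 26] = far
    np.save(out_path, rows)
    return rows
```

Note that any axis-convention difference between the KITTI-360 poses and what f2-nerf expects would also have to be handled; the sketch above does not account for that.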

Totoro97 commented 1 year ago

Hi, thanks for your interest. I do not have this sequence of data. Would you mind sharing this sequence of images for debugging? Thanks!

h8c2 commented 1 year ago

Thanks for your kind reply. I have uploaded the data to Google Drive: the processed directory stores the data in the training format, and the other directories are the raw data from KITTI-360 sequence 2013_05_28_drive_0018_sync. https://drive.google.com/drive/folders/1dOxm-HF07kcvOoUf2eRzxK2SWdlkbfWD?usp=sharing

Totoro97 commented 1 year ago

Hi, thanks for your prompt reply and for providing the data. I have checked the data, and I found that the distribution of images may be too sparse, i.e., the movements between neighboring images are too large, which makes it challenging for NeRF training to infer the 3D scene. Besides, this data covers a very large scene that I think far exceeds the encoding capacity of the current hash table setting of wanjinyou_big (table size 2^20). I am not sure whether increasing the table size would help the scene encoding, and it would also bring a much higher GPU memory requirement and longer training time.
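To give a rough sense of that memory cost, here is a back-of-envelope sketch. It assumes an Instant-NGP-style multi-resolution hash grid (16 levels, 2 features per entry, half-precision weights plus Adam moments); the actual f2-nerf hyperparameters may differ, so the numbers are only order-of-magnitude:

```python
# Back-of-envelope estimate of how hash-grid parameter count and memory scale
# with table size. The grid configuration below (16 levels, 2 features per
# entry, fp16 weights, fp32 Adam moments) is an assumption, not the exact
# f2-nerf setting.
def hash_grid_memory(log2_table_size, n_levels=16, n_features=2,
                     bytes_per_param=2, optimizer_bytes_per_param=8):
    n_params = (2 ** log2_table_size) * n_levels * n_features
    total_mib = n_params * (bytes_per_param + optimizer_bytes_per_param) / 2**20
    return n_params, total_mib

for log2_t in (19, 20, 21, 22):
    n_params, mib = hash_grid_memory(log2_t)
    print(f"2^{log2_t}: {n_params / 1e6:.1f}M params, ~{mib:.0f} MiB with optimizer state")
```

Under these assumptions, each doubling of the table size roughly doubles both the parameter count and the optimizer-state memory.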

For your reference, F2-NeRF can be successfully trained by wanjinyou.yaml (with factor=1) on this example data from the Kitti dataset: https://www.dropbox.com/s/v9avgr9akvjcprx/kitti_sample.zip?dl=0

Thank you for your interest, and I believe this sparse-image setting could be a very interesting direction for future work 😃

h8c2 commented 1 year ago

Thank you very much for your kind and useful reply. I checked the data again and found that the previous sequence is at one-tenth the frame rate of a normal sequence. I switched to another sequence, 2013_05_28_drive_0010_sync, and trained with about 100 frames. The results are great.

[images: rendered results on 2013_05_28_drive_0010_sync]

Thanks again for providing such an effective way to train NeRF in real, unbounded scenes.

qhdqhd commented 1 year ago

KITTI-360 was captured with a station wagon equipped with one 180° fisheye camera on each side and a 90° perspective stereo camera (baseline 60 cm) facing forward. May I ask which camera you used? Or are all 4 cameras used?

qhdqhd commented 1 year ago

@h8c2 Thank you!

h8c2 commented 1 year ago

Hi, I tested the first data (0018) with both the mono and the stereo perspective cameras, since the views are very sparse. Both settings failed. The second data is monocular. I suppose the problem may be caused by the construction of the octree, since the octree requires a certain number of cameras to split and to build the warp function.

ckLibra commented 1 year ago

> I turned to another sequence, 2013_05_28_drive_0010_sync, and trained with about 100 frames. The results are great.

@h8c2 Hi, may I ask what train/test split you used for KITTI-360 and what your final PSNR was? Do you follow the default split in the code for the 100 frames? Currently I use an 80-frame trajectory on KITTI, and the average PSNR over the 8 test frames is 22.5. I also wonder how it performs on the official KITTI-360 novel view synthesis benchmark, which has fewer training images.
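For concreteness, one common NeRF convention (not necessarily the default in the f2-nerf code, which should be checked against the data loader) is to hold out every Nth frame as a test view; a minimal sketch:

```python
# Minimal sketch of an every-Nth-frame train/test split, a common NeRF
# convention. Whether this matches f2-nerf's default split is an assumption
# and should be verified against the repository's data loader.
def split_indices(n_frames, test_every=8):
    test = list(range(0, n_frames, test_every))
    train = [i for i in range(n_frames) if i % test_every != 0]
    return train, test

train_ids, test_ids = split_indices(100)
print(len(train_ids), "train /", len(test_ids), "test")  # 87 train / 13 test
```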

Totoro97 commented 1 year ago

@ckLibra Hi, for testing PSNR, you may set renderer.use_app_emb to false and see if the PSNR improves (if you use the wanjinyou or wanjinyou_big config).

ckLibra commented 1 year ago

@Totoro97 Thanks for your kind reply! It does improve the PSNR: my test sequence improves from 22.5 to 23, and for your KITTI sequence mentioned above, the PSNR improves from 18.6 to 22.55. However, the depth image for your KITTI sequence seems wrong and noisy on the road and in distant regions. I am not sure whether I ran your KITTI sequence in the right way; do you see the same phenomenon?

Totoro97 commented 1 year ago

@ckLibra Hi, the depth image can be noisy in some textureless regions, and accurate depth estimation is outside the scope of what F2-NeRF aims to solve.

ckLibra commented 1 year ago

@Totoro97 Thanks for your reply! It really helps.

chuong98 commented 12 months ago

> For your reference, F2-NeRF can be successfully trained by wanjinyou.yaml (with factor=1) on this example data from the Kitti dataset: https://www.dropbox.com/s/v9avgr9akvjcprx/kitti_sample.zip?dl=0

Hi @Totoro97, thanks for releasing the illustrative example for KITTI. I can reproduce f2-nerf on this dataset. However, when I tried to generate dataset.db using both scripts/local_hloc_and_resize.sh and scripts/local_colmap_and_resize.sh, I couldn't get good camera pose estimates. Do we need to run a specific version of COLMAP or hloc, or do I need to change any parameters? Thanks so much.
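In case it helps with debugging, a quick sanity check on the recovered poses might look like the sketch below. It assumes the poses have already been converted into the [N, 27] cams_meta.npy layout discussed earlier in this thread; it is only an illustrative check, not part of the repository's tooling:

```python
# Quick sanity check on recovered poses stored in an f2-nerf style
# cams_meta.npy ([N, 27] layout assumed, first 12 values = 3x4 cam-to-world).
# Flags non-orthonormal rotations and unusually large jumps between
# consecutive camera centers, which often indicate a failed SfM run.
import numpy as np

def check_poses(cams_meta_path, jump_factor=5.0):
    rows = np.load(cams_meta_path)
    poses = rows[:, :12].reshape(-1, 3, 4)
    R, t = poses[:, :, :3], poses[:, :, 3]

    # Rotations should be close to orthonormal: R @ R^T ~ I.
    ortho_err = np.abs(R @ R.transpose(0, 2, 1) - np.eye(3)).max(axis=(1, 2))
    print("max orthonormality error:", ortho_err.max())

    # Consecutive camera centers should move fairly smoothly along the trajectory.
    steps = np.linalg.norm(np.diff(t, axis=0), axis=1)
    median = np.median(steps)
    outliers = np.where(steps > jump_factor * median)[0]
    print(f"median step {median:.3f}, {len(outliers)} suspicious jumps at", outliers)

check_poses("cams_meta.npy")
```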