facebookresearch / localrf

An algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video.
MIT License

The reason to scale camera poses #46

Closed: Tausc closed this issue 7 months ago

Tausc commented 7 months ago

Thank you very much for this great work! It has been very helpful to us! However, we have encountered some issues.

In the LocalRFDataset.__init__() function of localrf_dataset.py, you apply the following scaling operation to self.rel_poses:

            scale = 2e-2 / np.median(np.linalg.norm(self.rel_poses[:, :3, 3], axis=-1))
            self.rel_poses[:, :3, 3] *= scale
            self.rel_poses = self.rel_poses[::frame_step]

I would like to know how you selected the scale value. Why is it 2e-2?

We tested it on the "0001" sequence of the KITTI dataset. With the scaling enabled, the reconstruction results are very good (the image below was rendered halfway through training and looks normal overall): [renderings at frame 000020]

However, when we disable the scaling operation, the reconstruction results become very strange, appearing as if they are composed of multiple slice planes parallel to the camera plane: [renderings at frame 000020]

Both experiments were conducted with pose optimization and depth supervision enabled.

I'm curious why the scaling operation has such a significant impact. How can I achieve normal rendering results without the scaling operation? Thank you!

ameuleman commented 7 months ago

Scaling the poses is common in radiance field optimization. Here we use 0.02 because it typically fits about 50 frames within the high-quality [-1, 1] bounds of a radiance field without allocating a new one: with a median per-frame translation of 0.02, 50 frames of motion span roughly 50 × 0.02 = 1.0 world units. This scale also roughly matches what we obtain when optimizing poses from scratch. Removing the scaling would not be easy; we would need to look into the space contraction bounds and learning rates. By the way, is flow supervision enabled?
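
For illustration, here is a minimal numpy sketch of that arithmetic (the toy pose array and its numbers are invented for the example; only the scale line mirrors localrf_dataset.py):

    import numpy as np

    # Toy stand-in for self.rel_poses: N consecutive relative camera poses as
    # 4x4 matrices with roughly 0.5 units of forward translation per frame
    # (KITTI-like metric motion). These numbers are made up for the example.
    N = 100
    rel_poses = np.tile(np.eye(4), (N, 1, 1))
    rel_poses[:, 2, 3] = 0.5 + 0.05 * np.random.randn(N)

    # The normalization from localrf_dataset.py: rescale so that the median
    # per-frame translation becomes 0.02 world units.
    scale = 2e-2 / np.median(np.linalg.norm(rel_poses[:, :3, 3], axis=-1))
    rel_poses[:, :3, 3] *= scale

    # With a median step of 0.02, about 50 frames of typical motion cover
    # 50 * 0.02 = 1.0 units, staying inside the [-1, 1] bounds of one local
    # radiance field before a new one has to be allocated.
    print(np.median(np.linalg.norm(rel_poses[:, :3, 3], axis=-1)))  # ~0.02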

Tausc commented 7 months ago

No, flow supervision is not enabled in either experiment. Actually, we conducted another experiment without scaling on Waymo too, and the result is quite good. I guess this is because the whole scene in KITTI is too large? We are going to keep the scaling enabled now. Thank you for replying!

ameuleman commented 7 months ago

Flow supervision, if available, helps more than depth supervision in my experience. I would guess that Waymo's poses are already scaled such that np.median(np.linalg.norm(self.rel_poses[:, :3, 3], axis=-1)) ~= 0.02? Are you using the Waymo Open Dataset or the Block-NeRF dataset?
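
One quick way to check that guess (a minimal sketch; the helper name and the camera-to-world pose convention are assumptions for the example, not part of localrf):

    import numpy as np

    def median_rel_translation(c2w):
        # c2w: (N, 4, 4) camera-to-world matrices for consecutive frames.
        # The relative pose from frame i to frame i+1 is inv(c2w[i]) @ c2w[i+1].
        rel = np.linalg.inv(c2w[:-1]) @ c2w[1:]
        return np.median(np.linalg.norm(rel[:, :3, 3], axis=-1))

    # If this already returns ~0.02 for a dataset, the scaling step is close
    # to a no-op (scale = 2e-2 / 0.02 = 1.0), which would explain why
    # disabling it barely changes the results there.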

Tausc commented 7 months ago

The scale of the Waymo poses we use is about 0.035, so I don't think they are already scaled? We are using the Waymo Open Dataset preprocessed with the scripts provided by StreetSurf: https://github.com/PJLab-ADG/neuralsim/blob/main/docs/data/autonomous_driving.md Here is the result of the experiment conducted on Waymo without scaling: [renderings at frame 00000010] It is much better than the KITTI result!

ameuleman commented 7 months ago

Interesting, thanks for sharing!