chenhsuanlin / bundle-adjusting-NeRF

BARF: Bundle-Adjusting Neural Radiance Fields 🤮 (ICCV 2021 oral)
MIT License

Weird training behavior of barf #46

Closed AIBluefisher closed 2 years ago

AIBluefisher commented 2 years ago

Hi, @chenhsuanlin

Thanks for your great work. I'm trying barf on other scenes. However, the training behavior seems weird. As you can see from the image below, the rotation error is decreasing, but the translation error keeps increasing. image

When I looked at the synthesized validation image, the result appeared to be shifted by several pixels relative to the original image, and its scale was also inconsistent with the original image. image

For my experimental setting, I used COLMAP to compute the ground-truth camera poses and intrinsics. The initial camera poses for BARF are not identities; instead, they are the COLMAP poses perturbed by a small pose noise of 0.15. I wonder if there are any parameters we need to fine-tune?
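The perturbed initialization described above could look roughly like the following NumPy sketch. It assumes that 0.15 is the standard deviation of Gaussian noise in the 6-dimensional se(3) tangent space (rotation first, then translation) and that the noise pose is applied before the ground-truth pose; both are assumptions, and the helpers are written for illustration rather than copied from the BARF code.

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix of a 3-vector."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

def se3_to_SE3(wu):
    """Exponential map from a 6-vector (rotation w, translation u) to a 3x4 [R|t]."""
    w = np.asarray(wu[:3], dtype=float)
    u = np.asarray(wu[3:], dtype=float)
    theta = np.linalg.norm(w)
    W = skew(w)
    if theta < 1e-8:
        # small-angle limits of the coefficients below
        A, B, C = 1.0, 0.5, 1.0 / 6.0
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
    R = np.eye(3) + A * W + B * (W @ W)   # Rodrigues' formula
    V = np.eye(3) + B * W + C * (W @ W)
    return np.concatenate([R, (V @ u)[:, None]], axis=1)

def compose(pose_a, pose_b):
    """Apply pose_a first, then pose_b: x -> R_b (R_a x + t_a) + t_b."""
    R_a, t_a = pose_a[:, :3], pose_a[:, 3]
    R_b, t_b = pose_b[:, :3], pose_b[:, 3]
    return np.concatenate([R_b @ R_a, (R_b @ t_a + t_b)[:, None]], axis=1)

def perturb_poses(poses_gt, noise_std, seed=0):
    """Perturb each [3,4] pose with independent Gaussian se(3) noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=noise_std, size=(len(poses_gt), 6))
    return np.stack([compose(se3_to_SE3(n), p) for n, p in zip(noise, poses_gt)])
```

Calling `perturb_poses(poses_gt, 0.15)` with a smaller `noise_std` is one way to test whether the optimization recovers from milder perturbations.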

Part of the scene looks like this: P1000686

And the reconstructed scene: image

chenhsuanlin commented 2 years ago

Hi @AIBluefisher, could you first check whether you cloned the latest version (see #23 for more)? The validation target image should now be roughly aligned with the ground truth.
If you did, then it's probably a scene-specific issue, and various factors could keep it from working well on such a scene. Since you already have the COLMAP poses (and assuming the reconstruction is reliable), I would suggest running just NeRF (with poses) first, or perhaps trying smaller synthetic pose perturbations.

AIBluefisher commented 2 years ago

Yes, I cloned the latest version. It seems the problem comes from my data loading function. For the result shown above, I did not flip the camera poses, since COLMAP's local camera coordinate system is already [right, down, forward], and I recentered the camera poses with this piece of code:

    def center_camera_poses(self, config, poses):
        # For COLMAP's local camera system, please refer to:
        #  https://colmap.github.io/format.html#images-txt
        # compute average pose
        center = poses[..., 3].mean(dim=0)
        v1 = torch_F.normalize(poses[..., 1].mean(dim=0), dim=0)
        v2 = torch_F.normalize(poses[..., 2].mean(dim=0), dim=0)
        v0 = v1.cross(v2)
        pose_avg = torch.stack([v0, v1, v2, center], dim=-1)[None] # [1,3,4]
        # apply inverse of averaged pose
        poses = camera.pose.compose([poses, camera.pose.invert(pose_avg)])
        return poses

I suspected there might be something wrong with the recentering, so I experimented without recentering the camera poses, and now the results are even weirder.
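One way to sanity-check a recentering step like the one above is to run it on a few known poses and confirm that the mean camera center lands at the origin. The NumPy sketch below is not the BARF function: it assumes the inputs are camera-to-world poses (the columns are the camera axes and the camera center in world coordinates) and it builds an orthonormal average frame. COLMAP's images.txt stores world-to-camera poses, so they would need inverting before a function like this applies; whether that matters for the loader in question is not something the thread confirms.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def average_pose(c2w):
    """Average camera-to-world pose [3,4] with an orthonormal rotation."""
    center = c2w[:, :, 3].mean(axis=0)
    z = normalize(c2w[:, :, 2].mean(axis=0))   # mean viewing direction
    y_hint = c2w[:, :, 1].mean(axis=0)         # mean of the cameras' y axes
    x = normalize(np.cross(y_hint, z))
    y = np.cross(z, x)                         # completes a right-handed frame
    return np.stack([x, y, z, center], axis=1)

def recenter(c2w):
    """Express every camera-to-world pose in the frame of the average pose."""
    avg = average_pose(c2w)
    R, t = avg[:, :3], avg[:, 3]
    R_new = R.T @ c2w[:, :, :3]                # broadcasts over the N poses
    t_new = (c2w[:, :, 3] - t) @ R             # row-wise R.T @ (t_i - t)
    return np.concatenate([R_new, t_new[:, :, None]], axis=2)
```

After `recenter`, `out[:, :, 3].mean(axis=0)` should be numerically zero; if it is far from zero on real data, the input convention is a likely suspect.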

For BARF: image image

For NeRF: image image

We can see that for BARF without recentering the camera poses, both the rotation and translation errors keep increasing, although the scale is now consistent with the original image. For NeRF without recentering the camera poses, the result is worse than BARF's.

Thus I believe there must be something wrong with my data loading code. I then tested the code on LLFF's trex scene by reading the camera poses directly from COLMAP's binary files. image image

The validation result is blurry and the depth is totally wrong compared to the correct result: image
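For reference when reading poses straight from COLMAP output, the sketch below converts one image entry into a 3x4 pose. Per the COLMAP documentation, each image stores a unit quaternion (QW, QX, QY, QZ) in the Hamilton convention and a translation (TX, TY, TZ) that map world coordinates to camera coordinates, and the camera center is -R^T t. The function names are illustrative, and whether a given dataloader expects world-to-camera or camera-to-world poses has to be checked against that loader.

```python
import numpy as np

def qvec_to_rotmat(qvec):
    """Rotation matrix of a (qw, qx, qy, qz) quaternion, Hamilton convention."""
    qw, qx, qy, qz = np.asarray(qvec, dtype=float) / np.linalg.norm(qvec)
    return np.array([
        [1 - 2 * (qy**2 + qz**2), 2 * (qx * qy - qw * qz), 2 * (qx * qz + qw * qy)],
        [2 * (qx * qy + qw * qz), 1 - 2 * (qx**2 + qz**2), 2 * (qy * qz - qw * qx)],
        [2 * (qx * qz - qw * qy), 2 * (qy * qz + qw * qx), 1 - 2 * (qx**2 + qy**2)],
    ])

def colmap_to_w2c(qvec, tvec):
    """World-to-camera pose [R|t] as a 3x4 array."""
    R = qvec_to_rotmat(qvec)
    return np.concatenate([R, np.asarray(tvec, dtype=float)[:, None]], axis=1)

def invert_pose(pose):
    """Invert a rigid 3x4 pose: [R|t] -> [R^T | -R^T t]."""
    R, t = pose[:, :3], pose[:, 3]
    return np.concatenate([R.T, (-R.T @ t)[:, None]], axis=1)
```

`invert_pose(colmap_to_w2c(qvec, tvec))` gives the camera-to-world pose, whose last column is the camera center in world coordinates.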

chenhsuanlin commented 2 years ago

As long as the final pose returned by the dataloader is in the [right, down, forward] convention, the format should be fine. The center_camera_poses() function was borrowed directly from the original NeRF repo and seems to be specific to forward-facing scenes; I am not sure whether such a "recentering" operation is reasonably applicable to arbitrary image sequences. You may also want to follow the NeRF repo to preprocess the scenes (e.g. normalize the scale, as COLMAP point clouds may come in arbitrary scales). The discussions in bmild/nerf#34 might also be helpful.
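The scale normalization mentioned above might be sketched as follows. This is only an approximation of the idea: it rescales the pose translations and the per-image depth bounds so that the nearest bound ends up at 1 / bd_factor. The bound-based rule and the default `bd_factor=0.75` reflect my reading of the NeRF LLFF loader and should be checked against that repo; the function name and the `[N, 2]` bounds layout are assumptions.

```python
import numpy as np

def normalize_scale(poses, bounds, bd_factor=0.75):
    """Rescale a scene so the nearest depth bound becomes 1 / bd_factor.

    poses:  [N, 3, 4] rigid poses; only the translation column is scaled.
    bounds: [N, 2] near/far depth per image, in the same units as the poses.
    """
    scale = 1.0 / (bounds.min() * bd_factor)
    poses = poses.copy()          # leave the caller's array untouched
    poses[:, :, 3] *= scale
    return poses, bounds * scale, scale
```

Scaling the translation column has the same effect on world-to-camera and camera-to-world poses, since a uniform scene scale multiplies both the camera centers and the translations by the same factor.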

chenhsuanlin commented 2 years ago

Closing due to inactivity, please feel free to reopen if necessary.