facebookresearch / localrf

An algorithm for reconstructing the radiance field of a large-scale scene from a single casually captured video.
MIT License
956 stars 62 forks

[Question] Analysis on Pose Estimation Results #22

Closed jameskuma closed 12 months ago

jameskuma commented 12 months ago

Hey and thank you for this awesome work!!!

I ran the code and want to evaluate the pose estimation performance. ATE (absolute trajectory error) seems like a common metric for this. However, the pose estimation results are poor even with the option --with_GT_poses 1.

The details are as follows:

I get the estimated results using poses_mtx = local_tensorfs.get_cam2world().detach()

The ground truth poses are loaded as follows (the same as localrf_dataset.py, line 47):

image

Then these two results are used to compute translation and rotation error. However, the results are bad (a rotation error of 142 degrees). There must be something wrong in the process above.

Could you take a look at this and point out the mistake I am making here? I would really appreciate it!
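For reference, ATE is typically computed after aligning the two trajectories with a least-squares similarity transform (Umeyama alignment), since the estimated poses are only defined up to a global scale, rotation, and translation relative to the ground truth. A minimal sketch, assuming (N, 3) arrays of camera centers (the function names and RMSE convention here are illustrative, not part of localrf):

```python
import numpy as np

def align_umeyama(src, dst):
    """Least-squares similarity (s, R, t) mapping src -> dst (Umeyama, 1991).

    src, dst: (N, 3) arrays of camera centers.
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1  # avoid reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / xs.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

def ate_rmse(est, gt):
    """RMSE of camera centers after similarity alignment."""
    s, R, t = align_umeyama(est, gt)
    aligned = (s * (R @ est.T)).T + t
    return np.sqrt(((aligned - gt) ** 2).sum(1).mean())
```

Without this alignment step, a global rotation or scale offset between the two coordinate frames will show up as a huge "error" even when the trajectories agree.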

ameuleman commented 12 months ago

Hi, could you please try with --with_GT_poses 1 --lr_R_init 0 --lr_t_init 0? This can be run with high speedup factors for quick testing --prog_speedup_factor 4 --refinement_speedup_factor 4.

jameskuma commented 12 months ago

Oh, thank you for the timely reply! I will run the following command and check the results further:

python localTensoRF/train.py --datadir ${SCENE_DIR} \
                             --logdir ${LOG_DIR} \
                             --fov ${FOV} \
                             --device cuda:${GPU_ID} \
                             --subsequence 0 200 \
                             --frame_step 1 \
                             --with_GT_poses 1 \
                             --lr_R_init 0 \
                             --lr_t_init 0 \
                             --prog_speedup_factor 4 \
                             --refinement_speedup_factor 4
jameskuma commented 12 months ago

Thank you for your suggestions! But the rotation and translation errors are still not good.

Therefore, I printed poses_mtx = local_tensorfs.get_cam2world().detach() and found the first three poses are:

image

However, the first three ground truth poses are:

image

ameuleman commented 12 months ago

I would not expect rotations to change significantly (translations are scaled though). I will try to investigate at the end of next week.
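Since the estimated translations are only defined up to a global scale, comparing them directly to COLMAP translations requires normalizing first. A minimal sketch, assuming (N, 3) camera centers (normalizing by total path length is one arbitrary but symmetric choice, not something localrf itself does):

```python
import numpy as np

def normalize_scale(centers):
    """Rescale camera centers so the total path length is 1.

    centers: (N, 3). Applying this to both trajectories removes the
    global scale ambiguity before computing translation error.
    """
    steps = np.linalg.norm(np.diff(centers, axis=0), axis=1)
    length = steps.sum()
    return centers / length if length > 0 else centers
```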

ameuleman commented 12 months ago

What scene are you using for this?

jameskuma commented 12 months ago

Thank you! I test the pose error on the forest1 scene.

jameskuma commented 12 months ago

Hi! I think this weird error may be caused by incorrect ground truth camera poses.

I plotted the estimated trajectory:

image

And here is the ground truth trajectory:

image

Maybe there is something wrong in loading the ground truth poses? I hope this helps with debugging; I would like to keep this issue open for updates on the solution.
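Such trajectory plots can be produced from the (N, 4, 4) or (N, 3, 4) cam2world matrices by extracting the camera centers (the last column of the top three rows). A minimal matplotlib sketch (the function and file names are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def plot_trajectories(est_c2w, gt_c2w, out_path="trajectories.png"):
    """Top-down (x-z) plot of two camera paths.

    est_c2w, gt_c2w: (N, 4, 4) or (N, 3, 4) cam2world matrices;
    the camera center of frame i is c2w[i, :3, 3].
    """
    est = est_c2w[:, :3, 3]
    gt = gt_c2w[:, :3, 3]
    fig, ax = plt.subplots()
    ax.plot(est[:, 0], est[:, 2], label="estimated")
    ax.plot(gt[:, 0], gt[:, 2], label="ground truth (COLMAP)")
    ax.set_xlabel("x")
    ax.set_ylabel("z")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```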

ameuleman commented 12 months ago

> Thank you! I test pose error in forest1 scene.

Ah, I just realized there was a confusion: I was expecting you to be using a different dataset. We do not provide ground truth poses: the transforms.json files in our dataset come from COLMAP. The with_GT_poses name was set at an early stage; I understand that it is misleading, and I have renamed it with_preprocessed_poses.

ameuleman commented 12 months ago

However, we should get the same poses (up to scaling) as transforms.json with the arguments --with_preprocessed_poses 1 --lr_R_init 0 --lr_t_init 0. I will be investigating this next week.
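One way to verify this frame by frame is to compare rotations with the geodesic angle, which is unaffected by the global scale ambiguity: if the preprocessed poses are loaded correctly and pose learning is disabled, the per-frame angle should be near zero. A minimal sketch (the helper name is illustrative), assuming 3x3 rotation blocks taken from the cam2world matrices:

```python
import numpy as np

def rotation_angle_deg(R1, R2):
    """Geodesic angle (degrees) between two 3x3 rotation matrices.

    Uses the identity trace(R1^T R2) = 1 + 2 cos(theta); clipping
    guards against floating-point values slightly outside [-1, 1].
    """
    R = R1.T @ R2
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos))
```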

jameskuma commented 12 months ago

Yes! I agree, and I just want to check the difference between the poses estimated by localrf and those estimated by COLMAP.

After making the following change in local_tensorf.py, line 147:

def append_frame(self):
    if len(self.r_c2w) == 0:
        self.r_c2w.append(torch.eye(3, 2, device=self.device))
        self.t_c2w.append(torch.zeros(3, device=self.device))

        self.pose_linked_rf.append(0)            
    else:
        self.r_c2w.append(mtx_to_sixD(sixD_to_mtx(self.r_c2w[-1].clone().detach()[None]))[0])
        self.t_c2w.append(self.t_c2w[-1].clone().detach())

        self.blending_weights = torch.nn.Parameter(
            torch.cat([self.blending_weights, self.blending_weights[-1:, :]], dim=0),
            requires_grad=False,
        )

        rf_ind = int(torch.nonzero(self.blending_weights[-1, :])[0])
        self.pose_linked_rf.append(rf_ind)

    self.exposure.append(torch.eye(3, 3, device=self.device))

    if self.camera_prior is not None:
        idx = len(self.r_c2w) - 1
        rel_pose = self.camera_prior["rel_poses"][idx]
        # ! before
        # last_r_c2w = sixD_to_mtx(self.r_c2w[-1].clone().detach()[None])[0]
        # self.r_c2w[-1] = last_r_c2w @ rel_pose[:3, :3]
        # self.t_c2w[-1].data += last_r_c2w @ rel_pose[:3, 3]
        # ! after
        self.r_c2w[-1].data = mtx_to_sixD(rel_pose[:3, :3])
        self.t_c2w[-1].data = rel_pose[:3, 3]

    self.r_optimizers.append(torch.optim.Adam([self.r_c2w[-1]], betas=(0.9, 0.99), lr=self.lr_R_init)) 
    self.t_optimizers.append(torch.optim.Adam([self.t_c2w[-1]], betas=(0.9, 0.99), lr=self.lr_t_init)) 
    self.exp_optimizers.append(torch.optim.Adam([self.exposure[-1]], betas=(0.9, 0.99), lr=self.lr_exposure_init)) 

I get the same poses with --with_GT_poses 1 --lr_R_init 0 --lr_t_init 0.
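For context, the commented-out "before" logic accumulates relative poses into absolute cam2world poses by chaining c2w_i = c2w_{i-1} @ rel_i, whereas the "after" version assigns rel_pose directly as the absolute pose; which one is correct depends on whether camera_prior["rel_poses"] stores relative or absolute transforms. A standalone sketch of the accumulation (assuming homogeneous 4x4 matrices; the function name is illustrative):

```python
import numpy as np

def chain_relative_poses(rel_poses):
    """Compose per-frame relative transforms into absolute cam2world poses.

    rel_poses: sequence of (4, 4) matrices where rel_poses[i] maps frame i
    into frame i-1's camera frame; the first frame is taken as the origin.
    This mirrors the incremental update c2w_i = c2w_{i-1} @ rel_i.
    """
    poses = [np.eye(4)]
    for rel in rel_poses[1:]:
        poses.append(poses[-1] @ rel)
    return np.stack(poses)
```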

However, the pose error is still quite large (translation error = 2 cm, rotation error = 50 degrees) on the forest1 scene with the following setting:

python localTensoRF/train.py --datadir ${SCENE_DIR} \
                             --logdir ${LOG_DIR} \
                             --fov ${FOV} \
                             --device cuda:${GPU_ID} \
                             --subsequence 0 100 \
                             --frame_step 1 \
                             --with_GT_poses 0 \
                             --prog_speedup_factor 1 \
                             --refinement_speedup_factor 1
ameuleman commented 12 months ago

> After making the following change in local_tensorf.py, line 147: [...] I get the same poses with --with_GT_poses 1 --lr_R_init 0 --lr_t_init 0.

That is odd. I will take a closer look later. Are rendered images sensible after this change?

> However, the pose error is still quite large (translation error = 2 cm, rotation error = 50 degrees) on the forest1 scene with the following setting.

Not setting lr_R_init and lr_t_init to 0 will allow the poses to change during optimization.