alejocb / dpptam

DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence
GNU General Public License v3.0

Confusion in the use of semidense_tracker->R&t #11

Closed sunghoon031 closed 8 years ago

sunghoon031 commented 8 years ago

Please correct me if I'm wrong. In the function "join_maps", you use the function "transformed_points" to transform pointClouds3D (given in the world frame) into homogeneous camera coordinates "pointsClouds3Dmap_cam" by computing points3D_cam = R*points3D + t_r and then projecting. This implies that R and t are the world-to-camera transformation.
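To make the convention in question concrete, here is a minimal sketch of a world-to-camera transform followed by a normalized pinhole projection. It is not DPPTAM code: the helper names (`world_to_camera`, `camera_to_world`, `project`) and the example pose are hypothetical, and intrinsics are left out.

```python
def world_to_camera(R, t, X_w):
    """Apply X_c = R * X_w + t, with R a 3x3 rotation (list of rows) and t a 3-vector."""
    return [sum(r * x for r, x in zip(row, X_w)) + ti for row, ti in zip(R, t)]


def camera_to_world(R, t, X_c):
    """Invert the map above: X_w = R^T * (X_c - t)."""
    d = [xc - ti for xc, ti in zip(X_c, t)]
    return [sum(R[k][i] * d[k] for k in range(3)) for i in range(3)]


def project(X_c):
    """Normalized pinhole projection without intrinsics: (x/z, y/z)."""
    x, y, z = X_c
    return (x / z, y / z)


# Hypothetical pose: 90 degree rotation about the z axis plus a shift along z.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

X_w = [1.0, 0.0, 2.0]                 # a point in the world frame
X_c = world_to_camera(R, t, X_w)      # the same point in the camera frame
uv = project(X_c)                     # normalized image coordinates

# Under this convention the camera centre in world coordinates is -R^T * t,
# i.e. the world point that maps to the camera origin.
camera_center = camera_to_world(R, t, [0.0, 0.0, 0.0])

print(X_c, uv, camera_center)
```

If R and t were instead camera-to-world, the camera centre would simply be t, which is one quick way to tell the two conventions apart.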

However, in the function "optimize_camera", R_rel & t_rel are computed as if R&t are from camera to world and R_rel and t_rel are from current to last keyframe.

Would you please tell me what I missed?

sunghoon031 commented 8 years ago

Okay, I finally understand what was going on. The names R_rel and t_rel were a bit misleading for me. First of all, semidense_tracker->R and t are always world-to-camera.

Denoting by R1, t1 the pose at Camera_pose 1, and so on:

Camera_pose 1 ----[optimize_camera]----> Camera_pose 2 ----[optimize_camera]----> Camera_pose 3

R_rel = R1.t()*R2

t_rel = R1.t()*(t2-t1)

The thing is, R_rel and t_rel are not precisely the relative transformation. But things make sense in the "motion_model" function:

R_m (R1 in the code, but renamed here to avoid a conflict) = R*R_rel = R2*R1.t()*R2

t_m = R*t_rel + t = R2*R1.t()*(t2-t1) + t2

Applying R_m and t_m to a 3D point in the world frame means:

X_m = T2*inv(T1)*T2*X_w

So what's actually happening is that X_m is the result of applying the transformation T2*inv(T1) to T2*X_w, where T2*inv(T1) represents the transformation from Camera_pose 1 to Camera_pose 2, i.e. the actual T_rel, and T2*X_w is the point expressed in the frame of Camera_pose 2.
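The identity can be checked numerically. The sketch below assumes world-to-camera poses (X_c = R*X_w + t) and uses two arbitrary, hypothetical poses. It builds R_rel, t_rel, R_m and t_m from the formulas in this comment and compares the result with the 4x4 product T2*inv(T1)*T2. The helper functions are written for this example and are not taken from DPPTAM.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def to_homogeneous(R, t):
    """Build the 4x4 matrix [R t; 0 1]."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def invert_pose(R, t):
    """Inverse of a rigid transform: (R^T, -R^T * t)."""
    Rt = transpose(R)
    return Rt, [-x for x in matvec(Rt, t)]


# Two hypothetical world-to-camera poses.
R1, t1 = rot_z(0.3), [0.1, -0.2, 0.5]
R2, t2 = matmul(rot_x(-0.4), rot_z(0.7)), [0.4, 0.0, 0.3]

# Quantities as written in the comment above.
R1T = transpose(R1)
R_rel = matmul(R1T, R2)                                  # R1.t()*R2
t_rel = matvec(R1T, [b - a for a, b in zip(t1, t2)])     # R1.t()*(t2-t1)
R_m = matmul(R2, R_rel)                                  # R*R_rel with R = R2
t_m = [a + b for a, b in zip(matvec(R2, t_rel), t2)]     # R*t_rel + t

# Reference: T2 * inv(T1) * T2 as 4x4 homogeneous matrices.
T2 = to_homogeneous(R2, t2)
T1_inv = to_homogeneous(*invert_pose(R1, t1))
T_ref = matmul(matmul(T2, T1_inv), T2)
T_m = to_homogeneous(R_m, t_m)

max_diff = max(abs(a - b)
               for row_a, row_b in zip(T_ref, T_m)
               for a, b in zip(row_a, row_b))
print("max abs difference:", max_diff)
```

The difference should be at floating-point noise level, which supports reading the motion model as re-applying the last inter-frame motion T2*inv(T1) on top of the current pose T2.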

This is why I think the notation R_rel and t_rel is confusing: it makes it look as if R and t denote the camera-to-world transformation (and I still find it strange that the paper says something similar).