About align cameras using first view

autonomousvision / LaRa

[ECCV 2024] Efficient Large-Baseline Radiance Fields, a feed-forward 2DGS model

https://apchenstu.github.io/LaRa/

MIT License


About align cameras using first view #10

Open 1843744321mark opened 1 month ago

1843744321mark commented 1 month ago

This is really great work. I am not sure why we need to align the cameras using the first view, and I cannot understand these formulas. Could you please give a more detailed explanation? Thank you very much!

    r = np.linalg.norm(tar_c2ws[0,:3,3])
    ref_c2w = np.eye(4, dtype=np.float32).reshape(1,4,4)
    ref_w2c = np.eye(4, dtype=np.float32).reshape(1,4,4)
    ref_c2w[:,2,3], ref_w2c[:,2,3] = -r, r
    transform_mats = ref_c2w @ tar_w2cs[:1]
    tar_w2cs = tar_w2cs.copy() @ tar_c2ws[:1] @ ref_w2c
    tar_c2ws = transform_mats @ tar_c2ws.copy()

apchenstu commented 1 month ago

Thanks! I thought it could provide better generalizability, but I didn't run an ablation on this. Now I feel that disabling this alignment may give better results, so feel free to try ;)
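
For readers with the same question, here is a minimal NumPy sketch of what the quoted alignment appears to do, assuming `tar_c2ws` and `tar_w2cs` are stacks of rigid 4x4 matrices that are inverses of each other. Under that assumption, the code re-expresses every camera in a new world frame in which the first camera has identity rotation and sits at (0, 0, -r), where r is its original distance from the world origin; relative poses between cameras are unchanged. The names `random_c2ws` and `align_to_first_view` are hypothetical helpers for this illustration and are not part of LaRa.

```python
import numpy as np

def random_c2ws(n, seed=0):
    """Hypothetical helper: n random rigid camera-to-world matrices, shape (n, 4, 4)."""
    rng = np.random.default_rng(seed)
    c2ws = np.tile(np.eye(4, dtype=np.float32), (n, 1, 1))
    for i in range(n):
        q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
        if np.linalg.det(q) < 0:
            q[:, 0] *= -1  # flip one axis so q is a proper rotation
        c2ws[i, :3, :3] = q
        c2ws[i, :3, 3] = rng.normal(size=3)
    return c2ws

def align_to_first_view(tar_c2ws, tar_w2cs):
    """Same steps as the snippet quoted in the question, wrapped in a function."""
    r = np.linalg.norm(tar_c2ws[0, :3, 3])  # distance of camera 0 from the origin
    ref_c2w = np.eye(4, dtype=np.float32).reshape(1, 4, 4)
    ref_w2c = np.eye(4, dtype=np.float32).reshape(1, 4, 4)
    ref_c2w[:, 2, 3], ref_w2c[:, 2, 3] = -r, r  # canonical pose and its inverse
    transform_mats = ref_c2w @ tar_w2cs[:1]     # old world -> new world
    new_w2cs = tar_w2cs @ tar_c2ws[:1] @ ref_w2c
    new_c2ws = transform_mats @ tar_c2ws
    return new_c2ws, new_w2cs, r

c2ws = random_c2ws(4)
w2cs = np.linalg.inv(c2ws)
new_c2ws, new_w2cs, r = align_to_first_view(c2ws, w2cs)

# Camera 0 ends up at the canonical pose: identity rotation, position (0, 0, -r).
expected = np.eye(4, dtype=np.float32)
expected[2, 3] = -r
assert np.allclose(new_c2ws[0], expected, atol=1e-4)

# The updated w2c matrices are still the inverses of the updated c2w matrices.
assert np.allclose(new_w2cs @ new_c2ws, np.eye(4), atol=1e-4)

# Poses relative to camera 0 are preserved by the change of world frame.
assert np.allclose(w2cs[:1] @ c2ws, new_w2cs[:1] @ new_c2ws, atol=1e-4)
```

Because only the world frame changes, skipping the alignment leaves the scene geometry the same; what changes is the coordinate system the model sees.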

1843744321mark commented 1 week ago

Thanks for your kind reply! I have another question and hope you can help. Why do we use permute(0,2,1) on w2c and the camera intrinsics?

    def projection(grid, w2cs, ixts):
        points = grid.reshape(1,-1,3) @ w2cs[:,:3,:3].permute(0,2,1) + w2cs[:,:3,3][:,None]
        points = points @ ixts.permute(0,2,1)
        points_xy = points[...,:2]/points[...,-1:]
        return points_xy, points[...,-1:]

apchenstu commented 1 week ago

Assuming the points array (excluding the batch dimension) has shape [N x 3] and ixts is [3 x 3], the projection ixts @ points.T, once transposed back to [N x 3], equals points @ ixts.T. The same identity applies to the extrinsic transformation, so there is no need to transpose the points: simply transpose the extrinsic or intrinsic matrix instead. permute(0,2,1) is that transpose applied to each matrix in the batch.
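
The identity above can be checked numerically. The sketch below is a NumPy stand-in for the quoted `projection` function, with `transpose(0, 2, 1)` playing the role of PyTorch's `permute(0, 2, 1)`, compared against a column-vector version that transposes the points instead. The function names `projection_rows` and `projection_cols`, the intrinsics values, and the test cameras are all assumptions made for this illustration.

```python
import numpy as np

def projection_rows(grid, w2cs, ixts):
    """Row-vector form, mirroring the quoted code: points stay [N x 3]."""
    points = grid.reshape(1, -1, 3) @ w2cs[:, :3, :3].transpose(0, 2, 1) + w2cs[:, :3, 3][:, None]
    points = points @ ixts.transpose(0, 2, 1)
    return points[..., :2] / points[..., -1:], points[..., -1:]

def projection_cols(grid, w2cs, ixts):
    """Column-vector form: transpose the points to [3 x N], apply R, t, K on the left."""
    pts = grid.reshape(-1, 3).T                                # 3 x N
    cam = w2cs[:, :3, :3] @ pts + w2cs[:, :3, 3][:, :, None]   # B x 3 x N
    img = (ixts @ cam).transpose(0, 2, 1)                      # back to B x N x 3
    return img[..., :2] / img[..., -1:], img[..., -1:]

def rot_z(theta):
    """Rotation about the z axis, used only to build a non-trivial test camera."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
grid = rng.uniform(-1.0, 1.0, size=(4, 4, 4, 3))  # small 3D grid of points

# Two made-up cameras placed so that every point has positive depth.
w2cs = np.tile(np.eye(4), (2, 1, 1))
w2cs[1, :3, :3] = rot_z(0.3)
w2cs[:, 2, 3] = 5.0

# Made-up pinhole intrinsics, repeated for both cameras.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
ixts = np.tile(K, (2, 1, 1))

xy_rows, depth_rows = projection_rows(grid, w2cs, ixts)
xy_cols, depth_cols = projection_cols(grid, w2cs, ixts)

# Both conventions give the same pixel coordinates and depths.
assert np.allclose(xy_rows, xy_cols)
assert np.allclose(depth_rows, depth_cols)
```

The row-vector form keeps the points in their natural [N x 3] layout and only transposes the small 3x3 matrices, which is presumably why the code is written that way.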