facebookresearch / co3d

Tooling for the Common Objects In 3D dataset.

Predicted camera poses evaluation #86

Closed ostapagon closed 4 months ago

ostapagon commented 4 months ago

Hello, and thank you for the great dataset. I am using it in a pipeline that jointly reconstructs 3D scenes and predicts camera poses.

Context: I am building a pipeline that uses metric depth and Gaussian splatting to reconstruct a 3D scene and predict the pose of the next frame. The approach unprojects pixels using the metric depth and the intrinsic parameters from your dataset. After reconstructing Image_0 in camera space, I optimize rotation (R) and translation (T) to move from the pose of Image_0 to the pose of Image_1. To check the alignment, I used your camera poses together with the estimated metric depth for the point-cloud reconstruction, and I noticed that the Structure-from-Motion (SfM) camera poses are not in metric units.

[image: img_unprojection]
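For reference, the unprojection step I mean is essentially the following (a minimal sketch assuming a plain pinhole model with intrinsics in pixel units; the dataset's cameras are PyTorch3D `PerspectiveCameras`, so their NDC-space intrinsics may need converting to pixels first):

```python
import torch

def unproject_depth(depth: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """Back-project a depth map to a point cloud in camera coordinates.

    depth: (H, W) metric depth per pixel.
    K:     (3, 3) pinhole intrinsics in pixel units.
    Returns (H*W, 3) points in the camera frame.
    """
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return torch.stack([x, y, depth], dim=-1).reshape(-1, 3)
```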

Questions: If I have estimated camera poses as sets of R_est and T_est for Image_0 -> Image_1 -> ... -> Image_n, can I still use them together with R_dataset and T_dataset to compute relative pose error (RPE) metrics for rotation and translation? Or is this comparison infeasible because the metric depth and the provided camera poses are not aligned and suffer from scale ambiguity?
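To be concrete, by RPE I mean something like the sketch below (rough and untested, assuming 4x4 camera-to-world matrices; the function and variable names are mine):

```python
import torch

def relative_pose_error(poses_est: torch.Tensor, poses_gt: torch.Tensor):
    """RPE between consecutive frames.

    poses_*: (N, 4, 4) camera-to-world matrices (after any scale alignment).
    Returns per-pair rotation error in degrees and translation error
    (in the same units as the poses).
    """
    def rel(poses):
        # Relative transform from frame i to frame i+1.
        return torch.linalg.inv(poses[:-1]) @ poses[1:]

    err = torch.linalg.inv(rel(poses_gt)) @ rel(poses_est)  # (N-1, 4, 4)
    R_err, t_err = err[:, :3, :3], err[:, :3, 3]
    cos = ((R_err.diagonal(dim1=-2, dim2=-1).sum(-1) - 1.0) / 2.0).clamp(-1.0, 1.0)
    rot_deg = torch.rad2deg(torch.arccos(cos))
    trans = t_err.norm(dim=-1)
    return rot_deg, trans
```

The rotation part is scale-invariant, but the translation part only becomes meaningful after the scale is resolved, which is exactly my concern.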

I think I could use Procrustes analysis to compute these errors, since my estimated camera poses and the ground-truth camera poses are in different coordinate systems due to the scale difference, right? I would be really grateful if you could share a resource or an example of the alignment so that the evaluation becomes possible; a rough sketch of what I have in mind is below.
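For concreteness, I imagine a similarity (Umeyama-style) alignment of the estimated camera centres onto the ground-truth ones, roughly like this (untested sketch, names are mine):

```python
import torch

def umeyama(src: torch.Tensor, tgt: torch.Tensor):
    """Similarity transform (s, R, t) such that s * R @ src_i + t ~ tgt_i.

    src, tgt: (N, 3) corresponding points, e.g. estimated and GT camera centres.
    """
    n = src.shape[0]
    mu_s, mu_t = src.mean(0), tgt.mean(0)
    xs, xt = src - mu_s, tgt - mu_t
    cov = xt.T @ xs / n                       # cross-covariance of centred points
    U, D, Vh = torch.linalg.svd(cov)
    S = torch.eye(3, dtype=src.dtype)
    if torch.det(U) * torch.det(Vh) < 0:      # avoid a reflection
        S[2, 2] = -1.0
    R = U @ S @ Vh
    var_s = (xs ** 2).sum() / n
    s = (D * S.diagonal()).sum() / var_s      # global scale
    t = mu_t - s * R @ mu_s
    return s, R, t
```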

shapovalov commented 4 months ago

Thanks for your interest. You are right that there is scale ambiguity in the reconstruction process, since we don't have any depth sensors. We normalised each scene so that the SfM point cloud is zero-centred and has unit standard deviation. Given that, I think Procrustes analysis is indeed the typical way to resolve this ambiguity. This function might be helpful: https://github.com/facebookresearch/pytorch3d/blob/main/pytorch3d/ops/cameras_alignment.py
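A rough usage sketch (the R/T tensors below are random placeholders; in practice they would come from your estimates and from the dataset's frame annotations):

```python
import torch
from pytorch3d.ops import corresponding_cameras_alignment
from pytorch3d.renderer import PerspectiveCameras
from pytorch3d.transforms import random_rotations

N = 10
# Placeholder cameras; replace R/T with your estimated poses and the
# ground-truth poses (PyTorch3D convention: X_cam = X_world @ R + T).
cams_est = PerspectiveCameras(R=random_rotations(N), T=torch.randn(N, 3))
cams_gt = PerspectiveCameras(R=random_rotations(N), T=torch.randn(N, 3))

# Fit a similarity transform (rotation + translation + scale) mapping the
# estimated cameras onto the ground-truth ones, which removes the scale
# ambiguity before computing pose-error metrics against cams_gt.
cams_aligned = corresponding_cameras_alignment(
    cameras_src=cams_est,
    cameras_tgt=cams_gt,
    estimate_scale=True,
    mode="extrinsics",  # or "centers" to align camera centres only
)
```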

ostapagon commented 4 months ago

@shapovalov Thank you very much for the explanation. I'm pretty new to these camera manipulations, and they can be really tricky. I will try to use the example code you've provided for the alignment.