alexklwong / calibrated-backprojection-network

PyTorch Implementation of Unsupervised Depth Completion with Calibrated Backprojection Layers (ORAL, ICCV 2021)

Question on coordinate frames for pose data #11

Closed rakshith95 closed 2 years ago

rakshith95 commented 2 years ago

Hello,

[screenshot: photometric loss equation with relative pose g_{\tau t}]

In the above, the relative pose g_{\tau t} ∈ SE(3) refers to the transformation from the world frame to the camera frame, right? That is, the pose is expressed with respect to the camera frame.

alexklwong commented 2 years ago

Yes, it is in the camera frame. In this particular case, the transformation is the relative pose between the two cameras (as opposed to the absolute pose from world to camera).
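
For concreteness, here is a minimal sketch of how a relative pose like g_{\tau t} enters the photometric term: pixels of frame t are backprojected with the predicted depth, rigidly transformed by g_{\tau t}, and projected into frame \tau. This is an illustration only, not the repo's actual code; names like `reproject` and `g_tau_t` are made up.

```python
import numpy as np

def reproject(depth_t, K, g_tau_t):
    """Map every pixel of frame t into frame tau.

    depth_t : (H, W) predicted depth for frame t, in meters
    K       : (3, 3) camera intrinsics
    g_tau_t : (4, 4) relative pose taking camera-frame-t points to frame tau
    Returns : (H, W, 2) corresponding pixel coordinates in frame tau
    """
    H, W = depth_t.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1)  # 3 x HW homogeneous pixels

    # Backproject: X_t = depth * K^{-1} [u, v, 1]^T (camera frame at time t)
    X_t = np.linalg.inv(K) @ pix * depth_t.reshape(1, -1)

    # Rigid transform into the camera frame at time tau: X_tau = R X_t + t
    X_tau = g_tau_t[:3, :3] @ X_t + g_tau_t[:3, 3:4]

    # Perspective projection back to pixels; frame tau is then sampled at
    # these coordinates to form the photometric residual against frame t
    p = K @ X_tau
    return (p[:2] / p[2:3]).T.reshape(H, W, 2)
```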

rakshith95 commented 2 years ago

Oh yeah, of course. Thank you!

rakshith95 commented 2 years ago

@alexklwong is the absolute pose stored in the VOID dataset, the pose of the world frame in the camera frame, or the pose of the camera in the world frame?

i.e. in the first case, Pose · X_world = X_camera (taking homogeneous points and the pose as an affine transform), and in the second, Pose · X_camera = X_world.

I apologize for the repetitive nature of the question, but the naming convention always confuses me, especially the term 'absolute'.

alexklwong commented 2 years ago

So absolute pose in our case refers to the transformation from camera to world frame coordinates: i.e. following the notation in the screenshot above, g_{\tau t} refers to the transformation from t to \tau in the camera frame, but in this context it would be written as g_{world t}, where g takes us from the camera frame at time t to the world frame.
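
So, under that convention, the relative pose needed for the loss can be recovered from two stored absolute poses by composing one with the inverse of the other. A minimal sketch, assuming 4x4 homogeneous matrices (variable names are illustrative):

```python
import numpy as np

def relative_pose(g_world_t, g_world_tau):
    """Compute g_{tau t}: camera frame at time t -> camera frame at time tau.

    Both arguments are absolute poses as described above, i.e. 4x4
    camera-to-world transforms, so:
        X_world = g_world_t @ X_t
        X_tau   = inv(g_world_tau) @ X_world = inv(g_world_tau) @ g_world_t @ X_t
    """
    return np.linalg.inv(g_world_tau) @ g_world_t
```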

rakshith95 commented 2 years ago

Thank you for the confirmation. I assumed the above and trained the network with the absolute poses in the VOID dataset, and the validation results were actually worse than with PoseNet.

results.txt

Attaching the results.txt file from the training in case it is of any interest to you. Would you like me to submit a PR with the changes I made to enable an option to use poses from odometry rather than PoseNet, or do you already have some version of that?

alexklwong commented 2 years ago

Looks like the opposite, right? Using absolute pose is better than using PoseNet (https://github.com/alexklwong/calibrated-backprojection-network/issues/12#issuecomment-1077913649):

PoseNet on VOID is worse by about 5mm (also around 10%) than using pose from VIO. These are the results from your results.txt using absolute pose:

MAE       RMSE      iMAE      iRMSE
35.596    89.272    20.327    46.497

whereas I got the following using PoseNet

MAE      RMSE      iMAE      iRMSE
39.80    95.86     21.16     49.72
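
For anyone reading along, a hedged sketch of how these four numbers are conventionally computed: MAE/RMSE on depth in millimeters and iMAE/iRMSE on inverse depth in 1/km, scored only where ground truth is valid. The masking and unit conventions here are assumptions for illustration, not necessarily identical to the repo's evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt):
    """MAE/RMSE on depth (mm) and iMAE/iRMSE on inverse depth (1/km).

    pred, gt : depth maps in meters; only pixels with gt > 0 are scored.
    Masking and unit conventions are assumptions for illustration.
    """
    mask = gt > 0
    p, g = pred[mask], gt[mask]

    mae = 1000.0 * np.mean(np.abs(p - g))                         # mm
    rmse = 1000.0 * np.sqrt(np.mean((p - g) ** 2))                # mm
    imae = 1000.0 * np.mean(np.abs(1.0 / p - 1.0 / g))            # 1/km
    irmse = 1000.0 * np.sqrt(np.mean((1.0 / p - 1.0 / g) ** 2))   # 1/km
    return mae, rmse, imae, irmse
```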

We do have internal code to test it out on VIO, but did not release that version since it is coupled to a few internal tools. So, yes, please do make a PR for this so others may use it.

Thanks, Alex

rakshith95 commented 2 years ago

PoseNet on VOID is worse by about 5mm (also around 10%) than using pose from VIO. These are the results from your results.txt using absolute pose

So, I ran the validation script with (1) the pre-trained model (with PoseNet) and (2) the best model from training with absolute pose data. I've attached both results files; the pre-trained model gives the following:

Evaluation results:

     MAE      RMSE      iMAE     iRMSE
  31.294    79.999    16.512    39.643
     +/-       +/-       +/-       +/-
  25.407    68.044    23.693    63.707

With the best model trained on absolute pose data:

Evaluation results:

     MAE      RMSE      iMAE     iRMSE
  41.028    93.816    22.063    46.550
     +/-       +/-       +/-       +/-
  29.446    66.708    27.490    63.679

This is what I meant when I said

and the validation results were actually worse than with PoseNet

So, yes, please do make a PR for this so others may use it.

Alright, I'm a bit busy this week, but I'll clean up my code a bit and submit a PR sometime next week.

results_absolutePose.txt results_pretrained.txt

alexklwong commented 2 years ago

Your results for the pretrained model: was that the one released on the repo, or did you re-train your own?

You got these numbers (incredibly good, on par with the top supervised methods: https://github.com/alexklwong/awesome-state-of-depth-completion):

     MAE      RMSE      iMAE     iRMSE
  31.294    79.999    16.512    39.643
     +/-       +/-       +/-       +/-
  25.407    68.044    23.693    63.707

but I recall that the released pretrained model should give

MAE      RMSE      iMAE      iRMSE
39.80    95.86     21.16     49.72

so if it is the same pretrained weights, that suggests you may not have evaluated on the same data, or that something was off in the evaluation script. Also, does the number you got when running https://github.com/alexklwong/calibrated-backprojection-network/blob/master/bash/void/run_kbnet_void1500.sh match the number shown during validation?

rakshith95 commented 2 years ago

Your results for the pretrained model: was that the one released on the repo, or did you re-train your own?

I downloaded the pre-trained model weights, and ran the run_kbnet_void1500.sh script using it.

Also, does the number you got when running https://github.com/alexklwong/calibrated-backprojection-network/blob/master/bash/void/run_kbnet_void1500.sh match the number shown during validation?

For the network trained on absolute poses, it seems the same (or very similar). Isn't the validation shown during training run on the same set of data as the evaluation in run_kbnet_void1500.sh?

The split for me is 35917 training samples and 534 testing samples.

alexklwong commented 2 years ago

Ah, it looks like you are missing some parts of the dataset. This might be because gdown intermittently fails.

For the training set on VOID1500: 44888 samples
For the testing set on VOID1500: 800 samples

You may want to download it manually from the links in https://github.com/alexklwong/calibrated-backprojection-network#setting-up-your-datasets
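
If it helps, a quick sanity check is to count the entries in the split files against those numbers before training; the paths below follow the VOID layout but are assumptions, so adjust them to your local setup:

```python
# Hypothetical split-file paths following the VOID1500 layout; adjust to your setup.
splits = {
    'train': 'training/void_1500/train_image.txt',
    'test': 'testing/void_1500/test_image.txt',
}

for name, path in splits.items():
    with open(path) as f:
        n = sum(1 for line in f if line.strip())
    print(f'{name}: {n} samples')  # expect 44888 train / 800 test
```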