griffbr closed this issue 3 years ago
In my humble opinion, the instantaneous velocity will eventually be multiplied by the time difference between the frames to get the relative displacement between them.
The reason they do this (velocity_loss.py line 34, gt_trans = [pose[:, :3, -1].norm(dim=-1) for pose in gt_pose_context]) is given in this earlier reply:
Hi, thank you for the interest in our repository! You don't need to provide ground-truth pose, only instantaneous velocity. We provide the full 4x4 matrix because that is available, but we don't use it. Our pred_trans and gt_trans only take the last column, which contains translation, and that's what is being used to calculate the loss.
Originally posted by @VitorGuizilini-TRI in https://github.com/TRI-ML/packnet-sfm/issues/91#issuecomment-731652275
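To make the quoted reply concrete, here is a minimal sketch in plain Python of a loss that only looks at the length of the translation column of each 4x4 pose. It is not the repository's code: make_pose and translation_magnitude_loss are hypothetical names, and the mean absolute difference is an assumed reduction chosen for illustration.

```python
import math


def make_pose(tx, ty, tz):
    """Hypothetical helper: 4x4 pose with identity rotation and the given translation."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def translation_norm(pose):
    """Euclidean length of the translation part of a 4x4 pose."""
    # Last column, first three rows: the same slice as pose[:3, -1] on a tensor.
    tx, ty, tz = (pose[row][3] for row in range(3))
    return math.sqrt(tx * tx + ty * ty + tz * tz)


def translation_magnitude_loss(pred_poses, gt_poses):
    """Mean absolute difference between predicted and reference translation lengths.

    The rotation block of each pose is never read. The absolute-difference
    reduction is an assumption made for this sketch.
    """
    diffs = [
        abs(translation_norm(pred) - translation_norm(gt))
        for pred, gt in zip(pred_poses, gt_poses)
    ]
    return sum(diffs) / len(diffs)


# Same distance travelled, different direction: the loss is zero.
same_length = translation_magnitude_loss([make_pose(0.0, 0.0, 5.0)],
                                         [make_pose(3.0, 4.0, 0.0)])
# Same direction, different distance: the loss is the gap in metres.
different_length = translation_magnitude_loss([make_pose(0.0, 0.0, 2.0)],
                                              [make_pose(0.0, 0.0, 5.0)])
```

Because only the norm is compared, a loss of this shape constrains how far the camera moved between frames, not the direction or the rotation.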
Hello,
Thank you for sharing this amazing work. I have a quick question for clarification on velocity supervision.
From velocity_loss.py line 34 (gt_trans = [pose[:, :3, -1].norm(dim=-1) for pose in gt_pose_context]), it appears that the velocity loss uses the difference in Euclidean distance between the predicted and ground truth pose between images. Is this correct?
On the other hand, Equation (6) from "3D Packing for Self-Supervised Monocular Depth Estimation" appears to suggest that this supervision comes from "the measured instantaneous velocity scalar v multiplied by the time difference between target and source frames..." Is this calculation performed somewhere else? Are the two equivalent? Did you try velocity supervision and find that it didn't work as well as the GT pose? Any clarification is very much appreciated.
Thanks again! Brent
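As a rough illustration of the equivalence asked about above: if the vehicle moves at a constant speed between two frames, the length of the relative translation equals the speed multiplied by the time difference. The sketch below checks this with made-up numbers. displacement_from_speed and the values are hypothetical, and whether the repository computes its target this way is not confirmed in this thread.

```python
import math


def displacement_from_speed(speed, t_target, t_source):
    """Distance travelled between two frames, assuming constant speed in between."""
    return abs(speed) * abs(t_target - t_source)


def vector_norm(vector):
    """Euclidean length of a 3-vector given as a tuple or list."""
    return math.sqrt(sum(component * component for component in vector))


# Hypothetical values: 10 m/s, frames 0.1 s apart, motion along the z axis.
speed = 10.0
t_source, t_target = 0.0, 0.1
direction = (0.0, 0.0, 1.0)  # unit vector, assumed for the example

# Translation that a ground-truth pose would contain under constant velocity.
dt = t_target - t_source
gt_translation = tuple(speed * dt * d for d in direction)

from_pose = vector_norm(gt_translation)
from_velocity = displacement_from_speed(speed, t_target, t_source)
assert math.isclose(from_pose, from_velocity)
```

Under that constant-speed assumption the two quantities agree, so supervising the translation norm against a pose and against speed times time difference would give the same target.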