JonnyKong closed this issue 3 years ago
Hi, depth is NOT distance: it's only the z component of a point. So a point at the border of the frame with the same depth as another point at the center will indeed be farther away.
Why is that? Because everything is much simpler this way for computation. When you have the depth and the intrinsics K, the 3D position of a 2D point (u,v) is simply
(x,y,z)^T = depth * (K^-1 * (u,v,1)^T)
and then we get
x = depth * ((u-u0)/f)
y = depth * ((v-v0)/f)
z = depth
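The linear relation above can be sketched in a few lines (illustrative only, not the repo's code; the intrinsics f, u0, v0 here are example values):

```python
def backproject(u, v, depth, f, u0, v0):
    """Back-project pixel (u, v) with z-depth `depth` through a pinhole
    camera with focal length f and principal point (u0, v0).
    Each output coordinate is linear in depth."""
    x = depth * (u - u0) / f
    y = depth * (v - v0) / f
    z = depth
    return x, y, z

# A pixel 100 px right of the principal point, at depth 10 m:
print(backproject(420.0, 240.0, 10.0, 500.0, 320.0, 240.0))
```

Note how z equals the input depth regardless of (u, v); only x and y change with the pixel position.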
You can see that these functions are linear, which is very good for optimization stability. On the other hand, if we instead define depth = sqrt(x^2 + y^2 + z^2), everything is suddenly non-linear and much more complicated:
(x',y',z')^T = K^-1 (u,v,1)^T
(x,y,z) = depth * (x', y', z') / sqrt(x'^2 + y'^2 + z'^2)
and then we get
x = depth * ((u-u0)/f) / sqrt(((u-u0)/f)^2 + ((v-v0)/f)^2 + 1)
y = depth * ((v-v0)/f) / sqrt(((u-u0)/f)^2 + ((v-v0)/f)^2 + 1)
z = depth / sqrt(((u-u0)/f)^2 + ((v-v0)/f)^2 + 1)
You can see that many non-linear functions are involved, and also that when f is large (i.e. the FOV is low), both expressions are very close.
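The distance-based variant can be sketched the same way (again illustrative, not the repo's code). Here `dist` is the Euclidean distance to the camera center, and the normalized ray direction must be rescaled:

```python
import math

def backproject_from_distance(u, v, dist, f, u0, v0):
    """Back-project pixel (u, v) when `dist` is the Euclidean distance
    to the camera center, not the z component."""
    xn = (u - u0) / f  # normalized ray direction
    yn = (v - v0) / f
    norm = math.sqrt(xn ** 2 + yn ** 2 + 1.0)
    return dist * xn / norm, dist * yn / norm, dist / norm

# At the principal point the two conventions coincide (z == dist):
print(backproject_from_distance(320.0, 240.0, 10.0, 500.0, 320.0, 240.0))
```

Away from the principal point, z = dist / norm < dist, and the gap shrinks as f grows, matching the observation above that the two conventions converge for a low FOV.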
The key point here is that depth is z by convention, not distance. It would probably still work when using distance, but that makes everything more complicated.
Hope it was informative to you.
Thank you very much for the detailed explanation.
In `kitti_raw_loader.py`, the depth value is assigned by the following line: https://github.com/ClementPinard/SfmLearner-Pytorch/blob/ae63049a95ec35bfaccce67142d1ae381c484389/data/kitti_raw_loader.py#L294

where the depth value of each pixel is assigned as `velo_pts_im[:, 2]`, which is the z value in Euclidean space. Shouldn't the depth value instead be assigned as `sqrt(velo_pts_im[:, 0]**2 + velo_pts_im[:, 1]**2 + velo_pts_im[:, 2]**2)`, which is the distance to the pinhole camera at (0, 0, 0)? Thank you.
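As a side note, if one did want the Euclidean distance for a pixel, it can be recovered from the z-depth and the intrinsics by inverting the relation z = dist / sqrt(((u-u0)/f)^2 + ((v-v0)/f)^2 + 1) given above. A hedged sketch (not the repo's code; also note that in the usual KITTI projection code `velo_pts_im[:, :2]` are pixel coordinates, so summing squares of the three columns directly would mix pixel and metric units):

```python
import math

def z_to_distance(u, v, z, f, u0, v0):
    """Convert a z-depth at pixel (u, v) into Euclidean distance to the
    camera center, using the pinhole intrinsics f, (u0, v0)."""
    return z * math.sqrt(((u - u0) / f) ** 2 + ((v - v0) / f) ** 2 + 1.0)

# At the principal point, distance equals z-depth; elsewhere it is larger:
print(z_to_distance(320.0, 240.0, 10.0, 500.0, 320.0, 240.0))
print(z_to_distance(420.0, 240.0, 10.0, 500.0, 320.0, 240.0))
```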