TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

How to interpret the reconstructed ray? #105

Open zshn25 opened 3 years ago

zshn25 commented 3 years ago

https://github.com/TRI-ML/packnet-sfm/blob/2698f1fb27785275ef847f3dbbd550cf8fff1799/packnet_sfm/geometry/camera.py#L132-L138

How should I interpret the output of the reconstruct function, which lifts the depth map into 3D using the inverse intrinsic matrix? I see that it outputs a ray tensor of size [Bx3xwxh]. I assume these are X, Y, Z coordinates, and I see that Z is the same as the depth map because it is not affected by the matrix multiplication with K^{-1}. But why do the camera intrinsics affect only X and Y, and not Z? I find this output difficult to interpret. It would be great if anyone could give some insight. Thanks.
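For reference, here is a short derivation of why Z is untouched by K^{-1}. It assumes a standard pinhole intrinsic matrix with zero skew. The symbols f_x, f_y, c_x, c_y (focal lengths and principal point), u, v (pixel column and row) and d (depth at that pixel) are introduced here for illustration only:

```latex
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
K^{-1} = \begin{pmatrix} 1/f_x & 0 & -c_x/f_x \\ 0 & 1/f_y & -c_y/f_y \\ 0 & 0 & 1 \end{pmatrix}

K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= \begin{pmatrix} (u - c_x)/f_x \\ (v - c_y)/f_y \\ 1 \end{pmatrix},
\qquad
P = d \, K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= \begin{pmatrix} d\,(u - c_x)/f_x \\ d\,(v - c_y)/f_y \\ d \end{pmatrix}
```

The last row of K^{-1} is [0, 0, 1], so every unscaled ray has Z = 1. Multiplying by d therefore makes Z exactly equal to the depth. The intrinsics only fix the direction of the ray, that is, the ratios X/Z and Y/Z.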

VitorGuizilini-TRI commented 3 years ago

You are right, the output is Bx3xHxW containing 3D coordinates for each pixel. The depth map scales each vector such that the Z coordinate is equal to the depth value, and in doing so it also scales the X and Y coordinates (essentially it selects the point in the line segment that corresponds to that depth value).
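A minimal numpy sketch of the lift described above follows. It assumes a single image (no batch dimension), pixel coordinates at integer positions, and a pinhole K without distortion. It mirrors the idea of the reconstruct function but is not the packnet-sfm implementation, and the function name is hypothetical:

```python
import numpy as np

def reconstruct(depth, K):
    """Hypothetical helper: lift an HxW depth map to 3xHxW camera-frame points."""
    H, W = depth.shape
    # Pixel grid: v indexes rows (y), u indexes columns (x); both are HxW.
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Homogeneous pixel coordinates [u, v, 1], shape (3, H*W).
    grid = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=0)
    # Unit-depth rays: the last row of K^-1 is [0, 0, 1], so every ray has Z = 1.
    rays = np.linalg.inv(K) @ grid
    # Scale each ray by its pixel's depth so that Z equals the depth value;
    # X and Y are scaled by the same factor (the point along the ray at depth d).
    points = rays * depth.ravel()
    return points.reshape(3, H, W)

# Usage: made-up intrinsics and a constant depth map.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
P = reconstruct(np.full((4, 5), 2.0), K)
print(np.allclose(P[2], 2.0))  # Z channel equals the depth map
```

The check at the end prints True: the Z channel reproduces the depth map, while X and Y equal d(u - c_x)/f_x and d(v - c_y)/f_y.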

zshn25 commented 3 years ago

But since the grid is being scaled, are the X, Y coordinates of the output (after depth scaling) a mapping function that describes how the image grid changes as a result of the camera intrinsics? If so, I could compute an inverse map and use it to map the depth values onto the input image. Sorry if I couldn't explain it properly. In the end, I don't want to shift the X, Y of my image to match the correct depth; I want to shift the depth map to match my image's X, Y.
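One way to check whether the output X, Y describe a warp of the image grid is to project the reconstructed points back through K and compare the result with the original pixel coordinates. The sketch below assumes a pinhole camera without lens distortion, strictly positive depths, and a depth map with the same resolution and intrinsics as the image. The function name is hypothetical, and this is not packnet-sfm code:

```python
import numpy as np

def lift_and_project(depth, K):
    """Hypothetical helper: lift depth to 3D, project back, return (uv, original grid)."""
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    # Lift: scale the unit-depth rays K^-1 [u, v, 1] by each pixel's depth.
    points = (np.linalg.inv(K) @ grid) * depth.ravel()
    # Project back: apply K, then divide by Z (equal to the depth, assumed > 0).
    proj = K @ points
    uv = proj[:2] / proj[2]
    return uv.reshape(2, H, W), np.stack([u, v])

K = np.array([[120.0, 0.0, 3.0], [0.0, 90.0, 2.0], [0.0, 0.0, 1.0]])
uv, pixels = lift_and_project(np.linspace(0.5, 10.0, 24).reshape(4, 6), K)
print(np.allclose(uv, pixels))  # projected points land back on their own pixels
```

Under these assumptions, the round trip returns every point to the pixel it came from. The X, Y of the output are metric camera-frame coordinates rather than a resampling of the image grid, so depth pixel (u, v) already corresponds to image pixel (u, v). If the depth map were predicted at a different resolution from the image, K would have to be rescaled to match before comparing coordinates.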