etiennedub / pyk4a

Python 3 wrapper for Azure-Kinect-Sensor-SDK
MIT License

Transformed Depth Image - Depth value experimental setup #221

Open nliedmey opened 7 months ago

nliedmey commented 7 months ago

Hello,

While generating 3D coordinates from 2D RGB image-plane coordinates [x, y], I came across some insights about the transformed_depth image.

In many other issues (Example 1, Example 2), the depth value returned by the transformed_depth image is described as the Euclidean distance between the object and the depth sensor. Since I came across some implausible depth measurements with this hypothesis in mind, I did some more testing.

Three objects were placed at different positions on a line that lies 1 m in front of the Azure Kinect and runs parallel to the camera's Z = 0 plane (what I call the "Z-wall" below). I captured the scene and generated transformed_depth images. According to the hypothesis, only the object in the middle of the line, directly in front of the Azure Kinect sensor and orthogonal to the Z = 0 plane, should return a depth value of ~1000 mm. The two other objects, placed about 30 cm to either side of the middle object, should return values larger than ~1000 mm, because their Euclidean distance to the Azure Kinect sensor is larger than 1 m in reality.

In fact, this is not the case. All three objects returned a depth value of ~1000 mm. This leads to the assumption that the returned depth value is the distance between the object and the Azure Kinect's Z = 0 plane (the "Z-wall"), i.e. the Z coordinate, not the Euclidean range.
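A minimal sketch of how such a readout can be done with pyk4a (device settings and pixel coordinates here are placeholders, not my actual setup):

```python
from pyk4a import ColorResolution, Config, DepthMode, PyK4A

# Open the device with a color + depth configuration (settings are just an example).
k4a = PyK4A(Config(
    color_resolution=ColorResolution.RES_720P,
    depth_mode=DepthMode.NFOV_UNBINNED,
    synchronized_images_only=True,
))
k4a.start()

capture = k4a.get_capture()

# transformed_depth is the depth map reprojected into the color camera geometry,
# so it can be indexed with color-image pixel coordinates (row = v, column = u).
object_pixels = [(640, 360), (500, 360), (780, 360)]  # placeholder (u, v) positions
for u, v in object_pixels:
    depth_mm = capture.transformed_depth[v, u]
    print(f"pixel ({u}, {v}): {depth_mm} mm")

k4a.stop()
```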

What I am asking myself now is: how do I generate proper 3D coordinates from given [x, y] image-plane coordinates and a Z value that is the distance to the Z-wall rather than the true (Euclidean) depth of the object? Is the calibration.convert_2d_to_3d() function aware of this? It often returned 3D values that seemed skewed. Because of this, I switched to the pinhole model and did manual computations like this:

x_3d = z * (u - cx) / fx
y_3d = z * (v - cy) / fy

Anyway, with "z" not being the Euclidean distance to the origin of the coordinate system, I think these computations do not work properly.
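For context, a minimal sketch of how the convert_2d_to_3d() call could look for this case (pixel coordinate and depth value are placeholders, reusing the capture from the sketch above; I am not certain whether the depth argument is meant to be the transformed_depth value or a Euclidean range):

```python
from pyk4a import CalibrationType

# (u, v) is a pixel in the color image, depth_mm the value from capture.transformed_depth[v, u].
u, v = 640, 360  # placeholder pixel coordinate
depth_mm = float(capture.transformed_depth[v, u])

# convert_2d_to_3d takes the 2D point, a depth value, and the camera the point lives in;
# here both source and target are the color camera.
point_3d = k4a.calibration.convert_2d_to_3d(
    coordinates=(u, v),
    depth=depth_mm,
    source_camera=CalibrationType.COLOR,
    target_camera=CalibrationType.COLOR,
)
print(point_3d)  # (X, Y, Z) in millimetres, color camera frame
```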

Has anyone come across comparable problems and found a solution?

maturk commented 6 months ago

@nliedmey I think your formulas are correct. The real XYZ coordinate in camera space, given a z-depth sensor reading, would be

X = (u - cx) * Z / fx
Y = (v - cy) * Z / fy 
Z = Z

Where Z is the sensor z-depth at pixel location (u, v) in the depth map, and cx, cy are also in pixels. You could then transform this point from camera space to world coordinates if you have the camera-to-world transform.
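A minimal sketch of how this back-projection could be vectorized over the whole transformed_depth map (assuming your pyk4a version exposes Calibration.get_camera_matrix; the helper function and variable names are my own):

```python
import numpy as np
from pyk4a import CalibrationType

def transformed_depth_to_points(transformed_depth: np.ndarray, calibration) -> np.ndarray:
    """Back-project a transformed depth map (mm, color camera geometry) to XYZ points (mm).

    Applies X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = Z per pixel,
    ignoring lens distortion.
    """
    # 3x3 intrinsic matrix of the color camera: [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    K = calibration.get_camera_matrix(CalibrationType.COLOR)
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]

    h, w = transformed_depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = transformed_depth.astype(np.float32)

    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack((x, y, z))  # shape (h, w, 3); invalid pixels have z == 0

# Example: points = transformed_depth_to_points(capture.transformed_depth, k4a.calibration)
```

If your pyk4a version has it, capture.transformed_depth_point_cloud should give a comparable result via the SDK's own transformation, which is a useful cross-check.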