etiennedub / pyk4a

Python 3 wrapper for Azure-Kinect-Sensor-SDK
MIT License

Depth Value Discrepancies #200

Open JJLimmm opened 1 year ago

JJLimmm commented 1 year ago

Hi all,

Just a question regarding depth information obtained via the Python wrapper and via the Body Tracking SDK. This is also related to issue #92 discussed in another Python wrapper repo, which gives more context on the problem.

Background info: I am currently testing how I can use a 2D pose estimation model (like OpenPose or any other), together with the depth sensor data, to obtain accurate 3D coordinates for the detected keypoints, instead of using the official Azure Kinect Body Tracking SDK.

However, to make sure I was getting the correct depth value corresponding to the SDK's body tracking keypoints, I first had to verify that my method for converting the model's 2D keypoints into the 3D depth coordinate system gives the same results as the Body Tracking SDK. For the comparison I used the neck keypoint as a reference: I projected the Body Tracking SDK's 3D joint to a 2D keypoint (x, y) in the RGB image, and then transformed those coordinates back into the 3D depth coordinate system (the same coordinate system used for the Body Tracking keypoint results).

I perform the following steps before comparing the obtained xyz values (a rough code sketch follows after the list):

  1. Obtain the 2D keypoints (as the model would provide them) via convert_3d_to_2d(), so the 2D keypoints are now in the 2D RGB image space.
  2. Retrieve the depth image transformed into the RGB image geometry using capture.transformed_depth
  3. Get the depth value at that 2D keypoint coordinate by indexing the transformed depth image
  4. Using that depth value and the 2D keypoint in the RGB image, call convert_2d_to_2d() to get the coordinates in the 2D depth image space.
  5. Get the depth value at the converted coordinate by indexing the 2D depth image (retrieved via capture.depth)
  6. Using that depth value and the converted coordinates, call convert_2d_to_3d() to obtain the xyz coordinates in 3D depth space
  7. Compare that xyz to the xyz of the same keypoint from the Body Tracking SDK.
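
For concreteness, here is a minimal sketch of those steps in pyk4a. It is only an illustration: it assumes the Calibration.convert_3d_to_2d() / convert_2d_to_3d() signatures of (coordinates, [depth,] source_camera, target_camera), uses placeholder values for the Body Tracking neck joint, and composes step 4 as a 2D -> 3D -> 2D round trip using functions I am sure exist, in case a direct convert_2d_to_2d() is not available in your wrapper version.

```python
from pyk4a import PyK4A, CalibrationType

k4a = PyK4A()
k4a.start()
capture = k4a.get_capture()
calib = k4a.calibration

# Placeholder: a Body Tracking SDK joint (neck) in the depth camera 3D system, in mm.
neck_bt_3d = (120.0, -45.0, 715.0)

# Step 1: project the 3D joint into the 2D RGB image (stand-in for the pose model output).
u_rgb, v_rgb = calib.convert_3d_to_2d(neck_bt_3d, CalibrationType.DEPTH, CalibrationType.COLOR)
u_rgb, v_rgb = int(round(u_rgb)), int(round(v_rgb))

# Steps 2-3: depth value at that RGB pixel, from the color-aligned depth map.
# Note: numpy indexing is [row, col] = [y, x].
depth_rgb = capture.transformed_depth[v_rgb, u_rgb]

# Step 4: map the RGB pixel into the depth image (2D -> 3D -> 2D round trip).
p3d = calib.convert_2d_to_3d((u_rgb, v_rgb), float(depth_rgb), CalibrationType.COLOR, CalibrationType.DEPTH)
u_d, v_d = calib.convert_3d_to_2d(p3d, CalibrationType.DEPTH, CalibrationType.DEPTH)
u_d, v_d = int(round(u_d)), int(round(v_d))

# Step 5: depth value at the converted coordinate, from the raw depth image.
depth_d = capture.depth[v_d, u_d]

# Step 6: back-project to 3D in the depth camera coordinate system.
neck_manual_3d = calib.convert_2d_to_3d((u_d, v_d), float(depth_d), CalibrationType.DEPTH)

# Step 7: compare against the Body Tracking SDK joint.
print("body tracking xyz:", neck_bt_3d)
print("manual xyz:       ", neck_manual_3d)
```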

After performing these steps, I noticed a difference in the z value (depth) between the manual conversion and the Body Tracking SDK. (As shown in the screenshot below, _bodykps is the xyz coordinate obtained from the Body Tracking SDK, where the z value (depth) is 715.698..., while the converted xyz coordinate is the "3D point with converted 2D ......." where the z value is 624.)

[screenshot: depth_value_difference]

Does anybody know if I am doing anything wrong, or has anyone faced similar issues? Am I supposed to use both the 2D depth and IR images to find the actual depth? (I saw that the Body Tracking SDK documentation uses the depth and IR images from the Capture object.) If so, how do I combine these two images (depth and IR) to get the proper depth value at the coordinates converted from the 2D RGB space?

JJLimmm commented 1 year ago

Issue thread regarding this question.

nliedmey commented 11 months ago

Hey @JJLimmm, have you found a valid solution for your problem?

I am currently working on a comparable project and am also struggling to get valid depth values for my 2D RGB keypoints.

My first attempt was to take the transformed_depth_image and simply look up the depth value at the [x, y] coordinate of the RGB pose keypoint. At first it looked good, but further investigation showed that this approach has a major issue:
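
For reference, a minimal sketch of that first attempt (the keypoint coordinates are placeholders). One easy mistake to make here is the indexing order, since numpy arrays are indexed [row, col], i.e. [y, x]:

```python
from pyk4a import PyK4A

k4a = PyK4A()
k4a.start()
capture = k4a.get_capture()

u, v = 640, 360  # hypothetical RGB keypoint (x, y) in pixels
depth_mm = capture.transformed_depth[v, u]  # rows are y, columns are x
print(f"depth value at ({u}, {v}): {depth_mm} mm")
```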

The depth value returned by the transformed_depth_image is not the true depth (euclidean distance) from the object to the sensor; it is already the Z coordinate in one of the two 3D coordinate systems. Some tests confirmed this, even though one of the Azure Kinect developers wrote it differently: (GitHub issue).

When converting [x, y] (2D image plane) and Z (3D coordinate system) to a set of 3D coordinates, either by hand or with the calibration.convert_2d_to_3d() function, the resulting coordinates do not match reality because the given depth value is misinterpreted. In my case, the hip of the 3D person is tilted forward towards the camera because it is placed approximately on the Z axis.
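
If the stored value is indeed a Z coordinate, the euclidean sensor-to-point distance can be recovered by back-projecting the pixel and taking the norm of the resulting 3D point. A minimal sketch of that check, assuming pyk4a's convert_2d_to_3d(coordinates, depth, source_camera, target_camera) signature and placeholder pixel coordinates:

```python
import numpy as np
from pyk4a import PyK4A, CalibrationType

k4a = PyK4A()
k4a.start()
capture = k4a.get_capture()

u, v = 640, 360                          # hypothetical RGB keypoint (pixels)
z_mm = capture.transformed_depth[v, u]   # stored value, interpreted here as Z-depth

# Back-project with the stored value passed as the depth, staying in the
# color camera coordinate system.
x, y, z = k4a.calibration.convert_2d_to_3d(
    (u, v), float(z_mm), CalibrationType.COLOR, CalibrationType.COLOR
)

# Euclidean distance from the sensor to the point; it is always >= z, and the
# gap grows towards the image borders.
euclidean_mm = float(np.linalg.norm((x, y, z)))
print(f"Z-depth: {z_mm} mm, euclidean distance: {euclidean_mm:.1f} mm")
```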

Have you encountered similar anomalies, and how did you solve your problems? :)

JJLimmm commented 11 months ago

Hey @nliedmey, yes, I noticed this discrepancy as well between using the calibration from the Azure SDK and our own conversion. I currently cannot find a way to create a custom calibration to obtain the depth at the 2D keypoint coordinate I want; it would take a lot of time, and I don't have that luxury right now.

For my project, I'm just taking the depth value obtained in the 2D RGB space at face value, and if it shows an invalid depth, I skip that frame and grab the next one.
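
In case it helps anyone following the same shortcut, here is a rough sketch of that guard, assuming a value of 0 marks an invalid/missing depth pixel and using placeholder keypoint coordinates:

```python
from pyk4a import PyK4A, CalibrationType

k4a = PyK4A()
k4a.start()
u, v = 640, 360  # hypothetical RGB keypoint (x, y) in pixels

point_3d = None
while point_3d is None:
    capture = k4a.get_capture()
    depth_mm = capture.transformed_depth[v, u]
    if depth_mm == 0:
        continue  # no valid depth at this pixel; skip this frame and grab the next
    point_3d = k4a.calibration.convert_2d_to_3d((u, v), float(depth_mm), CalibrationType.COLOR)

print("3D point (mm):", point_3d)
```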