IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

Transforming pixel from a depth image to world coordinates #1904

Closed — Litvak1 closed this issue 6 years ago

Litvak1 commented 6 years ago
Required Info
Camera Model: D415
Firmware Version: 05.08.15.00
Operating System & Version: Linux (Ubuntu 16.04)
Kernel Version (Linux Only): 4.13.0-45-generic
Platform: PC
SDK Version: ?
Language: python

Transforming pixel from a depth image to world coordinates

Hi, as part of my research I am using the D415 RealSense camera to capture depth images of an object randomly placed on a table (I am using only depth images). I used image detection to find the exact pixel in the depth image representing the center of the object, and I would like to know how to transform that pixel position (row, column) in the depth image into the real Cartesian position (x, y) in the world, where the real depth (z) is known (the images are always taken from a certain known position above the table). I used this code to retrieve the 3D point with respect to 'the center of the physical imager' (as explained here; I tested it with the ppx and ppy values):

import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.disable_stream(rs.stream.color)
pipeline.start(config)

frames = pipeline.wait_for_frames()
depth_frame = frames.get_depth_frame()
depth_intrin = depth_frame.profile.as_video_stream_profile().intrinsics

# Deproject the principal point (ppx, ppy) at a known depth of 0.5 m
depth_value = 0.5
depth_pixel = [depth_intrin.ppx, depth_intrin.ppy]
depth_point = rs.rs2_deproject_pixel_to_point(depth_intrin, depth_pixel, depth_value)
print(depth_point)

The result is [0.0, 0.0, 0.5] as it should be.

I know where I placed the camera above the table. I assumed that the reference point would be the center of the front side of the camera (where the glass is), but this assumption is wrong.

So after this introduction, my question is: where exactly is 'the center of the physical imager' located with respect to the center of the front side of the camera? Or should I perhaps do some alignment before using rs2_deproject_pixel_to_point?

Thank you for your time!

0xLeon commented 6 years ago

I'm not sure I understand your problem completely, but isn't this what rs.depth_frame.get_distance(x, y) does? Of course, this will give you the 3D position in camera coordinates only. You have to define your reference world coordinate system with known extrinsics of the camera, and apply that transformation to the camera coordinates of your measured 3D position to get the 3D position relative to your defined world coordinate system.
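
A minimal sketch of that last step, assuming the camera pose in the world frame is known: R_cam_to_world and t_cam_to_world below are hypothetical values you would have to measure or calibrate yourself, they are not provided by the SDK.

import numpy as np
import pyrealsense2 as rs

def pixel_to_world(depth_frame, depth_intrin, px, py, R_cam_to_world, t_cam_to_world):
    # Deproject the pixel into the camera frame (metres), then map it into the world frame.
    depth = depth_frame.get_distance(int(px), int(py))
    p_cam = np.asarray(rs.rs2_deproject_pixel_to_point(depth_intrin, [px, py], depth))
    return R_cam_to_world @ p_cam + t_cam_to_world

# Hypothetical pose: camera looking straight down from 0.52 m above the world origin.
R = np.array([[1.0,  0.0,  0.0],
              [0.0, -1.0,  0.0],
              [0.0,  0.0, -1.0]])
t = np.array([0.0, 0.0, 0.52])

The same idea applies whatever you choose as the world frame (the corner of the table, a robot base, and so on).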

Litvak1 commented 6 years ago

get_distance(x, y) will give me the distance to that specific pixel. That distance I already have, because I placed the camera at a known altitude above the table. rs2_deproject_pixel_to_point is used to transform a pixel in the image into a 3D point with respect to 'the center of the physical imager', which is somewhere inside the camera but not exactly in the middle.

I know that the center of the front side of the camera is placed at a certain position with respect to my world frame, but I don't know what the transformation is between this point and 'the center of the physical imager' (the reference point).

Is this information available as 'extrinsics'? How can I retrieve this information in python?

I hope my question is clearer this time.

jianjieluo commented 6 years ago

I used this script and found its results are not bad.

import pyrealsense2 as rs
import numpy as np
import cv2
import os

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

# Configure depth and color streams
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)

# Start streaming
pipe_profile = pipeline.start(config)

curr_frame = 0

try:
    while True:

        # Wait for a coherent pair of frames: depth and color
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()
        if not depth_frame or not color_frame:
            continue

        # Intrinsics & Extrinsics
        depth_intrin = depth_frame.profile.as_video_stream_profile().intrinsics
        color_intrin = color_frame.profile.as_video_stream_profile().intrinsics
        depth_to_color_extrin = depth_frame.profile.get_extrinsics_to(
            color_frame.profile)

        # print(depth_intrin.ppx, depth_intrin.ppy)

        # Convert images to numpy arrays
        depth_image = np.asanyarray(depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())

        # find the human face in the color_image
        gray = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            if curr_frame > 100 and curr_frame % 40 == 10:
                roi_depth_image = depth_image[y:y+h, x:x+w]
                roi_color_image = color_image[y:y+h, x:x+w]
                os.system('mkdir -p ./3d_output/%d' % curr_frame)
                cv2.imwrite('./3d_output/%d/depth.jpg' %
                            curr_frame, roi_depth_image)
                cv2.imwrite('./3d_output/%d/color.jpg' %
                            curr_frame, roi_color_image)
                print("the mid position depth is:", depth_frame.get_distance(
                    int(x+w/2), int(y+h/2)))

                # write the depth data in a depth.txt
                with open('./3d_output/%d/depth.csv' % curr_frame, 'w') as f:
                    cols = list(range(x, x+w))
                    rows = list(range(y, y+h))
                    for i in rows:
                        for j in cols:
                            # Be sure to double-check the (column, row) order in the coordinate transform
                            depth = depth_frame.get_distance(j, i)
                            depth_point = rs.rs2_deproject_pixel_to_point(
                                depth_intrin, [j, i], depth)
                            text = "%.5lf, %.5lf, %.5lf\n" % (
                                depth_point[0], depth_point[1], depth_point[2])
                            f.write(text)
                print("Finish writing the depth img")

            cv2.rectangle(color_image, (x, y), (x+w, y+h), (255, 0, 0), 2)

        # Apply colormap on depth image (image must be converted to 8-bit per pixel first)
        depth_colormap = cv2.applyColorMap(cv2.convertScaleAbs(
            depth_image, alpha=0.03), cv2.COLORMAP_JET)

        # Stack both images horizontally
        images = np.hstack((color_image, depth_colormap))

        # Show images
        cv2.namedWindow('RealSense', cv2.WINDOW_AUTOSIZE)
        cv2.imshow('RealSense', images)
        cv2.waitKey(1)

        curr_frame += 1
finally:

    # Stop streaming
    pipeline.stop()

I used OpenCV's face detector to capture the human face and then converted the face region to 3D coordinates. Maybe the script is not elegant; I think the essential code you're interested in is:

with open('./3d_output/%d/depth.csv' % curr_frame, 'w') as f:
    cols = list(range(x, x+w))
    rows = list(range(y, y+h))
    for i in rows:
        for j in cols:
            # Be sure to double-check the (column, row) order in the coordinate transform
            depth = depth_frame.get_distance(j, i)
            depth_point = rs.rs2_deproject_pixel_to_point(
                depth_intrin, [j, i], depth)
            text = "%.5lf, %.5lf, %.5lf\n" % (
                depth_point[0], depth_point[1], depth_point[2])
            f.write(text)
print("Finish writing the depth img")

Here I use [j, i] to convert the pixels one by one into 3D data, in meters.

Since I am a beginner in 3D reconstruction, I am not sure whether this solves your problem, but I hope it gives you some inspiration.

Litvak1 commented 6 years ago

Hi @longjj,

To get the 'depth_point' (x, y, z with respect to the reference point of the image), you can see I did something very similar to what you did.

My problem starts right after I have this 3D 'depth_point'. I don't know how to transform it into my world coordinates (the corner of my table, for example), because I don't know where this 'reference point of the image' is with respect to the center of the real camera; the images are not aligned with the physical camera.

I will try to explain my situation in a different way. This is my scene:

The offsets I am talking about are around 1-2 cm, and I am trying to get a precision of less than 1 mm.

So you can understand that the center of the depth image is not where the real center of the physical camera is pointing. I am trying to understand this offset so that I can transform a certain pixel in the image (not exactly in the middle) into a position with respect to my world coordinates.

As I understand it, the 'extrinsics' parameters only let you align the depth and color images, but neither of them is aligned with the real center of the physical camera.

I hope you'll have some suggestions.

Thanks!

jianjieluo commented 6 years ago

Hi @Litvak1, I get your point. The infrared camera and the RGB color camera do have an offset, which leads to the situation you describe.

With my limited 3D knowledge, I currently don't know how to solve this problem in pyrealsense2 either. Maybe @0xLeon is right and you just need some transform with depth_to_color_extrin, although I haven't tried it before (a small sketch follows).
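
For what it's worth, a minimal sketch of that extrinsic transform, reusing depth_frame, color_frame and depth_point from the script above. Note that it only maps a point from the depth sensor's frame into the color sensor's frame; it does not relate either frame to the camera housing.

import pyrealsense2 as rs

# Extrinsics between the depth and color streams, queried from the frame profiles
depth_to_color_extrin = depth_frame.profile.get_extrinsics_to(color_frame.profile)

# Transform a deprojected 3D point from the depth frame into the color camera's frame
point_in_color_frame = rs.rs2_transform_point_to_point(depth_to_color_extrin, depth_point)
print(point_in_color_frame)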

Maybe we need to wait for others' further reply.

agrunnet commented 6 years ago

The center is aligned to the LEFT imager (as seen from the camera, so its left eye).

Litvak1 commented 6 years ago

I appreciate all your help guys!

So far I haven't found a written answer to my question.

I decided to perform a small test to see if I could find this 'reference point'. I placed a small cube in the center of my table and took depth images from different positions around the cube (always from 0.52 m above the cube, but with different x and y positions). From each image I manually found the pixel at the center of the cube, and then used rs2_deproject_pixel_to_point to find where the camera thinks the object is located with respect to its 'unknown' reference point. From the altitude of 0.52 m above the object, the depth image width is ~0.71 m, the height is ~0.4 m, and each pixel is a square of size 0.00055 m.
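
As a rough cross-check of the ~0.00055 m-per-pixel figure: under the pinhole model, the metric footprint of one pixel at a given depth is approximately depth / fx horizontally and depth / fy vertically. A small sketch, assuming depth_intrin is the intrinsics object from the earlier snippets:

depth = 0.52  # metres above the table
px_width = depth / depth_intrin.fx   # metres covered by one pixel horizontally
px_height = depth / depth_intrin.fy  # metres covered by one pixel vertically
print(px_width, px_height)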

I attached my results: realsense_calibration.xlsx. I hope the column headers are enough to understand the table. I'll just say that the letters in brackets indicate the columns used in the calculations.

The values in columns J and K are the 'average offset' of all my positions. So as far as I understand, the reference point of my specific camera is 'hanging in the air', located -0.01931m in the X axis and 0.01638m in the Y axis with respect to the center of the front side of the physical camera.

If that is true, one can assume that this offset information is coded into the camera, just like its intrinsic parameters.

Did anyone hear about that before and can explain?

I still need the exact parameters for my research. I'm trying to reach a pose estimation accuracy of less than 1 mm error.

Thanks

MartyG-RealSense commented 6 years ago

If your question is still where the origin of the imager is, I would refer you to this link:

https://communities.intel.com/message/543826#543826

Litvak1 commented 6 years ago

I actually read the post from the link you sent two weeks ago :) Unfortunately, it didn't answer my question.

From the calibration I did it seems that the reference point of the camera (where the [0,0,0] is located) is 'floating' in the air and not where the physical left imager is located.

I think you can get the idea of what I'm trying to explain from the file I uploaded before: realsense_calibration.xlsx

The values in columns J and K are the 'average offset' of all my positions. So as far as I understand, the reference point of my specific camera is 'hanging in the air', located -0.01931m in the X axis and 0.01638m in the Y axis with respect to the center of the front side of the physical camera.

MartyG-RealSense commented 6 years ago

My RealSense knowledge doesn't cover calibration, unfortunately, so one of the excellent team members on this forum can undoubtedly provide you with a better answer to this question than I can.

RealSense-Customer-Engineering commented 6 years ago

[Realsense Customer Engineering Team Comment] @Litvak1 Here is the physical origin documentation, which will be integrated into our data sheet, attached here for your reference.

carypai commented 6 years ago

[Four image attachments]

RealSense-Customer-Engineering commented 6 years ago

[Realsense Customer Engineering Team Comment] @Litvak Just found out it's already merged into the latest data sheet; please check the link below: https://www.intel.com/content/dam/support/us/en/documents/emerging-technologies/intel-realsense-technology/Intel-RealSense-D400-Series-Datasheet.pdf

Superlee506 commented 5 years ago

@longjj Hi, thanks for sharing your code. I have a question about the color and depth images: how can I get the world coordinates of a pixel in the color image?

Another question: when you detect a face, you directly use the detection results on the depth image. I think you should first align the depth image to the color image, and then you can use the same detection results, as in the sketch below.
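
A minimal sketch of that alignment, assuming pipeline and the face-detector output (x, y) from the script above. rs.align maps the depth frame into the color camera's viewpoint, so the same pixel coordinates index both images, and deprojecting with the color intrinsics then gives the 3D point in the color camera's frame:

import pyrealsense2 as rs

align = rs.align(rs.stream.color)  # align depth onto the color stream

frames = pipeline.wait_for_frames()
aligned = align.process(frames)
depth_frame = aligned.get_depth_frame()
color_frame = aligned.get_color_frame()

color_intrin = color_frame.profile.as_video_stream_profile().intrinsics
depth = depth_frame.get_distance(x, y)  # (x, y) from the face detector on the color image
point = rs.rs2_deproject_pixel_to_point(color_intrin, [x, y], depth)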

jianjieluo commented 5 years ago

@Superlee506 Hi, I'm sorry, I haven't worked with the Intel RealSense SDK recently. Maybe there is an API for converting between the color image and the depth image, or an API that returns the answer you want directly.

Also, your suggestion about the alignment is right; I didn't consider it at that time. Thanks.

npd26 commented 5 years ago

@0xLeon Can I use the robot's base_link as the reference world coordinate frame?

Denial1990 commented 5 years ago

By using the 'ppx' and 'ppy' parameters of the intrinsics, we can get the center of projection (the principal point).

saikishor commented 4 years ago

@Litvak1 Maybe this will give you some ideas: https://github.com/ros-perception/image_pipeline/blob/melodic/depth_image_proc/src/nodelets/point_cloud_xyz.cpp

ahasan016 commented 4 years ago

> I actually read the post from the link you sent two weeks ago :) Unfortunately, it didn't answer my question.
>
> From the calibration I did it seems that the reference point of the camera (where the [0,0,0] is located) is 'floating' in the air and not where the physical left imager is located.
>
> I think you can get the idea of what I'm trying to explain from the file I uploaded before: realsense_calibration.xlsx
>
> The values in columns J and K are the 'average offset' of all my positions. So as far as I understand, the reference point of my specific camera is 'hanging in the air', located -0.01931m in the X axis and 0.01638m in the Y axis with respect to the center of the front side of the physical camera.

Did you find a solution for transforming image coordinates to real-world coordinates? I am currently facing the same difficulty, and it would be really helpful if you could share your experience.

Thanks