IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

Solved / all you need: RealSense L515, D400 or Dxxx: 2D pixel with depth to 3D world transformation! No need for calibrate, calibrateRobotWorldHandEye, calibrateCamera, CameraMatrix, cv2.solvePnP, or a chessboard #13123

Closed frankahland closed 1 month ago

frankahland commented 1 month ago

Solution for Intel RealSense L515 and D400 and Dxxx

I spent many hours figuring out how to transform 2D pixels with depth into 3D coordinates. Google and ChatGPT gave me many approaches, but all of them failed or did not work at all.

Then i found the perfect example from Intel here: https://github.com/IntelRealSense/librealsense/blob/master/wrappers/python/examples/opencv_pointcloud_viewer.py

It transforms the 2D pixels of the RGB image plus the depth into a 3D world point cloud, nicely visualized. You can change the perspective with the mouse and the resolution with a key press. Just great.

But it did not give me the methods to transform and calibrate into real 3D world coordinates.

Now I have found it and I want to share it.

This code calibrates and transforms image pixels with depth from the RealSense into 3D world coordinates. Since I could not find this anywhere, some of you may have the same problem, so I am posting the solution with all the keywords and methods that I tried and searched for, because you do not need any of them. I spent hours trying to find the intrinsics of the RealSense camera from chessboard photos or to calculate them myself. And then I found: they are simply there in the profile of the camera. Just use them.
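
Reading the intrinsics really is just a few lines (a minimal sketch, assuming a camera is connected; the full listing below does the same thing):

import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start()  # default config
color_profile = rs.video_stream_profile(profile.get_stream(rs.stream.color))
intrinsics = color_profile.get_intrinsics()  # width, height, ppx, ppy, fx, fy, distortion
print(intrinsics)
pipeline.stop()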


import pyrealsense2 as rs
import numpy as np

class ImageAndDepth2RealWorldTransformator:

    def __init__(self, intrinsics, referencePoints_pixelDepth, referencePoints_realWorld):
        # EXAMPLE
        # referencePoints_pixelDepth = [
        #     [475, 83, 691],
        #     [958, 130, 638],
        #     [330, 621, 551],
        #     [1395, 648, 577]
        # ]
        #
        # referencePoints_realWorld = np.array([  # a 1.0 is needed as fourth dimension
        #     [0.002, 0.3, 0.0, 1.0],
        #     [0.2468, 0.2415, 0.033, 1.0],
        #     [0.0, 0.0, 0.033, 1.0],
        #     [0.43, 0.0, 0.0, 1.0]
        # ])

        self.intrinsics = intrinsics
        np_pixelDepth = np.array(referencePoints_pixelDepth)

        # Assertions to verify the shapes
        assert np_pixelDepth.shape[0] >= 4, "Array must have at least 4 rows"
        assert np_pixelDepth.shape[1] == 3, "Array must have exactly 3 columns"

        assert referencePoints_realWorld.shape[0] == np_pixelDepth.shape[0], "The arrays must have the same number of rows"
        assert referencePoints_realWorld.shape[1] == 4, "The array must have exactly 4 columns"
        assert np.all(referencePoints_realWorld[:, 3] == 1.0), "The fourth column must be 1.0 everywhere"

        # see documentation in pixelDepth2VectorPoint
        reference_points_vectorPoints = self.pixelDepth2VectorPoint_array(referencePoints_pixelDepth)

        # Builds the matrix of homogeneous real-world coordinates.
        # referencePoints_realWorld[:, 0:3] extracts the first three columns, i.e. the
        # x, y and z coordinates of the reference points; the result has shape (N, 3).
        # np.ones(referencePoints_realWorld.shape[0]) creates a vector of ones that serves
        # as the homogeneous coordinate, and np.column_stack appends it as the fourth
        # column, giving shape (N, 4). (referencePoints_realWorld already carries a ones
        # column, so this just rebuilds it explicitly.)
        # The final .T transposes the array to shape (4, N), so that each reference point
        # becomes a COLUMN, which is what the matrix solve below expects.
        # Result: realWorldHomogene is a (4, N) numpy array holding the x, y, z
        # coordinates and the homogeneous coordinate 1.0 of the reference points.
        self.realWorldHomogene = np.column_stack((referencePoints_realWorld[:, 0:3], np.ones(referencePoints_realWorld.shape[0]))).T

        # Stack the camera-space reference points as columns; the 4th dimension is the 1
        vectorPoints_4D_with1 = np.ones(self.realWorldHomogene.shape)
        for i, p in enumerate(reference_points_vectorPoints):
            vectorPoints_4D_with1[0:3, i] = p

        # Solve T @ cameraPoints ≈ worldPoints in the least-squares sense: with exactly
        # 4 reference points the pseudoinverse is an exact inverse; with more points it
        # gives the best-fit 4x4 affine transform.
        self.transformationMatrixImage2RealWorld = np.dot(self.realWorldHomogene, np.linalg.pinv(vectorPoints_4D_with1))
        self.transformationMatrixRealWorld2Image = np.linalg.pinv(self.transformationMatrixImage2RealWorld)

        print("ImageAndDepth2RealWorldTransformator initialized and calibrated")
        print("transformationMatrixImage2RealWorld:\n", self.transformationMatrixImage2RealWorld)
        print("transformationMatrixRealWorld2Image:\n", self.transformationMatrixRealWorld2Image)
        print("-------------------")
        print("Sanity Test:")
        print("ImageDepth_to_realWorld")
        for ind, pt in enumerate(vectorPoints_4D_with1.T):
            print("Expected:", referencePoints_realWorld[ind][0:3], "Result:", np.dot(self.transformationMatrixImage2RealWorld, np.array(pt))[0:3])

        print("RealWorld_to_ImageDepth")
        for ind, pt in enumerate(referencePoints_realWorld):
            pt[3] = 1
            print("Expected:", vectorPoints_4D_with1.T[ind][0:3], "Result:", np.dot(self.transformationMatrixRealWorld2Image, np.array(pt))[0:3])

    def pixelDepth2VectorPoint(self, x, y, depth):
        # Convert a pixel with depth into a camera-space vector with depth as z.
        # The 3D camera coordinates (X, Y, Z) from pixel coordinates (u, v) and depth d:
        # X = (u - c_x) * d / f_x
        # Y = (v - c_y) * d / f_y
        # Z = d
        # where:
        # u, v = pixel coordinates
        # d = depth
        # c_x, c_y = optical center (principal point) of the camera
        # f_x, f_y = focal lengths of the camera in x and y direction
        # Note: depth enters linearly, so its unit (e.g. raw depth units vs. meters)
        # carries straight through and is absorbed by the calibration.

        point_x, point_y, point_z = rs.rs2_deproject_pixel_to_point(self.intrinsics, [x, y], depth)
        return np.array([point_x, point_y, point_z], dtype='float64')

    def pixelDepth2VectorPoint_array(self, pixelDepthArray):
        vectorPointsArray = []
        for pixelDepth in pixelDepthArray:
            vectorPointsArray.append(self.pixelDepth2VectorPoint(pixelDepth[0], pixelDepth[1], pixelDepth[2]))
        return vectorPointsArray

    def pixelDepth2RealWorld(self, x, y, depth):
        point = self.pixelDepth2VectorPoint(x, y, depth)

        vPoint = np.ones(4)  # homogeneous point, 4th component stays 1
        vPoint[0:3] = point
        return np.dot(self.transformationMatrixImage2RealWorld, vPoint)[0:3]
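
    # NOTE (added): the class computes transformationMatrixRealWorld2Image but never
    # exposes it. A sketch of the inverse mapping (world -> pixel + depth), assuming
    # pyrealsense2's rs2_project_point_to_pixel:
    def realWorld2PixelDepth(self, x, y, z):
        worldPoint = np.array([x, y, z, 1.0])
        cameraPoint = np.dot(self.transformationMatrixRealWorld2Image, worldPoint)[0:3]
        u, v = rs.rs2_project_point_to_pixel(self.intrinsics, list(cameraPoint))
        return u, v, cameraPoint[2]  # pixel x, pixel y, depth (camera z)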

# Get intrinsics of realsense camera:

print("Connected Intel Realsense Camera devices:")
context = rs.context()
for device in context.devices:
    print(f"Device {device.get_info(rs.camera_info.name)} connected")

pipeline = rs.pipeline()

config = rs.config()
config.enable_stream(rs.stream.depth, rs.format.z16, 30)
config.enable_stream(rs.stream.color, rs.format.bgr8, 30)

# Start streaming
pipeline_profile = pipeline.start(config)

# Get stream profile and camera intrinsics
profile = pipeline.get_active_profile()
depth_profile = rs.video_stream_profile(profile.get_stream(rs.stream.depth))
color_profile = rs.video_stream_profile(profile.get_stream(rs.stream.color))
depth_intrinsics = depth_profile.get_intrinsics()
color_intrinsics = color_profile.get_intrinsics()

# depth_intrinsics and color_intrinsics differ, and both depend on the resolution.
# Choose wisely which one you are calibrating against!
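# NOTE (added): if you pick pixels in the COLOR image but read depth values from the
# DEPTH image, consider aligning the depth frame to the color stream first, so both
# refer to the same pixel grid. A minimal sketch using the SDK's rs.align block:
#
#     align = rs.align(rs.stream.color)
#     frames = align.process(pipeline.wait_for_frames())
#     aligned_depth_frame = frames.get_depth_frame()  # now registered to color pixels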

referencePoints_pixelDepth = [
    [475, 83, 691],  # x,y,depth
    [958, 130, 638],
    [330, 621, 551],
    [1395, 648, 577]
]

referencePoints_realWorld = np.array([   
    [0.002, 0.3, 0.0,               1.0],   #x,y,z,1    the 1 is needed as 4th dimension, just set it to 1
    [0.2468, 0.2415, 0.033,         1.0],
    [0.0, 0.0, 0.033,               1.0],
    [0.43, 0.0, 0.0,                1.0]
])

imageAndDepth2RealWorldTransformator = ImageAndDepth2RealWorldTransformator(color_intrinsics, referencePoints_pixelDepth, referencePoints_realWorld)

print("Test: 450, 631, 585   #0.02 rechts von I2 und 0.033 tiefer")
r = imageAndDepth2RealWorldTransformator.pixelDepth2RealWorld(450, 631, 585)
print("Result:", r)

print("Test: 901,138,678     #0.02 links von I1 und 0.033 tiefer")
r = imageAndDepth2RealWorldTransformator.pixelDepth2RealWorld(901, 138, 678)
print("Result:", r)

As a second code block I am sharing my modified version of the 3D point cloud viewer. I renamed the methods and some variables to make it easier for me to understand, and put everything into a class. Everything else is the same.

Enjoy!



# Pixel in rgb image -> pixel2Point in 3D World -> point_2_viewPoint in pointcloud as seen by eye

# License: Apache 2.0. See LICENSE file in root directory.
# Copyright(c) 2015-2017 Intel Corporation. All Rights Reserved.

"""
OpenCV and Numpy Point cloud Software Renderer

This sample is mostly for demonstration and educational purposes.
It really doesn't offer the quality or performance that can be
achieved with hardware acceleration.

Usage:
------
Mouse:
    Drag with left button to rotate around pivot (thick small axes),
    with right button to translate and the wheel to zoom.

Keyboard:
    [p]     Pause
    [r]     Reset View
    [d]     Cycle through decimation values
    [z]     Toggle point scaling
    [c]     Toggle color source
    [s]     Save PNG (./out.png)
    [e]     Export points to ply (./out.ply)
    [q/ESC] Quit
"""

import math
import time
import cv2
import numpy as np
import pyrealsense2 as rs

class AppState:

    def __init__(self, *args, **kwargs):
        self.WIN_NAME = 'RealSense'
        self.pitch, self.yaw = math.radians(-10), math.radians(-15)
        self.translation = np.array([0, 0, -1], dtype=np.float32)
        self.distance = 2
        self.prev_mouse = 0, 0
        self.mouse_btns = [False, False, False]
        self.paused = False
        self.decimate = 1
        self.scale = True
        self.color = True

    def reset(self):
        self.pitch, self.yaw, self.distance = 0, 0, 2
        self.translation[:] = 0, 0, -1

    @property
    def rotation(self):
        Rx, _ = cv2.Rodrigues((self.pitch, 0, 0))
        Ry, _ = cv2.Rodrigues((0, self.yaw, 0))
        return np.dot(Ry, Rx).astype(np.float32)

    @property
    def pivot(self):
        return self.translation + np.array((0, 0, self.distance), dtype=np.float32)

class Realsense3DViewer:
    def __init__(self):

        self.state = AppState()

        # Configure depth and color streams
        self.pipeline = rs.pipeline()
        config = rs.config()

        pipeline_wrapper = rs.pipeline_wrapper(self.pipeline)
        #pipeline_profile = config.resolve(pipeline_wrapper)
        #device = pipeline_profile.get_device()

        config.enable_stream(rs.stream.depth, rs.format.z16, 30)
        config.enable_stream(rs.stream.color, rs.format.bgr8, 30)

        # Start streaming
        self.pipeline.start(config)

        # Get stream profile and camera intrinsics
        profile = self.pipeline.get_active_profile()
        self.depth_profile = rs.video_stream_profile(profile.get_stream(rs.stream.depth))
        self.depth_intrinsics = self.depth_profile.get_intrinsics()
        self.depth_intrinsics_w, self.depth_intrinsics_h = self.depth_intrinsics.width, self.depth_intrinsics.height

        # Processing blocks
        self.pc = rs.pointcloud()
        self.decimate = rs.decimation_filter()
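        # NOTE (added): filter_magnitude is the linear downsampling factor of the
        # decimation filter; with 2 ** state.decimate, each press of [d] halves the
        # resolution again in each dimension (1 = full, 2 = half, 4 = quarter)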
        self.decimate.set_option(rs.option.filter_magnitude, 2 ** self.state.decimate)
        self.colorizer = rs.colorizer()

        cv2.namedWindow(self.state.WIN_NAME, cv2.WINDOW_AUTOSIZE)
        cv2.resizeWindow(self.state.WIN_NAME, self.depth_intrinsics_w, self.depth_intrinsics_h)
        cv2.setMouseCallback(self.state.WIN_NAME, self.mouse_cb)

        self.out = np.empty((self.depth_intrinsics_h, self.depth_intrinsics_w, 3), dtype=np.uint8)

    def update_frame_and_view(self):
        if not self.state.paused:
            # Wait for a coherent pair of frames: depth and color
            frames = self.pipeline.wait_for_frames()

            depth_frame = frames.get_depth_frame()
            color_frame = frames.get_color_frame()

            depth_frame = self.decimate.process(depth_frame)

            # Grab new intrinsics (may be changed by decimation)
            self.depth_intrinsics = rs.video_stream_profile(
                depth_frame.profile).get_intrinsics()
            self.depth_intrinsics_w, self.depth_intrinsics_h = self.depth_intrinsics.width, self.depth_intrinsics.height

            depth_image = np.asanyarray(depth_frame.get_data())
            color_image = np.asanyarray(color_frame.get_data())

            depth_colormap = np.asanyarray(
                self.colorizer.colorize(depth_frame).get_data())

            # Store on self so the render code below (and the [e] export key)
            # can keep using the last grabbed frame while paused
            if self.state.color:
                self.mapped_frame, self.color_source = color_frame, color_image
            else:
                self.mapped_frame, self.color_source = depth_frame, depth_colormap

            self.points = self.pc.calculate(depth_frame)
            self.pc.map_to(self.mapped_frame)

            # Pointcloud data to arrays
            v, t = self.points.get_vertices(), self.points.get_texture_coordinates()
            self.verts = np.asanyarray(v).view(np.float32).reshape(-1, 3)  # xyz
            self.texcoords = np.asanyarray(t).view(np.float32).reshape(-1, 2)  # uv

        # Render
        now = time.time()

        self.out.fill(0)

        self.drawGrid(self.out, (0, 0.5, 1), size=1, n=10)
        self.drawCameraBoundaries(self.out, self.depth_intrinsics)
        self.drawAxes(self.out, self.point2ViewPoint([0, 0, 0]), self.state.rotation, size=0.1, thickness=1)

        if not self.state.scale or self.out.shape[:2] == (self.depth_intrinsics_h, self.depth_intrinsics_w):
            self.pointcloud(self.out, self.verts, self.texcoords, self.color_source)
        else:
            tmp = np.zeros((self.depth_intrinsics_h, self.depth_intrinsics_w, 3), dtype=np.uint8)
            self.pointcloud(tmp, self.verts, self.texcoords, self.color_source)
            tmp = cv2.resize(tmp, self.out.shape[:2][::-1], interpolation=cv2.INTER_NEAREST)
            np.putmask(self.out, tmp > 0, tmp)

        if any(self.state.mouse_btns):
            self.drawAxes(self.out, self.point2ViewPoint(self.state.pivot), self.state.rotation, thickness=4)

        dt = time.time() - now

        cv2.setWindowTitle(self.state.WIN_NAME, "RealSense (%dx%d) %dFPS (%.2fms) %s" % (self.depth_intrinsics_w, self.depth_intrinsics_h, 1.0 / dt, dt * 1000, "PAUSED" if self.state.paused else ""))

        cv2.imshow(self.state.WIN_NAME, self.out)
        key = cv2.waitKey(1)

        if key == ord("r"):
            self.state.reset()

        if key == ord("p"):
            self.state.paused ^= True

        if key == ord("d"):
            self.state.decimate = (self.state.decimate + 1) % 3
            self.decimate.set_option(rs.option.filter_magnitude, 2 ** self.state.decimate)

        if key == ord("z"):
            self.state.scale ^= True

        if key == ord("c"):
            self.state.color ^= True

        if key == ord("s"):
            cv2.imwrite('./out.png', self.out)

        if key == ord("e"):
            self.points.export_to_ply('./out.ply', self.mapped_frame)

        if key in (27, ord("q")) or cv2.getWindowProperty(self.state.WIN_NAME, cv2.WND_PROP_AUTOSIZE) < 0:
            return False

        return True

    def mouse_cb(self, event, x, y, flags, param):

        if event == cv2.EVENT_LBUTTONDOWN:
            self.state.mouse_btns[0] = True

        if event == cv2.EVENT_LBUTTONUP:
            self.state.mouse_btns[0] = False

        if event == cv2.EVENT_RBUTTONDOWN:
            self.state.mouse_btns[1] = True

        if event == cv2.EVENT_RBUTTONUP:
            self.state.mouse_btns[1] = False

        if event == cv2.EVENT_MBUTTONDOWN:
            self.state.mouse_btns[2] = True

        if event == cv2.EVENT_MBUTTONUP:
            self.state.mouse_btns[2] = False

        if event == cv2.EVENT_MOUSEMOVE:

            h, w = self.out.shape[:2]
            dx, dy = x - self.state.prev_mouse[0], y - self.state.prev_mouse[1]

            if self.state.mouse_btns[0]:
                self.state.yaw += float(dx) / w * 2
                self.state.pitch -= float(dy) / h * 2

            elif self.state.mouse_btns[1]:
                dp = np.array((dx / w, dy / h, 0), dtype=np.float32)
                self.state.translation -= np.dot(self.state.rotation, dp)

            elif self.state.mouse_btns[2]:
                dz = math.sqrt(dx**2 + dy**2) * math.copysign(0.01, -dy)
                self.state.translation[2] += dz
                self.state.distance -= dz

        if event == cv2.EVENT_MOUSEWHEEL:
            dz = math.copysign(0.1, flags)
            self.state.translation[2] += dz
            self.state.distance -= dz

        self.state.prev_mouse = (x, y)

    def project3DVector_2_2D(self, v):
        """project 3d vector array to 2d"""
        h, w = self.out.shape[:2]
        view_aspect = float(h)/w

        # ignore divide by zero for invalid depth
        with np.errstate(divide='ignore', invalid='ignore'):
            proj = v[:, :-1] / v[:, -1, np.newaxis] * (w*view_aspect, h) + (w/2.0, h/2.0)

        # near clipping
        znear = 0.03
        proj[v[:, 2] < znear] = np.nan
        return proj

    def point2ViewPoint(self, v):
        """apply view transformation on vector array"""
        return np.dot(v - self.state.pivot, self.state.rotation) + self.state.pivot - self.state.translation

    def drawLine3d(self, out, pt1, pt2, color=(0x80, 0x80, 0x80), thickness=1):
        """draw a 3d line from pt1 to pt2"""
        p0 = self.project3DVector_2_2D(pt1.reshape(-1, 3))[0]
        p1 = self.project3DVector_2_2D(pt2.reshape(-1, 3))[0]
        if np.isnan(p0).any() or np.isnan(p1).any():
            return
        p0 = tuple(p0.astype(int))
        p1 = tuple(p1.astype(int))
        rect = (0, 0, out.shape[1], out.shape[0])
        inside, p0, p1 = cv2.clipLine(rect, p0, p1)
        if inside:
            cv2.line(out, p0, p1, color, thickness, cv2.LINE_AA)

    def drawGrid(self, out, pos, rotation=np.eye(3), size=1, n=10, color=(0x80, 0x80, 0x80)):
        """draw a grid on xz plane"""
        pos = np.array(pos)
        s = size / float(n)
        s2 = 0.5 * size
        for i in range(0, n+1):
            x = -s2 + i*s
            self.drawLine3d(out, self.point2ViewPoint(pos + np.dot((x, 0, -s2), rotation)), self.point2ViewPoint(pos + np.dot((x, 0, s2), rotation)), color)
        for i in range(0, n+1):
            z = -s2 + i*s
            self.drawLine3d(out, self.point2ViewPoint(pos + np.dot((-s2, 0, z), rotation)), self.point2ViewPoint(pos + np.dot((s2, 0, z), rotation)), color)

    def drawAxes(self, out, pos, rotation=np.eye(3), size=0.075, thickness=2):
        """draw 3d axes"""
        self.drawLine3d(out, pos, pos + np.dot((0, 0, size), rotation), (0xff, 0, 0), thickness)
        self.drawLine3d(out, pos, pos + np.dot((0, size, 0), rotation), (0, 0xff, 0), thickness)
        self.drawLine3d(out, pos, pos + np.dot((size, 0, 0), rotation), (0, 0, 0xff), thickness)

    def pixel2Point(self, intrinsics, x, y, depth):
        return rs.rs2_deproject_pixel_to_point(intrinsics, [x, y], depth)

    def drawCameraBoundaries(self, out, intrinsics, color=(0x40, 0x40, 0x40)):
        """draw camera's frustum"""
        cameraViewPoint = self.point2ViewPoint([0, 0, 0])
        w, h = intrinsics.width, intrinsics.height

        for d in range(1, 6, 2):
            top_left = self.pixel2Point(intrinsics, 0, 0, d)
            top_right = self.pixel2Point(intrinsics, w, 0, d)
            bottom_right = self.pixel2Point(intrinsics, w, h, d)
            bottom_left = self.pixel2Point(intrinsics, 0, h, d)

            self.drawLine3d(out, cameraViewPoint, self.point2ViewPoint(top_left), color)
            self.drawLine3d(out, cameraViewPoint, self.point2ViewPoint(top_right), color)
            self.drawLine3d(out, cameraViewPoint, self.point2ViewPoint(bottom_right), color)
            self.drawLine3d(out, cameraViewPoint, self.point2ViewPoint(bottom_left), color)

            self.drawLine3d(out, self.point2ViewPoint(top_left), self.point2ViewPoint(top_right), color)
            self.drawLine3d(out, self.point2ViewPoint(top_right), self.point2ViewPoint(bottom_right), color)
            self.drawLine3d(out, self.point2ViewPoint(bottom_right), self.point2ViewPoint(bottom_left), color)
            self.drawLine3d(out, self.point2ViewPoint(bottom_left), self.point2ViewPoint(top_left), color)

    def pointcloud(self, out, verts, texcoords, color, painter=True):
        """draw point cloud with optional painter's algorithm"""
        if painter:
            # Painter's algo, sort points from back to front

            # get reverse sorted indices by z (in view-space)
            # https://gist.github.com/stevenvo/e3dad127598842459b68
            v = self.point2ViewPoint(verts)
            s = v[:, 2].argsort()[::-1]
            proj = self.project3DVector_2_2D(v[s])
        else:
            proj = self.project3DVector_2_2D(self.point2ViewPoint(verts))

        if self.state.scale:
            proj *= 0.5 ** self.state.decimate

        h, w = out.shape[:2]

        # proj now contains 2d image coordinates
        j, i = proj.astype(np.uint32).T

        # create a mask to ignore out-of-bound indices
        im = (i >= 0) & (i < h)
        jm = (j >= 0) & (j < w)
        m = im & jm

        cw, ch = color.shape[:2][::-1]
        if painter:
            # sort texcoord with same indices as above
            # texcoords are [0..1] and relative to top-left pixel corner,
            # multiply by size and add 0.5 to center
            v, u = (texcoords[s] * (cw, ch) + 0.5).astype(np.uint32).T
        else:
            v, u = (texcoords * (cw, ch) + 0.5).astype(np.uint32).T
        # clip texcoords to image
        np.clip(u, 0, ch-1, out=u)
        np.clip(v, 0, cw-1, out=v)

        # perform uv-mapping
        out[i[m], j[m]] = color[u[m], v[m]]

realsense3DViewer = Realsense3DViewer()

# Grab camera data and render until the user quits with q or ESC
while realsense3DViewer.update_frame_and_view():
    pass

# Stop streaming
realsense3DViewer.pipeline.stop()

MartyG-RealSense commented 1 month ago

Thanks so much @frankahland for sharing your modified solution!

MartyG-RealSense commented 1 month ago

Hi @frankahland Do you require further assistance with this case, please? Thanks!