Solved/all you need: Realsense L515 or D400 or Dxxx, 2D pixel with depth to 3D world transformation!!! no need to use calibrate, calibrateRobotWorldHandEye, calibrateCamera, CameraMatrix, cv2.solvePNP, chessboard #13123
Solution for Intel RealSense L515 and D400 and Dxxx
i spent many hours to figure out how to transform the 2D pixels with depth into the 3D coordinates.
Google and ChatGPT gave me many approaches, but all of them failed or did not work at all.
It transforms the 2D pixels of the RGB image plus the depth into a 3D world pointcloud. Great visualized. You can change the perspective by mouse, change the resolution by key. Just great.
But it did not give me the the methods to transform and calibrate in real 3D world.
Now i found it and i want to share.
This coding calibates and transforms image pixels with depth from realsense to 3D world coordinates:
As i did not find it maybe some of you have the same problem.
So i post this solution with all the keywords and methods that i tried and searched for. Because you do not need all of them.
I spent hours to find the intrinsics of the realsense camera with chessboard photos or to calculate it.
And then i found: It is just there in the profile of the camera. Just use it.
import pyrealsense2 as rs
import numpy as np
class ImageAndDepth2RealWorldTransformator:
def __init__(self, intrinsics, referencePoints_pixelDepth, referencePoints_realWorld):
# EXAMPLE
# referencePoints_pixelDepth = [
# [475, 83, 691],
# [958, 130, 638],
# [330, 621, 551],
# [1395, 648, 577]
# ]
#
# referencePoints_realWorld = np.array([ # a 1.0 is needed as fourth dimension
# [0.002, 0.3, 0.0, 1.0],
# [0.2468, 0.2415, 0.033, 1.0],
# [0.0, 0.0, 0.033, 1.0],
# [0.43, 0.0, 0.0, 1.0]
# ])
self.intrinsics = intrinsics
np_pixelDepth = np.array(referencePoints_pixelDepth)
# Assert-Anweisungen zur Überprüfung der Form
assert np_pixelDepth.shape[0] >= 4, "Array muss mindestens 4 Zeilen haben"
assert np_pixelDepth.shape[1] == 3, "Array muss genau 3 Spalten haben"
assert referencePoints_realWorld.shape[0] == np_pixelDepth.shape[0], "Die Arrays müssen die gleiche Anzahl von Zeilen haben"
assert referencePoints_realWorld.shape[1] == 4, "Das Array muss genau 4 Spalten haben"
assert np.all(referencePoints_realWorld[:, 3] == 1.0), "Die vierte Spalte muss überall den Wert 1 haben"
# siehe Doku in pixelDepth2VectorPoint
reference_points_vectorPoints = self.pixelDepth2VectorPoint_array(referencePoints_pixelDepth)
# erstellt eine Matrix, die verwendet werden kann, um homogene Koordinaten zu berechnen.
# Der Ausdruck referencePoints_realWorld[:, 0:3] extrahiert die ersten drei Spalten der referencePoints_realWorld-Matrix, was die x-, y- und z-Koordinaten der Referenzpunkte sind. Das Resultat hat die Form (4, 3).
# Der Ausdruck np.ones(referencePoints_realWorld.shape[0]).T erzeugt einen Vektor aus Einsen der Länge 4, der die homogenen Koordinaten darstellt.
# np.column_stack((referencePoints_realWorld[:, 0:3], np.ones(referencePoints_realWorld.shape[0]).T)) fügt diese beiden Arrays zusammen, wobei die Einsen als vierte Spalte hinzugefügt werden. Das Resultat hat die Form (4, 4).
# Das .T transponiert das Array, sodass die Endform (4, 4) ist. Dies ist jedoch unnötig, da es bereits in der gewünschten Form vorliegt. Das .T könnte entfernt werden, um denselben Effekt zu erzielen.
# Ergebnis:
# realWorldHomogene wird eine (4, 4) numpy-Array sein, die die x-, y-, z-Koordinaten und die homogene Koordinate 1.0 der Referenzpunkte enthält.
self.realWorldHomogene = np.column_stack((referencePoints_realWorld[:, 0:3], np.ones(referencePoints_realWorld.shape[0]).T)).T
vectorPoints_4D_with1 = np.ones(self.realWorldHomogene.shape)
for i, p in enumerate(reference_points_vectorPoints):
# 4. Dimension ist dann die 1
vectorPoints_4D_with1[0:3, i] = p
self.transformationMatrixImage2RealWorld = np.dot(self.realWorldHomogene, np.linalg.pinv(vectorPoints_4D_with1))
self.transformationMatrixRealWorld2Image = np.linalg.pinv(self.transformationMatrixImage2RealWorld)
print("ImageAndDepth2RealWorldTransformator initialized and calibrated")
print("transformationMatrixImage2RealWorld:\n", self.transformationMatrixImage2RealWorld)
print("transformationMatrixRealWorld2Image:\n", self.transformationMatrixRealWorld2Image)
print("-------------------")
print("Sanity Test:")
print("ImageDepth_to_realWorld")
for ind, pt in enumerate(vectorPoints_4D_with1.T):
print("Expected:", referencePoints_realWorld[ind][0:3], "Result:", np.dot(self.transformationMatrixImage2RealWorld, np.array(pt))[0:3])
print("RealWorld_to_ImageDepth")
for ind, pt in enumerate(referencePoints_realWorld):
pt[3] = 1
print("Expected:", vectorPoints_4D_with1.T[ind][0:3], "Result:", np.dot(self.transformationMatrixRealWorld2Image, np.array(pt))[0:3])
def pixelDepth2VectorPoint(self, x,y,depth):
# Umwandeln der pixel mit depth in Vector mit depth als z
# Um die 3D-Vectorkoordinaten (X, Y, Z) aus Pixelkoordinaten (u, v) und der Tiefe d zu berechnen:
# X = (u - c_x) * d / f_x
# Y = (v - c_y) * d / f_y
# Z = d
# wobei:
# u, v = Pixelkoordinaten
# d = Tiefe
# c_x, c_y = optisches Zentrum (Principal Point) der Kamera
# f_x, f_y = Brennweiten der Kamera in x- und y-Richtung
point_x, point_y, point_z = rs.rs2_deproject_pixel_to_point(self.intrinsics, [x,y], depth)
return np.array([point_x, point_y, point_z], dtype='float64')
def pixelDepth2VectorPoint_array(self, pixelDepthArray):
vectorPointsArray = []
for pixelDepth in pixelDepthArray:
vectorPointsArray.append(self.pixelDepth2VectorPoint(pixelDepth[0], pixelDepth[1], pixelDepth[2]))
return vectorPointsArray
def pixelDepth2RealWorld(self, x,y,depth):
reference_points_vectorPoints = []
test_x, test_y, test_z = x, y, depth
point_x, point_y, point_z = rs.rs2_deproject_pixel_to_point(self.intrinsics, [test_x,test_y], test_z)
point = np.array([point_x, point_y, point_z], dtype='float64')
vPoint = np.ones((1, 4)) #np.ones(imageAndDepth2RealWorldTransformator.realWorldHomogene.shape)
vPoint[0, 0:3] = point
return np.dot(self.transformationMatrixImage2RealWorld, np.array(vPoint[0]))[0:3]
# Get intrinsics of realsense camera:
print("Connected Intel Realsense Camera devices:")
context = rs.context()
for device in context.devices:
print(f"Device {device.get_info(rs.camera_info.name)} connected")
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, rs.format.z16, 30)
config.enable_stream(rs.stream.color, rs.format.bgr8, 30)
# Start streaming
pipeline_profile = pipeline.start(config)
# Get stream profile and camera intrinsics
profile = pipeline.get_active_profile()
depth_profile = rs.video_stream_profile(profile.get_stream(rs.stream.depth))
color_profile = rs.video_stream_profile(profile.get_stream(rs.stream.color))
depth_intrinsics = depth_profile.get_intrinsics()
color_intrinsics = color_profile.get_intrinsics()
# depth and color_intrinsics are different, depending on resolution
# Choose wisely which one you are calibrating!
referencePoints_pixelDepth = [
[475, 83, 691], # x,y,depth
[958, 130, 638],
[330, 621, 551],
[1395, 648, 577]
]
referencePoints_realWorld = np.array([
[0.002, 0.3, 0.0, 1.0], #x,y,z,1 the 1 is needed as 4th dimension, just set it to 1
[0.2468, 0.2415, 0.033, 1.0],
[0.0, 0.0, 0.033, 1.0],
[0.43, 0.0, 0.0, 1.0]
])
imageAndDepth2RealWorldTransformator = ImageAndDepth2RealWorldTransformator(color_intrinsics, referencePoints_pixelDepth, referencePoints_realWorld)
print("Test: 450, 631, 585 #0.02 rechts von I2 und 0.033 tiefer")
r = imageAndDepth2RealWorldTransformator.pixelDepth2RealWorld(450, 631, 585)
print("Result:", r)
print("Test: 901,138,678 #0.02 links von I1 und 0.033 tiefer")
r = imageAndDepth2RealWorldTransformator.pixelDepth2RealWorld(901, 138, 678)
print("Result:", r)
As a second coding block i share my modified coding for the 3D point cloud
I modified the coding a bit by renaming the methods and some variables to make it easier for me to understand and put it into a class.
Everything else is the same.
Enyoj!
# Pixel in rgb image -> pixel2Point in 3D World -> point_2_viewPoint in pointcloud as seen by eye
# License: Apache 2.0. See LICENSE file in root directory.
# Copyright(c) 2015-2017 Intel Corporation. All Rights Reserved.
"""
OpenCV and Numpy Point cloud Software Renderer
This sample is mostly for demonstration and educational purposes.
It really doesn't offer the quality or performance that can be
achieved with hardware acceleration.
Usage:
------
Mouse:
Drag with left button to rotate around pivot (thick small axes),
with right button to translate and the wheel to zoom.
Keyboard:
[p] Pause
[r] Reset View
[d] Cycle through decimation values
[z] Toggle point scaling
[c] Toggle color source
[s] Save PNG (./out.png)
[e] Export points to ply (./out.ply)
[q\ESC] Quit
"""
import math
import time
import cv2
import numpy as np
import pyrealsense2 as rs
class AppState:
def __init__(self, *args, **kwargs):
self.WIN_NAME = 'RealSense'
self.pitch, self.yaw = math.radians(-10), math.radians(-15)
self.translation = np.array([0, 0, -1], dtype=np.float32)
self.distance = 2
self.prev_mouse = 0, 0
self.mouse_btns = [False, False, False]
self.paused = False
self.decimate = 1
self.scale = True
self.color = True
def reset(self):
self.pitch, self.yaw, self.distance = 0, 0, 2
self.translation[:] = 0, 0, -1
@property
def rotation(self):
Rx, _ = cv2.Rodrigues((self.pitch, 0, 0))
Ry, _ = cv2.Rodrigues((0, self.yaw, 0))
return np.dot(Ry, Rx).astype(np.float32)
@property
def pivot(self):
return self.translation + np.array((0, 0, self.distance), dtype=np.float32)
class Realsense3DViewer:
def __init__(self):
self.state = AppState()
# Configure depth and color streams
self.pipeline = rs.pipeline()
config = rs.config()
pipeline_wrapper = rs.pipeline_wrapper(self.pipeline)
#pipeline_profile = config.resolve(pipeline_wrapper)
#device = pipeline_profile.get_device()
config.enable_stream(rs.stream.depth, rs.format.z16, 30)
config.enable_stream(rs.stream.color, rs.format.bgr8, 30)
# Start streaming
self.pipeline.start(config)
# Get stream profile and camera intrinsics
profile = self.pipeline.get_active_profile()
self.depth_profile = rs.video_stream_profile(profile.get_stream(rs.stream.depth))
self.depth_intrinsics = self.depth_profile.get_intrinsics()
self.depth_intrinsics_w, depth_intrinsics_h = self.depth_intrinsics.width, self.depth_intrinsics.height
# Processing blocks
self.pc = rs.pointcloud()
self.decimate = rs.decimation_filter()
self.decimate.set_option(rs.option.filter_magnitude, 2 ** self.state.decimate)
self.colorizer = rs.colorizer()
cv2.namedWindow(self.state.WIN_NAME, cv2.WINDOW_AUTOSIZE)
cv2.resizeWindow(self.state.WIN_NAME, self.depth_intrinsics_w, depth_intrinsics_h)
cv2.setMouseCallback(self.state.WIN_NAME, self.mouse_cb)
self.out = np.empty((depth_intrinsics_h, self.depth_intrinsics_w, 3), dtype=np.uint8)
def update_frame_and_view(self):
if not self.state.paused:
# Wait for a coherent pair of frames: depth and color
frames = self.pipeline.wait_for_frames()
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()
depth_frame = self.decimate.process(depth_frame)
# Grab new intrinsics (may be changed by decimation)
self.depth_intrinsics = rs.video_stream_profile(
depth_frame.profile).get_intrinsics()
self.depth_intrinsics_w, depth_intrinsics_h = self.depth_intrinsics.width, self.depth_intrinsics.height
depth_image = np.asanyarray(depth_frame.get_data())
color_image = np.asanyarray(color_frame.get_data())
depth_colormap = np.asanyarray(
self.colorizer.colorize(depth_frame).get_data())
if self.state.color:
mapped_frame, color_source = color_frame, color_image
else:
mapped_frame, color_source = depth_frame, depth_colormap
points = self.pc.calculate(depth_frame)
self.pc.map_to(mapped_frame)
# Pointcloud data to arrays
v, t = points.get_vertices(), points.get_texture_coordinates()
verts = np.asanyarray(v).view(np.float32).reshape(-1, 3) # xyz
texcoords = np.asanyarray(t).view(np.float32).reshape(-1, 2) # uv
# Render
now = time.time()
self.out.fill(0)
self.drawGrid(self.out, (0, 0.5, 1), size=1, n=10)
self.drawCameraBounderies(self.out, self.depth_intrinsics)
self.drawAxes(self.out, self.point2ViewPoint([0, 0, 0]), self.state.rotation, size=0.1, thickness=1)
if not self.state.scale or self.out.shape[:2] == (depth_intrinsics_h, self.depth_intrinsics_w):
self.pointcloud(self.out, verts, texcoords, color_source)
else:
tmp = np.zeros((depth_intrinsics_h, self.depth_intrinsics_w, 3), dtype=np.uint8)
self.pointcloud(tmp, verts, texcoords, color_source)
tmp = cv2.resize( tmp, self.out.shape[:2][::-1], interpolation=cv2.INTER_NEAREST)
np.putmask(self.out, tmp > 0, tmp)
if any(self.state.mouse_btns):
self.drawAxes(self.out, self.point2ViewPoint(self.state.pivot), self.state.rotation, thickness=4)
dt = time.time() - now
cv2.setWindowTitle(self.state.WIN_NAME, "RealSense (%dx%d) %dFPS (%.2fms) %s" % (self.depth_intrinsics_w, depth_intrinsics_h, 1.0 / dt, dt * 1000, "PAUSED" if self.state.paused else ""))
cv2.imshow(self.state.WIN_NAME, self.out)
key = cv2.waitKey(1)
if key == ord("r"):
self.state.reset()
if key == ord("p"):
self.state.paused ^= True
if key == ord("d"):
self.state.decimate = (self.state.decimate + 1) % 3
self.decimate.set_option(rs.option.filter_magnitude, 2 ** self.state.decimate)
if key == ord("z"):
self.state.scale ^= True
if key == ord("c"):
self.state.color ^= True
if key == ord("s"):
cv2.imwrite('./out.png', self.out)
if key == ord("e"):
points.export_to_ply('./out.ply', mapped_frame)
# if key in (27, ord("q")) or cv2.getWindowProperty(self.state.WIN_NAME, cv2.WND_PROP_AUTOSIZE) < 0:
# break
def mouse_cb(self, event, x, y, flags, param):
if event == cv2.EVENT_LBUTTONDOWN:
self.state.mouse_btns[0] = True
if event == cv2.EVENT_LBUTTONUP:
self.state.mouse_btns[0] = False
if event == cv2.EVENT_RBUTTONDOWN:
self.state.mouse_btns[1] = True
if event == cv2.EVENT_RBUTTONUP:
self.state.mouse_btns[1] = False
if event == cv2.EVENT_MBUTTONDOWN:
self.state.mouse_btns[2] = True
if event == cv2.EVENT_MBUTTONUP:
self.state.mouse_btns[2] = False
if event == cv2.EVENT_MOUSEMOVE:
h, w = self.out.shape[:2]
dx, dy = x - self.state.prev_mouse[0], y - self.state.prev_mouse[1]
if self.state.mouse_btns[0]:
self.state.yaw += float(dx) / w * 2
self.state.pitch -= float(dy) / h * 2
elif self.state.mouse_btns[1]:
dp = np.array((dx / w, dy / h, 0), dtype=np.float32)
self.state.translation -= np.dot(self.state.rotation, dp)
elif self.state.mouse_btns[2]:
dz = math.sqrt(dx**2 + dy**2) * math.copysign(0.01, -dy)
self.state.translation[2] += dz
self.state.distance -= dz
if event == cv2.EVENT_MOUSEWHEEL:
dz = math.copysign(0.1, flags)
self.state.translation[2] += dz
self.state.distance -= dz
self.state.prev_mouse = (x, y)
def project3DVector_2_2D(self, v):
"""project 3d vector array to 2d"""
h, w = self.out.shape[:2]
view_aspect = float(h)/w
# ignore divide by zero for invalid depth
with np.errstate(divide='ignore', invalid='ignore'):
proj = v[:, :-1] / v[:, -1, np.newaxis] * (w*view_aspect, h) + (w/2.0, h/2.0)
# near clipping
znear = 0.03
proj[v[:, 2] < znear] = np.nan
return proj
def point2ViewPoint(self, v):
"""apply view transformation on vector array"""
return np.dot(v - self.state.pivot, self.state.rotation) + self.state.pivot - self.state.translation
def drawLine3d(self, out, pt1, pt2, color=(0x80, 0x80, 0x80), thickness=1):
"""draw a 3d line from pt1 to pt2"""
p0 = self.project3DVector_2_2D(pt1.reshape(-1, 3))[0]
p1 = self.project3DVector_2_2D(pt2.reshape(-1, 3))[0]
if np.isnan(p0).any() or np.isnan(p1).any():
return
p0 = tuple(p0.astype(int))
p1 = tuple(p1.astype(int))
rect = (0, 0, out.shape[1], out.shape[0])
inside, p0, p1 = cv2.clipLine(rect, p0, p1)
if inside:
cv2.line(out, p0, p1, color, thickness, cv2.LINE_AA)
def drawGrid(self, out, pos, rotation=np.eye(3), size=1, n=10, color=(0x80, 0x80, 0x80)):
"""draw a grid on xz plane"""
pos = np.array(pos)
s = size / float(n)
s2 = 0.5 * size
for i in range(0, n+1):
x = -s2 + i*s
self.drawLine3d(out, self.point2ViewPoint(pos + np.dot((x, 0, -s2), rotation)), self.point2ViewPoint(pos + np.dot((x, 0, s2), rotation)), color)
for i in range(0, n+1):
z = -s2 + i*s
self.drawLine3d(out, self.point2ViewPoint(pos + np.dot((-s2, 0, z), rotation)), self.point2ViewPoint(pos + np.dot((s2, 0, z), rotation)), color)
def drawAxes(self, out, pos, rotation=np.eye(3), size=0.075, thickness=2):
"""draw 3d axes"""
self.drawLine3d(out, pos, pos + np.dot((0, 0, size), rotation), (0xff, 0, 0), thickness)
self.drawLine3d(out, pos, pos + np.dot((0, size, 0), rotation), (0, 0xff, 0), thickness)
self.drawLine3d(out, pos, pos + np.dot((size, 0, 0), rotation), (0, 0, 0xff), thickness)
def pixel2Point(self, intrinsics, x, y, depth):
return rs.rs2_deproject_pixel_to_point(intrinsics, [x, y], depth)
def drawCameraBounderies(self, out, intrinsics, color=(0x40, 0x40, 0x40)):
"""draw camera's frustum"""
cameraViewPoint = self.point2ViewPoint([0, 0, 0])
w, h = intrinsics.width, intrinsics.height
for d in range(1, 6, 2):
top_left = self.pixel2Point(intrinsics, 0, 0, d)
top_right = self.pixel2Point(intrinsics, w, 0, d)
bottom_right = self.pixel2Point(intrinsics, w, h, d)
bottom_left = self.pixel2Point(intrinsics, 0, h, d)
self.drawLine3d(out, cameraViewPoint, self.point2ViewPoint(top_left), color)
self.drawLine3d(out, cameraViewPoint, self.point2ViewPoint(top_right), color)
self.drawLine3d(out, cameraViewPoint, self.point2ViewPoint(bottom_right), color)
self.drawLine3d(out, cameraViewPoint, self.point2ViewPoint(bottom_left), color)
self.drawLine3d(out, self.point2ViewPoint(top_left), self.point2ViewPoint(top_right), color)
self.drawLine3d(out, self.point2ViewPoint(top_right), self.point2ViewPoint(bottom_right), color)
self.drawLine3d(out, self.point2ViewPoint(bottom_right), self.point2ViewPoint(bottom_left), color)
self.drawLine3d(out, self.point2ViewPoint(bottom_left), self.point2ViewPoint(top_left), color)
def pointcloud(self, out, verts, texcoords, color, painter=True):
"""draw point cloud with optional painter's algorithm"""
if painter:
# Painter's algo, sort points from back to front
# get reverse sorted indices by z (in view-space)
# https://gist.github.com/stevenvo/e3dad127598842459b68
v = self.point2ViewPoint(verts)
s = v[:, 2].argsort()[::-1]
proj = self.project3DVector_2_2D(v[s])
else:
proj = self.project3DVector_2_2D(self.point2ViewPoint(verts))
if self.state.scale:
proj *= 0.5 ** self.state.decimate
h, w = out.shape[:2]
# proj now contains 2d image coordinates
j, i = proj.astype(np.uint32).T
# create a mask to ignore out-of-bound indices
im = (i >= 0) & (i < h)
jm = (j >= 0) & (j < w)
m = im & jm
cw, ch = color.shape[:2][::-1]
if painter:
# sort texcoord with same indices as above
# texcoords are [0..1] and relative to top-left pixel corner,
# multiply by size and add 0.5 to center
v, u = (texcoords[s] * (cw, ch) + 0.5).astype(np.uint32).T
else:
v, u = (texcoords * (cw, ch) + 0.5).astype(np.uint32).T
# clip texcoords to image
np.clip(u, 0, ch-1, out=u)
np.clip(v, 0, cw-1, out=v)
# perform uv-mapping
out[i[m], j[m]] = color[u[m], v[m]]
realsense3DViewer = Realsense3DViewer()
while True:
# Grab camera data
realsense3DViewer.update_frame_and_view()
# Stop streaming
pipeline.stop()
Solution for Intel RealSense L515 and D400 and Dxxx
i spent many hours to figure out how to transform the 2D pixels with depth into the 3D coordinates. Google and ChatGPT gave me many approaches, but all of them failed or did not work at all.
Then i found the perfect example from Intel here: https://github.com/IntelRealSense/librealsense/blob/master/wrappers/python/examples/opencv_pointcloud_viewer.py
It transforms the 2D pixels of the RGB image plus the depth into a 3D world pointcloud. Great visualized. You can change the perspective by mouse, change the resolution by key. Just great.
But it did not give me the the methods to transform and calibrate in real 3D world.
Now i found it and i want to share.
This coding calibates and transforms image pixels with depth from realsense to 3D world coordinates: As i did not find it maybe some of you have the same problem. So i post this solution with all the keywords and methods that i tried and searched for. Because you do not need all of them. I spent hours to find the intrinsics of the realsense camera with chessboard photos or to calculate it. And then i found: It is just there in the profile of the camera. Just use it.
As a second coding block i share my modified coding for the 3D point cloud I modified the coding a bit by renaming the methods and some variables to make it easier for me to understand and put it into a class. Everything else is the same.
Enyoj!