IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

Cropping aligned depth data #6267

Closed sch1eicher closed 4 years ago

sch1eicher commented 4 years ago
Info
Camera Model: D415
Firmware Version: 05.12.03.00
Operating System & Version: Win 10
Platform: PC
Language: pyrealsense2 with Python 3.7

Issue Description

Hi, I'm quite new to the GitHub community, so I apologize if this issue is filed in the wrong place.

My current program processes the RGB data from the camera with a CNN (Mask R-CNN) and returns polygons of the recognized objects. I would like to crop the recognized objects out of the depth data (stored in a ply file) for further processing (pose estimation). The problem is that the depth data is not correctly aligned with the RGB data, and I haven't found a Python solution for cropping only those points from the ply file.

When I use the 3D view in the RealSense Viewer (2.29.0), the depth and RGB data seem to be aligned perfectly.

If you have a hint, I would be thankful.

MartyG-RealSense commented 4 years ago

Intel has a Python tutorial program for RealSense that aligns depth and RGB data stored in a 'bag' format file, applies MobileNet-SSD object detection to the aligned image, and defines a bounding box around the identified object so that the object's distance from the camera can be calculated.

I hope that this tutorial can offer some useful insights for your own project.

https://github.com/IntelRealSense/librealsense/blob/jupyter/notebooks/distance_to_object.ipynb
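
Very roughly, that workflow might look something like the sketch below in pyrealsense2 (this is not the tutorial's own code, and the bounding box values are placeholders standing in for the detector output):

import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start(rs.config())
align = rs.align(rs.stream.color)  # align depth to the color stream

frames = align.process(pipeline.wait_for_frames())
depth_frame = frames.get_depth_frame()
depth = np.asanyarray(depth_frame.get_data())

# Hypothetical bounding box from the detector, in color-image pixel coordinates
xmin, ymin, xmax, ymax = 410, 300, 860, 540

# Convert raw depth units to metres and take the median inside the box
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
roi = depth[ymin:ymax, xmin:xmax].astype(float) * depth_scale
print("Approximate distance: %.2f m" % np.median(roi[roi > 0]))

pipeline.stop()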

The link below suggests Python code for cropping an 'RGB aligned to depth' image to a region of interest (ROI) on a point cloud.

https://github.com/IntelRealSense/librealsense/issues/2769#issuecomment-442035864

sch1eicher commented 4 years ago

Many thanks for your suggestions. I need a ply file in order to perform pose estimation of the detected objects. I've used the code from the #2769 comment, but the RGB and depth images are still not aligned. For example, when I crop an ROI of xmin, xmax, ymin, ymax = 410, 860, 300, 540 (red box in the RGB picture) from the depth data (red area in the depth image), this ROI is not aligned with the area I chose in the RGB picture (I've used bricks to illustrate my issue).

[RGB screenshot] [Depth screenshot]

Maybe it's an issue with my code:

import pyrealsense2 as rs
import numpy as np
import open3d as o3d
import math as m

def get_image():
    # Define some constants
    resolution_width = 1280  # pixels
    resolution_height = 720  # pixels
    frame_rate = 15  # fps

    # Create a pipeline
    pipeline = rs.pipeline()

    # Enable depth, infrared, color streams
    config = rs.config()
    config.enable_stream(rs.stream.depth)  # , resolution_width, resolution_height, rs.format.z16, frame_rate)
    config.enable_stream(rs.stream.infrared)  # , 1, resolution_width, resolution_height, rs.format.y8, frame_rate)
    config.enable_stream(rs.stream.color)  # , resolution_width, resolution_height, rs.format.bgr8, frame_rate)

    pc = rs.pointcloud()
    pipeline.start(config)

    colorizer = rs.colorizer()
    align_to = rs.stream.color
    align = rs.align(align_to)

    colorizer = rs.colorizer()

    align = rs.align(rs.stream.depth)

    # skip the first frames
    for i in range(10):
        # Get frameset of color and depth
        frames = pipeline.wait_for_frames()

    aligned = align.process(frames)
    color_aligned_to_depth = aligned.first(rs.stream.color)

    depth_frame = frames.first(rs.stream.depth)
    color_frame = aligned.get_color_frame()

    # Apply filters
    filters = [rs.disparity_transform(),
               rs.spatial_filter(),
               rs.temporal_filter(),
               rs.disparity_transform(False)]
    for f in filters:
        depth_frame = f.process(depth_frame)

    color = np.asanyarray(color_frame.get_data())
    points = pc.calculate(depth_frame)
    w = rs.video_frame(depth_frame).width
    h = rs.video_frame(depth_frame).height
    print('h: ' + str(h) + 'w: ' + str(w))
    vertices = np.asanyarray(points.get_vertices()).view(np.float32).reshape(h, w, 3)

    # Stop pipeline
    pipeline.stop()
    return color, vertices

color_image, verts = get_image()

xmin, xmax, ymin, ymax = 410, 860, 300, 540  # Bounding Box

roi = verts[ymin:ymax, xmin:xmax, :]

# Pass xyz to Open3D.o3d.geometry.PointCloud 
reshaped_roi = np.reshape(roi, (-1, 3))
print(reshaped_roi.shape)
reshaped_verts = np.reshape(verts, (-1, 3))

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(reshaped_roi)
o3d.io.write_point_cloud("reshaped_roi.ply", pcd)  # Red in the second picture

pcd_verts = o3d.geometry.PointCloud()
pcd_verts.points = o3d.utility.Vector3dVector(reshaped_verts)
o3d.io.write_point_cloud("reshaped_verts.ply", pcd_verts) #
MartyG-RealSense commented 4 years ago

Apologies for the delay in responding, as I was carefully considering your code.

It looks as though you are applying align before post-processing. It is recommended that align is applied after post-processing, otherwise the image can be affected by a type of distortion called aliasing, which can produce effects such as jagged lines.
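
In Python, that ordering might look something like the lines below (a minimal sketch, assuming a pipeline is already started; it also assumes the filters accept a whole frameset, as frameset::apply_filter does on the C++ side):

filters = [rs.disparity_transform(True),   # depth -> disparity
           rs.spatial_filter(),
           rs.temporal_filter(),
           rs.disparity_transform(False)]  # disparity -> depth
align = rs.align(rs.stream.color)

frames = pipeline.wait_for_frames()

# Post-process first ...
for f in filters:
    frames = f.process(frames).as_frameset()

# ... then align the filtered frameset
aligned = align.process(frames)
depth_frame = aligned.get_depth_frame()
color_frame = aligned.get_color_frame()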

Also in the original Python ROI script that I linked to, the boundary values are set before depth_frame is defined.

[screenshot]

sch1eicher commented 4 years ago

Many thanks for your response. I've applied your suggested changes. The main issue, though, turned out to be that the RGB image data, for reasons unknown to me, was captured at a resolution of 1920x1080 instead of 1280x720, so the image data did not match the depth data. Now my program works as it is supposed to.
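
For anyone else hitting this: explicitly requesting matching resolutions in the config, rather than relying on the stream defaults, avoids the mismatch. A minimal sketch of what I mean:

config = rs.config()
# Ask for depth and color at the same 1280x720 resolution so the aligned
# images and the CNN masks share one pixel grid
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 15)
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 15)
pipeline.start(config)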

Thank you very much for your time and support.

MartyG-RealSense commented 4 years ago

You are really welcome. Awesome news that you found a solution - thanks for the update! :)

PerceptMD commented 4 years ago

Hey @MartyG-RealSense, you said:

It looks as though you are applying align before post-processing. It is recommended that align is applied after post-processing, otherwise the image can be affected by a type of distortion called aliasing, which can produce effects such as jagged lines.

In the rs-measure example, align is done before post-processing. But you would suggest it the other way around?

Also in the original Python ROI script that I linked to, the boundary values are set before depth_frame is defined.

By boundary values, do you mean the bounding box corners? Why did you highlight this? As far as I understand, they are only used in the line roi = verts[ymin:ymax, xmin:xmax, :], so it doesn't really matter as long as they are defined before that point, does it?

Is there any difference between doing the alignment like this: auto processed = align.process(frameset); or like this: data = data.apply_filter(align_to);?

Does someone know a handy C++ alternative to this nice Python/NumPy syntax for cropping?

MartyG-RealSense commented 4 years ago

@PerceptMD The recommendation is to do align after post-processing in order to avoid distortion effects such as aliasing (jagged lines). I have had a couple of cases though where the only way that the RealSense user's program would work without errors is if they did alignment before post-processing. I don't think that the program will fail if align comes before post-proc, but doing align after is the recommendation if possible.

I guess you could think of the bounding box's horizontal and vertical min-max values as defining the corners of the box. I don't know enough about bounding-box programming to comment on it further than that, though, as it isn't something that I have done myself.

I believe that align.process() refers to doing alignment with an align processing block.

In the C# wrapper, align_to was made obsolete in favor of doing alignment with a processing block.

https://github.com/IntelRealSense/librealsense/issues/5582#issuecomment-596092109

[screenshot]

If you are able to make use of OpenCV in your C++ project then this link for defining a bounding box may be helpful:

https://stackoverflow.com/questions/14505679/drawing-bounding-box-using-opencv-in-c-enviroment

PerceptMD commented 4 years ago

Thanks for your quick reply.

So, if possible, the order could be changed in rs-measure, because that example led me to align before post-processing.

Okay so align.process() would be the way to go.

Thanks for the link, but drawing the box is not my concern. I would be interested in some fancy two-line code for doing this:

 vertices = np.asanyarray(points.get_vertices()).view(np.float32).reshape(h, w, 3)
 roi = verts[ymin:ymax, xmin:xmax, :]

in C++, if possible.

MartyG-RealSense commented 4 years ago

This is the only C++ method that I personally know of for defining the X-Y of a bounding box with C++:

https://github.com/IntelRealSense/librealsense/issues/2016#issuecomment-403804157

PerceptMD commented 4 years ago

Thanks a lot. ;) I guess reducing the vertices to the ones inside the bounding box won't be a two-liner in C++, but "iterate through your 3D points doing some filtering based on bounding box variables, discarding points outside of the box" will do the trick.

sch1eicher commented 4 years ago

Sorry to bother you again, but I still have some issues with correctly aligning the depth and RGB data.

As hopefully visible in the image below, the masks (blue), which were recognized perfectly in the RGB image by a CNN, are not aligned with the real objects in the depth image; there is a slight shift to the right. I could understand the shift if it went in the opposite direction, since the origins of the depth image and the RGB sensor are 15 mm apart, but it goes the other way.

[screenshot of the masks overlaid on the depth image]

I've found some hints (e.g. #2445) about overcoming this by switching on "Heuristic" in the RealSense Viewer and using occlusion filters, but I haven't found corresponding functions in the pyrealsense2 library.

Do you have any hints to overcome this?

MartyG-RealSense commented 4 years ago

@sch1eicher It's no bother at all :)

When aligning depth to color, the depth image should be aligned to the FOV size of the color sensor. Aligning to the color FOV is necessary because although the D415 camera models use the same imager component for both RGB and IR, on the D435 the IR imager component has a larger "wide" size (compared to the "standard" IR imager size of D415).
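
If it helps, one way to confirm this from Python is to compare the intrinsics of the aligned depth frame with those of the color frame; after depth-to-color alignment both should report the color sensor's resolution and projection parameters. A minimal sketch, assuming align = rs.align(rs.stream.color) and a frameset frames from wait_for_frames():

aligned = align.process(frames)
depth_aligned = aligned.get_depth_frame()
color_frame = aligned.get_color_frame()

depth_intr = depth_aligned.profile.as_video_stream_profile().get_intrinsics()
color_intr = color_frame.profile.as_video_stream_profile().get_intrinsics()

# Both should show the color sensor's width, height and principal point
print(depth_intr)
print(color_intr)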

Advice about use of the occlusion removal in your own program is provided in the link below:

https://github.com/IntelRealSense/librealsense/issues/5242#issuecomment-563457627

sch1eicher commented 4 years ago

Many thanks for your fast response :) Maybe I'm wrong, but shouldn't the "align" command

align = rs.align(rs.stream.color)
aligned = align.process(frames)

already align the depth image to the FOV size of the color sensor?

MartyG-RealSense commented 4 years ago

A Python alignment example program in the SDK called align-depth2color.py expresses the alignment like this:

align_to = rs.stream.color
align = rs.align(align_to)
try:
    while True:
        frames = pipeline.wait_for_frames()
        aligned_frames = align.process(frames)

https://github.com/IntelRealSense/librealsense/blob/master/wrappers/python/examples/align-depth2color.py#L40

It would look more like your own script if a variable substitution is not used for align:

align = rs.align(rs.stream.color)
try:
    while True:
        frames = pipeline.wait_for_frames()
        aligned_frames = align.process(frames)