IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

Depth function not working properly #9088

Closed. Swazir9449 closed this issue 3 years ago.

Swazir9449 commented 3 years ago

Required Info
Camera Model: D455
Firmware Version: 2.45.0.3212
Operating System & Version: Windows 10
Kernel Version (Linux Only): n/a
Platform: Python on Windows (PyCharm)
SDK Version: 2.45.0
Language: Python
Segment:

Issue Description

I am trying to write a program that returns the depth of your hand. I am using MediaPipe to track the hand, specifically the wrist. With the XY coordinates of the wrist, I am using the get_distance function to get the depth, but it's just not accurate. It is only accurate in a straight line in front of the camera. If I put my hand anywhere else, it just reports 0 or a distance that is much too far away.

Things to note: MediaPipe normalizes the coordinates of your hand (each x and y is between 0 and 1). I then multiply the results by my resolution, which gives me the exact pixel position of my hand. I have written the program to show a blue dot at the point being tracked, to confirm that it really is sampling the depth at my hand.

My frames are aligned. I am making sure I am getting the depth from my depth frame.

What am I doing wrong?

[example image]

It only works in this one area. If I move away from the middle, it stops working. I can confirm that the blue dot follows my wrist, and the coordinates of the blue dot are exactly the coordinates that I am feeding into the depth function.

The code:

import pyrealsense2 as rs
import numpy as np
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands

pipeline = rs.pipeline()

config = rs.config()

pipeline_wrapper = rs.pipeline_wrapper(pipeline)
pipeline_profile = config.resolve(pipeline_wrapper)
device = pipeline_profile.get_device()
device_product_line = str(device.get_info(rs.camera_info.product_line))

config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)

if device_product_line == 'L500':
    config.enable_stream(rs.stream.color, 960, 540, rs.format.bgr8, 30)
    print("YEAHEH")
else:
    config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)

profile = pipeline.start(config)

depth_sensor = profile.get_device().first_depth_sensor()
depth_scale = depth_sensor.get_depth_scale()
print("Depth Scale is: ", depth_scale)

# (THIS IS REDUNDANT, I am going to delete it)
clipping_distance_in_meters = 2.5  # 2.5 meters
clipping_distance = clipping_distance_in_meters / depth_scale

align_to = rs.stream.color   
align = rs.align(align_to)

##so it does not crash on start
WRIST_X = 9
WRIST_Y = 9
WRIST_Z = 1

with mp_hands.Hands(
        min_detection_confidence=0.5,
        min_tracking_confidence=0.5) as hands:
    try:
        while True:
            # Get frameset of color and depth
            frames = pipeline.wait_for_frames()

            ### Align the depth frame to color frame
            aligned_frames = align.process(frames)

            # Get aligned frames
            aligned_depth_frame = aligned_frames.get_depth_frame()  
            color_frame = aligned_frames.get_color_frame()

            # Validate that both frames are valid
            if not aligned_depth_frame or not color_frame:
                continue

            depth_image = np.asanyarray(aligned_depth_frame.get_data())
            color_image = np.asanyarray(color_frame.get_data())

            # Remove background - set pixels further than clipping_distance to grey (I don't actually need this)
            grey_color = 153
            depth_image_3d = np.dstack(
                (depth_image, depth_image, depth_image))  # depth image is 1 channel, color is 3 channels
            bg_removed = np.where((depth_image_3d > clipping_distance) | (depth_image_3d <= 0), grey_color, color_image)

            # Render images:
            #   depth align to color on left
            #   depth on right
            depth_colormap = cv2.applyColorMap(cv2.convertScaleAbs(depth_image, alpha=0.03), cv2.COLORMAP_JET)
            images = np.hstack((bg_removed, depth_colormap))

            cv2.namedWindow('Align Example', cv2.WINDOW_NORMAL)
            cv2.imshow('Align Example', images)

            key = cv2.waitKey(1)

            ####Hand/wrist Tracking

            # Flip the image horizontally for a later selfie-view display, and convert
            # the BGR image to RGB.
            color_image = cv2.cvtColor(cv2.flip(color_image, 1), cv2.COLOR_BGR2RGB)
            # To improve performance, optionally mark the image as not writeable to
            # pass by reference.
            color_image.flags.writeable = False
            results = hands.process(color_image)

            color_image.flags.writeable = True
            color_image = cv2.cvtColor(color_image, cv2.COLOR_RGB2BGR)

            if results.multi_hand_landmarks:
                for hand_landmarks in results.multi_hand_landmarks:
                    mp_drawing.draw_landmarks(
                        color_image, hand_landmarks, mp_hands.HAND_CONNECTIONS)

                    # MediaPipe normalizes the x and y coordinates to be between (0,1). This is why we multiply each
                    # coodrinate by the resolution in each axis.

                    WRIST_X = (hand_landmarks.landmark[mp_hands.HandLandmark.WRIST].x) * 1280
                    WRIST_Y = (hand_landmarks.landmark[mp_hands.HandLandmark.WRIST].y) * 720

                    ## This is to prevent the program from crashing
                    if (WRIST_X >= 1280):
                        WRIST_X = 1279

                    if (WRIST_X < 0):
                        WRIST_X = 0

                    if (WRIST_Y < 0):
                        WRIST_Y = 0

                    if (WRIST_Y >= 720):
                        WRIST_Y = 719

            WRIST_X = int(WRIST_X)
            WRIST_Y = int(WRIST_Y)
            depth_wrist = aligned_depth_frame.get_distance(WRIST_X, WRIST_Y)
            print("Wrist X: ", WRIST_X, "Wrist Y:", WRIST_Y, "Wrist Z:", depth_wrist)

            cv2.circle(color_image, (WRIST_X, WRIST_Y), 4, (255, 0, 0), -1)  ##This is the blue dot
            cv2.imshow('MEDIAPIPE', color_image) ## this shows the image with the hand tracking

            # Press esc or 'q' to close the image window
            if key & 0xFF == ord('q') or key == 27:
                cv2.destroyAllWindows()
                break
    finally:
        pipeline.stop()
MartyG-RealSense commented 3 years ago

Hi @Swazir9449 A deviation in measured depth at the sides of an image compared to correct values at the center is a known possibility when using get_distance with an aligned image. The links below provide past reported examples of this effect.

https://github.com/IntelRealSense/librealsense/issues/7395
https://github.com/IntelRealSense/librealsense/issues/7925#issuecomment-751318128
https://github.com/IntelRealSense/librealsense/issues/6749#issuecomment-653548150

When performing alignment and also using post-processing filters, it is recommended that alignment is carried out after application of post-processing, which was the solution in one of the cases linked to above.

https://github.com/IntelRealSense/librealsense/issues/7395#issuecomment-698016942
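As a rough sketch of that ordering in Python (not taken from the script above, and assuming that the pyrealsense2 processing blocks accept a whole frameset and that frame.as_frameset() is available, as apply_filter is in the C++ API):

# Hypothetical sketch: run post-processing on the frameset first, then align.
spatial = rs.spatial_filter()
temporal = rs.temporal_filter()
align = rs.align(rs.stream.color)

frames = pipeline.wait_for_frames()
frames = spatial.process(frames).as_frameset()    # filter the depth frame before alignment
frames = temporal.process(frames).as_frameset()
aligned_frames = align.process(frames)
aligned_depth_frame = aligned_frames.get_depth_frame()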

Whilst you do not seem to be using the RealSense SDK's built-in post-processing filters, I note that the background is removed and the image is flipped horizontally after alignment has been initiated. Could you try moving alignment to a point in the script after the flip, and also test commenting out the image flip line, to confirm whether or not flipping the image has a negative effect on the depth measurements?

It should also be noted that for the D455, the optimal depth accuracy resolution is 848x480 rather than 1280x720, though you certainly can use 1280x720 if you wish to.
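For reference, one way to keep the mirrored selfie-view while still sampling depth correctly (just a sketch against the variable names in the script above, not a tested change) is to run MediaPipe on the flipped copy but mirror the x coordinate back before querying the unflipped aligned depth frame:

# Hypothetical sketch: detect landmarks on a mirrored copy used only for display.
flipped = cv2.flip(color_image, 1)
results = hands.process(cv2.cvtColor(flipped, cv2.COLOR_BGR2RGB))

if results.multi_hand_landmarks:
    wrist = results.multi_hand_landmarks[0].landmark[mp_hands.HandLandmark.WRIST]
    x_flipped = int(wrist.x * 1280)
    y = int(wrist.y * 720)
    # The aligned depth frame was never flipped, so mirror x back before the lookup.
    x = min(max(1279 - x_flipped, 0), 1279)
    y = min(max(y, 0), 719)
    depth_wrist = aligned_depth_frame.get_distance(x, y)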

Swazir9449 commented 3 years ago

Hi @MartyG-RealSense,

I was able to fix the problem by removing the horizontal image flip. The flip was only there so that the left and right hand identification would make sense; I can re-apply it later on. Thank you so much for helping me fix this code.

I have a question: does the use of post-processing filters actually improve the accuracy of the depth function? I read on another GitHub support request that post-processing filters are only for human viewing pleasure and that the depth data doesn't actually change.

I am okay with using 840x840. I tried to input that into my code, but it just gives me this error. (In this separate piece of code, different from the one above, I am just trying to get two camera streams working side by side, which they do when the resolution is set to 1280x720.)

Note: even when I try 840x840 in the code above, I still get the same error.

The code (where I am trying to get just two cameras working side by side, which works when the resolution is 1280x720):

import pyrealsense2 as rs
import numpy as np
import cv2

# Configure depth and color streams...

# ...from Camera 1
pipeline_1 = rs.pipeline()
config_1 = rs.config()
config_1.enable_device('105322251827')
config_1.enable_stream(rs.stream.depth, 840, 840, rs.format.z16, 30)
config_1.enable_stream(rs.stream.color, 840, 840, rs.format.bgr8, 30)

# ...from Camera 2
pipeline_2 = rs.pipeline()
config_2 = rs.config()
config_2.enable_device('105322250790')
config_2.enable_stream(rs.stream.depth, 840, 840, rs.format.z16, 30)
config_2.enable_stream(rs.stream.color, 840, 840, rs.format.bgr8, 30)

# Start streaming from both cameras
pipeline_1.start(config_1)
pipeline_2.start(config_2)

# Alignment
align_to = rs.stream.color
align = rs.align(align_to)

try:
    while True:

        # Camera 1
        # Wait for a coherent pair of frames: depth and color
        frames_1 = pipeline_1.wait_for_frames()
        aligned_frames_1 = align.process(frames_1)
        aligned_depth_frame_1 = aligned_frames_1.get_depth_frame()
        #depth_frame_1 = frames_1.get_depth_frame()
        color_frame_1 = aligned_frames_1.get_color_frame()
        hands = aligned_frames_1.get_color_frame()
        if not aligned_depth_frame_1 or not color_frame_1:
            continue
        # Convert images to numpy arrays
        aligned_depth_image_1 = np.asanyarray(aligned_depth_frame_1.get_data())
        color_image_1 = np.asanyarray(color_frame_1.get_data())
        #hands_image_1 = np.asanyarray(color_frame_1.get_data())
        # Apply colormap on depth image (image must be converted to 8-bit per pixel first)
        depth_colormap_1 = cv2.applyColorMap(cv2.convertScaleAbs(aligned_depth_image_1, alpha=0.5), cv2.COLORMAP_JET)

        # Camera 2
        # Wait for a coherent pair of frames: depth and color
        frames_2 = pipeline_2.wait_for_frames()
        aligned_frames_2 = align.process(frames_2)
        aligned_depth_frame_2 = aligned_frames_2.get_depth_frame()
        # depth_frame_2 = frames_2.get_depth_frame()
        color_frame_2 = aligned_frames_2.get_color_frame()
        hands1 = aligned_frames_2.get_color_frame()
        if not aligned_depth_frame_2 or not color_frame_2:
            continue
        # Convert images to numpy arrays
        aligned_depth_image_2 = np.asanyarray(aligned_depth_frame_2.get_data())
        color_image_2 = np.asanyarray(color_frame_2.get_data())
        #hands_image_2 = np.asanyarray(color_frame_2.get_data())
        # Apply colormap on depth image (image must be converted to 8-bit per pixel first)
        depth_colormap_2 = cv2.applyColorMap(cv2.convertScaleAbs(aligned_depth_image_2, alpha=0.5), cv2.COLORMAP_JET)

        # Stack all images horizontally
        images = np.hstack((color_image_1, color_image_2))

        # Show images from both cameras
        cv2.imshow('RealSense', images)
        #cv2.imshow('left', color_image_1)
        #cv2.imshow('right', color_image_2)

        cv2.waitKey(1)

        # Save images and depth maps from both cameras by pressing 's'
        ch = cv2.waitKey(25)
        if ch == 115:
            cv2.imwrite("my_image_1.jpg", color_image_1)
            cv2.imwrite("my_depth_1.jpg", depth_colormap_1)
            cv2.imwrite("my_image_2.jpg", color_image_2)
            cv2.imwrite("my_depth_2.jpg", depth_colormap_2)
            print("Save")

        #distance = rs.depth_frame.get_distance(aligned_depth_image_1, 650, 340)
        #print(distance)

finally:
    # Stop streaming
    pipeline_1.stop()
    pipeline_2.stop()

[Capture: screenshot of the error]

Swazir9449 commented 3 years ago

How can I use 840x840?

MartyG-RealSense commented 3 years ago

It is great to hear that you were successful in correcting your depth problem!

840x840 is not a resolution that is supported by RealSense. The closest supported resolution is 848x480
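If it helps, switching to that resolution is just a change to the enable_stream calls, something like the lines below (assuming the RGB sensor also offers 848x480; if it does not, 640x480 is a common fallback for the color stream):

# Hypothetical sketch: request the supported 848x480 mode instead of 840x840.
config_1.enable_stream(rs.stream.depth, 848, 480, rs.format.z16, 30)
config_1.enable_stream(rs.stream.color, 848, 480, rs.format.bgr8, 30)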

In regard to post-processing filters: they can play an important role in image enhancement, such as reducing noise, stabilizing fluctuations, removing unwanted depth detail using minimum and maximum depth distances, and filling in holes and gaps. So they do more than just make the image pleasing to look at.
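As a minimal sketch of what using them could look like in Python (filter names are from the pyrealsense2 bindings with default settings; depth_frame, x and y stand in for the frame and pixel from your own loop):

# Hypothetical sketch: create the filters once, outside the frame loop.
spatial = rs.spatial_filter()            # edge-preserving spatial smoothing
temporal = rs.temporal_filter()          # smooths depth values over time
hole_filling = rs.hole_filling_filter()  # fills small holes in the depth image

# Inside the loop, run the depth frame through the chain before using it.
filtered = spatial.process(depth_frame)
filtered = temporal.process(filtered)
filtered = hole_filling.process(filtered)
filtered_depth = filtered.as_depth_frame()   # cast back so get_distance() is available
distance = filtered_depth.get_distance(x, y)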

It is fair to say though that there are other factors, such as lighting, environment and distance of an observed object from the camera, that will be a primary influence on the accuracy of depth measurements.

MartyG-RealSense commented 3 years ago

Hi @Swazir9449 Do you require further assistance with this case, please? Thanks!

MartyG-RealSense commented 3 years ago

Case closed due to no further comments received.