IntelRealSense / librealsense

Intel® RealSense™ SDK
https://www.intelrealsense.com/
Apache License 2.0

Calculating size of object by using color frame and depth frame #9057

Closed. akshayacharya97 closed this issue 3 years ago.

akshayacharya97 commented 3 years ago

I am running darknet to detect a few objects, using the RealSense L515 camera. I would now like to calculate the size (length and breadth) of a detected object with the L515. How do I go about this? I have the bounding boxes around the objects obtained from darknet. Will those help in any way?

This is my code

import darknet
import cv2
import numpy as np
import pyrealsense2 as rs

"""##############. Function definitions. ##################"""

#Define the detection function
def image_detection(image, network, class_names, class_colors, thresh):
    # Darknet doesn't accept numpy images, so create a darknet
    # image buffer and copy each frame into it before detection
    width = darknet.network_width(network)
    height = darknet.network_height(network)
    darknet_image = darknet.make_image(width, height, 3)

    #image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_resized = cv2.resize(image_rgb, (width, height), interpolation=cv2.INTER_LINEAR)

    darknet.copy_image_from_bytes(darknet_image, image_resized.tobytes())
    detections = darknet.detect_image(network, class_names, darknet_image, thresh=thresh)
    darknet.free_image(darknet_image)
    image = darknet.draw_boxes(detections, image_resized, class_colors)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB), detections

# Load the neural network along with its config, data and weights files
quantity_apples = []
config_file = "/home/jetson/Desktop/pano_l515/yolov4.cfg"
data_file = "/home/jetson/Desktop/pano_l515/coco.data"
weights = "/home/jetson/Desktop/pano_l515/yolov4.weights"

network, class_names, class_colors = darknet.load_network(
        config_file,
        data_file,
        weights,
        batch_size=1
    )

## Realsense from align-depth2color.py

# Create a pipeline
pipeline = rs.pipeline()

# Create a config and configure the pipeline to stream
#  different resolutions of color and depth streams
config = rs.config()

# Get device product line for setting a supporting resolution
pipeline_wrapper = rs.pipeline_wrapper(pipeline)
pipeline_profile = config.resolve(pipeline_wrapper)
device = pipeline_profile.get_device()
device_product_line = str(device.get_info(rs.camera_info.product_line))

config.enable_stream(rs.stream.depth, 1024, 768, rs.format.z16, 30)

if device_product_line == 'L500':
    print(device_product_line)
    config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)
else:
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

# Start streaming
profile = pipeline.start(config)

# Getting the depth sensor's depth scale (see rs-align example for explanation)
depth_sensor = profile.get_device().first_depth_sensor()
depth_scale = depth_sensor.get_depth_scale()
print("Depth Scale is: " , depth_scale)

# We will be removing the background of objects more than
#  clipping_distance_in_meters meters away
clipping_distance_in_meters = 1 #1 meter
clipping_distance = clipping_distance_in_meters / depth_scale
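# Note: dividing by depth_scale converts meters into raw 16-bit depth units
# (on the L515 the scale is typically ~0.00025 m/unit, so 1 m is ~4000 units)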

# Create an align object
# rs.align allows us to perform alignment of depth frames to others frames
# The "align_to" is the stream type to which we plan to align depth frames.
align_to = rs.stream.color
align = rs.align(align_to)

# Streaming loop (this example captures just two framesets)
try:
    for i in range(0, 2):
        # Get frameset of color and depth
        frames = pipeline.wait_for_frames()
        # frames.get_depth_frame() is a 1024x768 depth image (as configured above)

        # Align the depth frame to color frame
        aligned_frames = align.process(frames)

        # Get aligned frames
        aligned_depth_frame = aligned_frames.get_depth_frame() # after alignment, depth matches the color resolution
        color_frame = aligned_frames.get_color_frame()

        # Validate that both frames are valid
        if not aligned_depth_frame or not color_frame:
            continue

        depth_image = np.asanyarray(aligned_depth_frame.get_data())
        color_image = np.asanyarray(color_frame.get_data())

        dn_frame_width = 416
        dn_frame_height = 416

        frame_width = color_image.shape[1]
        frame_height = color_image.shape[0]

        #### Passing the image to darknet
        image, detections = image_detection(color_image, network, class_names, class_colors, thresh=0.05)

        # Rescale each detection from darknet's 416x416 input back to the
        # color frame resolution (darknet returns centre x/y plus width/height)
        for label, confidence, bbox in detections:
            xc_percent = bbox[0] / dn_frame_width
            yc_percent = bbox[1] / dn_frame_height
            w_percent = bbox[2] / dn_frame_width
            h_percent = bbox[3] / dn_frame_height
            xc = xc_percent * frame_width
            yc = yc_percent * frame_height
            w = w_percent * frame_width
            h = h_percent * frame_height
            xmin = xc - w / 2.0
            ymin = yc - h / 2.0
            xmax = xc + w / 2.0
            ymax = yc + h / 2.0

            # If an apple is detected, draw its bounding box and label
            if label == "apple":
                cv2.rectangle(color_image, (int(xmin), int(ymin)), (int(xmax), int(ymax)), (0, 0, 255), 2)
                cv2.putText(color_image, "apple", (int(xmin), int(ymin - 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)

        #cv2.imwrite(output_path, frame)            
        # Render images:
        #   depth align to color on left
        #   depth on right
        depth_colormap = cv2.applyColorMap(cv2.convertScaleAbs(depth_image, alpha=0.03), cv2.COLORMAP_JET)
        images = np.hstack((color_image, depth_colormap))
        cv2.imwrite("test_images.jpg", color_image)
        #cv2.namedWindow('Align Example', cv2.WINDOW_NORMAL)
        #cv2.imshow('Align Example', images)
        key = cv2.waitKey(1)
        # Press esc or 'q' to close the image window
        #if key & 0xFF == ord('q') or key == 27:
        #    break
        cv2.destroyAllWindows()
finally:
    pipeline.stop()

This is the output file, test_images.jpg. How do I roughly calculate the 2D dimensions of a detected object from the image?
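One rough approach, sketched below as a non-authoritative example: take the depth at the centre of the aligned bounding box and deproject the two opposite box corners with librealsense's rs2_deproject_pixel_to_point, which maps a pixel plus a depth in meters to a 3D point. The helper name estimate_object_size is made up for illustration, and the estimate assumes the box is tight around the object and the object face is roughly parallel to the image plane.

import pyrealsense2 as rs

def estimate_object_size(aligned_depth_frame, xmin, ymin, xmax, ymax):
    # Intrinsics of the aligned depth stream (these match the color
    # stream after rs.align has been applied)
    intrin = aligned_depth_frame.profile.as_video_stream_profile().intrinsics

    # Use the depth at the box centre as the object's distance in meters
    # (clamping the pixel to the frame bounds is omitted for brevity)
    cx, cy = int((xmin + xmax) / 2), int((ymin + ymax) / 2)
    depth = aligned_depth_frame.get_distance(cx, cy)
    if depth == 0:
        return None  # no valid depth reading at the centre pixel

    # Deproject the opposite box corners at that depth into 3D points (meters)
    top_left = rs.rs2_deproject_pixel_to_point(intrin, [float(xmin), float(ymin)], depth)
    bottom_right = rs.rs2_deproject_pixel_to_point(intrin, [float(xmax), float(ymax)], depth)

    width_m = abs(bottom_right[0] - top_left[0])
    height_m = abs(bottom_right[1] - top_left[1])
    return width_m, height_m

Calling this inside the detection loop above with aligned_depth_frame and the xmin/ymin/xmax/ymax values would give a rough length and breadth in meters; the result is only as good as the bounding box is tight.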

MartyG-RealSense commented 3 years ago

Hi @akshayacharya97 Do you still require assistance with this case, please? Thanks!

MartyG-RealSense commented 3 years ago

Case closed due to no further comments received.

michaelnguyen11 commented 3 years ago

Hi @MartyG-RealSense ,

I'm working on a project similar to this case: using the RealSense L515 camera and YOLO to detect an object and then calculate that object's dimensions (width, height, length).

I studied the "Box Measurement" example, but it needs a chessboard backdrop and a stationary object, so it's not suitable for my case.

Could you guide me on how to measure the dimensions of an object (width, height and length) once that object has been detected and has a bounding box?

Thank you in advance.

MartyG-RealSense commented 3 years ago

Hi @michaelnguyen11 If you are able to make use of a commercial solution, Intel has a non-chessboard box measuring solution for the L515 called Dimensional Weight Software.

https://www.intelrealsense.com/dimensional-weight-software/

michaelnguyen11 commented 3 years ago

Hi @MartyG-RealSense ,

Thank you for your quick response.

I have two Intel RealSense cameras, an L515 and a D435i, and I'm trying to demonstrate object dimension measurement on both. With the L515, I can use the Dimensional Weight Software solution.

However, in the case of the D435i, could you guide me on how to measure an object's dimensions once it has been detected and given a bounding box?

Thank you in advance.

MartyG-RealSense commented 3 years ago

There was a RealSense user who calculated box dimensions from the angles between the box faces after generating a point cloud of it.

https://github.com/IntelRealSense/librealsense/issues/5506

Another approach to obtaining the volume of an object may be to generate a point cloud of it and then convert it to a mesh.

https://stackoverflow.com/questions/55629892/2019-point-cloud-volume-estimation-with-c/55681149#55681149
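A minimal sketch of the point-cloud route with pyrealsense2 is below, assuming the depth frame has been aligned to the color frame so that the detection's pixel coordinates line up with depth pixels; bbox_cloud_extents is a hypothetical helper name, not a library call. Cropping the cloud to the bounding box and taking the axis-aligned extents gives a rough width, height and depth, though it will overestimate for objects rotated relative to the camera axes.

import numpy as np
import pyrealsense2 as rs

def bbox_cloud_extents(depth_frame, xmin, ymin, xmax, ymax):
    # Generate a point cloud from the depth frame (vertices in meters)
    pc = rs.pointcloud()
    points = pc.calculate(depth_frame)
    vtx = np.asanyarray(points.get_vertices()).view(np.float32).reshape(-1, 3)

    # Vertices are row-major, one per depth pixel, so recover pixel coords
    w = depth_frame.get_width()
    idx = np.arange(vtx.shape[0])
    px, py = idx % w, idx // w

    # Keep vertices inside the box that have valid (non-zero) depth
    mask = (px >= xmin) & (px < xmax) & (py >= ymin) & (py < ymax) & (vtx[:, 2] > 0)
    obj = vtx[mask]
    if obj.size == 0:
        return None

    # Axis-aligned extents of the cropped cloud: (dx, dy, dz) in meters
    return obj.max(axis=0) - obj.min(axis=0)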

This link suggests some further measurement possibilities for a D435i owner:

https://stackoverflow.com/questions/57472065/object-detection-using-intel-real-sense

The commercial product LIPSMetric Parcel Kiosk took the approach of using OpenVINO Toolkit for their RealSense-powered box measuring solution.

https://www.lips-hci.com/lipsmetric

Also in regard to commercial products, MobileWorxs offers a tablet-based box dimensioning tool based around the D415.

https://www.mobileworxs.com/products/volume-dimensioning-3d-camera-system/

sampreets3 commented 2 years ago

Hey @akshayacharya97, I am also working on something similar, where I try to detect objects using a RealSense camera. Do you think you could share the apple dataset with me? Thanks a lot :)

MartyG-RealSense commented 2 years ago

Hi @sampreets3 My understanding is that YOLOv4 ships with weights pre-trained on the COCO (Common Objects in Context) dataset, which includes apples among its object classes. The YOLOv4 setup of @akshayacharya97 references this dataset via its coco.data file.

https://github.com/lrf008/yolov4/blob/master/README.md#how-to-evaluate-ap-of-yolov4-on-the-ms-coco-evaluation-server

https://viso.ai/computer-vision/coco-dataset/
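For reference, coco.data itself is not the image data but a small darknet configuration file that points at the class-name list and dataset splits. A typical layout looks roughly like the following; the exact paths vary per setup, and the placeholders below are illustrative only.

classes = 80
train = <path to training image list>
valid = <path to validation image list>
names = data/coco.names
backup = backup/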


sampreets3 commented 2 years ago

Hey @MartyG-RealSense, okay, in that case I can start from a model pretrained on the COCO dataset and expect some results. Thanks for your help :)