WyattAutomation / Train-YOLOv3-with-OpenImagesV4

End-to-end tutorial on data prep and training PJReddie's YOLOv3 to detect custom objects, using the Google Open Images V4 dataset. Includes instructions on downloading specific classes from OIv4, as well as working code examples in Python for preparing the data.

Node to convert YOLO pixel matrix to 3D pose stamped message #11

Closed: marcuskohtongjie95 closed this issue 4 years ago

marcuskohtongjie95 commented 4 years ago

Hi there, I was reading your Reddit post and am interested in this part:

I created a ROS node that utilizes OpenCV to synchronize the depth frame with the RGB camera frame from the Kinect 360. In the callback function, it then takes the centermost RGB pixel matrix coordinates of the bounding box of a detected object, retrieves the depth data from the synchronized depth image (uv,xyz), and converts that to a pose-stamped message that is sent to a modified "head_tracker.py" module from the original rbx2 code (sourced below).

I am working on a project involving human-aware navigation, which requires me to detect humans and get their locations. I have a realsense r200 camera, which provides depth and RGB info. I have set up YOLO darknet, but I have no idea how to convert a pixel coordinate into a 3D coordinate (PoseStamped msg). I was wondering if you could share the code which outputs the pose-stamped msg from YOLO. Thanks!

WyattAutomation commented 4 years ago

That was quite tricky for me as well. I'm working on getting the full code reorganized and posted to a new repo for the "headshot" robot, but let me ask: are you using C++ or Python?

Also,

Have you isolated the pointcloud data you need from the data structures published by the realsense camera's rostopics? You will need to subscribe to that, and to the darknet_ros (the leggedrobotics ROS package for YOLOv3) bounding_boxes topic, in your callback function (using Python/rospy). I will post the code on my lunch break today for getting the "u,v" coordinates (which, grossly simplified, correspond to the 2D x,y RGB coords) and for extracting the depth information at u,v. I will also show you how to pack that into a pose stamped message.

I'd do it now, but I'm "sneaking in" my reply here on a 14-hour shift ;)

I promise I will follow up though, so if you don't hear back from me in a couple of hours, be patient; I will deliver.

WyattAutomation commented 4 years ago

In the meantime, if there is any calibration needed to get the depth_registered points for the realsense topics, go ahead and get that out of the way first, as it's 100% required for this to work. The pointcloud data may already be hardware-registered to the RGB image on that camera, in which case you're already good to go; if it's not, go ahead and do the calibration.
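
A quick way to sanity-check the registration is to print the frame and dimensions of the cloud. Just a sketch; adjust the topic name to whatever your realsense launch files actually publish:

#!/usr/bin/env python
# Quick registration sanity check (sketch only; the topic name depends
# on your realsense launch files).
import rospy
from sensor_msgs.msg import PointCloud2

rospy.init_node("registration_check")
cloud = rospy.wait_for_message("camera/depth_registered/points", PointCloud2)
# A cloud registered to the RGB image reports the RGB optical frame as
# its frame_id and is organized (height > 1), so (u, v) indexing works.
print(cloud.header.frame_id, cloud.width, cloud.height)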

marcuskohtongjie95 commented 4 years ago

I found this code online and was wondering if it does the job: https://github.com/PeteHeine/image_boundingbox_to_3d

The description for use is in this paper: https://ebooks.au.dk/aul/catalog/download/277/192/831-2?inline=1

WyattAutomation commented 4 years ago

Here is what I've done to accomplish what you are asking for, and what's used in the video you saw. I am working on a GitHub repo that will act as a full tutorial, with code, on how to use YOLOv3 in SLAM robotics (2D-to-3D object recognition), but since my time is very limited at the moment, I felt I'd at least share the rough, working code that I do have for a ROS node that takes the bounding boxes from YOLOv3 and uses them to generate pose stamped messages for a pan/tilt turret:

#!/usr/bin/env python

from pprint import pprint
import rospy
from roslib import message
import cv2
from cv_bridge import CvBridge, CvBridgeError

from image_geometry import PinholeCameraModel
from sensor_msgs import point_cloud2
from sensor_msgs.msg import PointCloud2, Image, CameraInfo
from geometry_msgs.msg import Point, PoseStamped, PointStamped
import numpy as np
from darknet_ros_msgs.msg import BoundingBoxes, BoundingBox

import threading
import message_filters
import tf
import image_geometry
import struct

class NearestCloud():

    def __init__(self):
        rospy.init_node("nearest_cloud")
        self.X = 0.0
        self.Y = 0.0
        self.Z = 0.0

        self.bridge = CvBridge()

        # Publisher for the pose that drives the pan/tilt head tracker
        self.target_pub = rospy.Publisher('target_pose', PoseStamped, queue_size=5)

        # Subscribe to the registered pointcloud and the darknet_ros
        # detections, and approximately synchronize the two streams
        self.point_cloud_sub = message_filters.Subscriber('camera/depth_registered/points', PointCloud2)
        self.yolo_sub = message_filters.Subscriber("darknet_ros/bounding_boxes", BoundingBoxes)

        self.ts = message_filters.ApproximateTimeSynchronizer([self.point_cloud_sub, self.yolo_sub], 10, 0.3)
        self.ts.registerCallback(self.callback)

        # Block until both topics are actually being published
        rospy.wait_for_message("darknet_ros/bounding_boxes", BoundingBoxes)

        rospy.wait_for_message('camera/depth_registered/points', PointCloud2)

    def callback(self, pointcloud_data, yolo_data):
        # Use the first detected bounding box in the synchronized message
        xmin = yolo_data.bounding_boxes[0].xmin
        xmax = yolo_data.bounding_boxes[0].xmax
        ymin = yolo_data.bounding_boxes[0].ymin
        ymax = yolo_data.bounding_boxes[0].ymax

        # Integer pixel coordinates (u, v) of the center of the box
        u = (xmax + xmin) // 2
        v = (ymax + ymin) // 2

        print("yolo box center u,v")
        print(u, v)

        # Byte offset of pixel (u, v) in the organized, row-major cloud buffer
        point_step = pointcloud_data.point_step
        row_step = pointcloud_data.row_step
        array_pos = v*row_step + u*point_step

        # Raw bytes of the float32 x, y, z fields at that offset
        # (assumes Python 2, where the PointCloud2 data buffer is a str)
        bytesX = [ord(x) for x in pointcloud_data.data[array_pos:array_pos+4]]
        bytesY = [ord(x) for x in pointcloud_data.data[array_pos+4:array_pos+8]]
        bytesZ = [ord(x) for x in pointcloud_data.data[array_pos+8:array_pos+12]]
        print("xyz bytes before unpacking")
        print(bytesX, bytesY, bytesZ)

        # Repack each 4-byte group and unpack it as a float32
        byte_format = struct.pack('4B', *bytesX)
        self.X = struct.unpack('f', byte_format)[0]
        byte_format = struct.pack('4B', *bytesY)
        self.Y = struct.unpack('f', byte_format)[0]
        byte_format = struct.pack('4B', *bytesZ)
        self.Z = struct.unpack('f', byte_format)[0]

        print("XYZ after byte format")
        print(self.X, self.Y, self.Z)
        if np.isnan(self.X):
            return
        if np.isnan(self.Y):
            return
        if np.isnan(self.Z):
            return
        print("get the XYZ that goes to head")

        print(self.X, self.Y, self.Z)

        # Pack the 3D point into a PoseStamped with identity orientation
        cog_point = Point()
        cog_point.x = self.X
        cog_point.y = self.Y
        cog_point.z = self.Z

        target = PoseStamped()
        target.header.stamp = rospy.Time.now()
        target.header.frame_id = pointcloud_data.header.frame_id
        target.pose.position = cog_point
        target.pose.orientation.w = 1.0

        # Publish the PoseStamped message on the /target_pose topic
        self.target_pub.publish(target)

if __name__ == '__main__':
    try:
        NearestCloud()
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("Nearest cloud node terminated.")
WyattAutomation commented 4 years ago

^that code has a whole bunch of imports that aren't needed, makes use of a lot of other rospackages and code, and wasn't cleaned up much after it was confirmed as finally working. It's ugly, but for the specific purpose of taking the x,y center of a 2D bounding box from YOLO and grabbing the 3D depth value from the pointcloud at the middle of that box to create pose stamped messages, this is it.
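
For what it's worth, the manual byte unpacking can in principle be swapped for sensor_msgs.point_cloud2.read_points, which accepts a uvs argument for pixel-indexed reads. Something like this (a sketch I haven't run against this exact setup):

import math
from sensor_msgs import point_cloud2

def xyz_at(cloud, u, v):
    # Read the single point at pixel (u, v) of an organized cloud;
    # returns None when the depth there is invalid (NaN).
    gen = point_cloud2.read_points(cloud, field_names=("x", "y", "z"),
                                   skip_nans=False, uvs=[(u, v)])
    x, y, z = next(gen)
    if any(math.isnan(c) for c in (x, y, z)):
        return None
    return (x, y, z)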

This is implemented in a heavily modified version of the "head only" dynamixel example in Pirobot's "ROS By Example, Volume 2", which you can find here:
https://github.com/pirobot/rbx2
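
On the receiving end, the modified head tracker just subscribes to that pose topic. A minimal stand-in for that wiring (hypothetical node, not Pirobot's actual code) looks like:

#!/usr/bin/env python
# Minimal consumer sketch (hypothetical; the real rbx2 head tracker does
# much more): listen on target_pose and log each new target.
import rospy
from geometry_msgs.msg import PoseStamped

def on_target(msg):
    p = msg.pose.position
    rospy.loginfo("target at (%.2f, %.2f, %.2f) in %s",
                  p.x, p.y, p.z, msg.header.frame_id)
    # A real tracker would convert this pose into pan/tilt joint commands

rospy.init_node("target_pose_listener")
rospy.Subscriber("target_pose", PoseStamped, on_target)
rospy.spin()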