That was quite tricky for me as well. I'm working on getting the full code reorganized and posted to a new repo for the "headshot" robot, but let me ask: are you using C++ or Python?
Also,
have you isolated the pointcloud data you need from the data structures published by the realsense camera rostopic? You will need to subscribe to that topic, and to the darknet_ros (the leggedrobotics ROS package for YOLOv3) bounding_boxes topic, in your callback function (using Python/rospy). I will post the code on my lunch break today for getting the "u,v" coordinates (which, grossly simplified, correspond to the 2D x,y RGB coordinates) and for extracting depth information at u,v. I will also show you how to pack that into a PoseStamped message.
I'd do it now, but I'm "sneaking in" my reply here on a 14-hour shift ;)
I promise I will follow up though, so if you don't hear back from me in a couple of hours, be patient, and I will deliver.
In the meantime, if there is any calibration needed to get the depth_registered points for the realsense topics, go ahead and get that out of the way first, as it's 100% required for this to work. The pointcloud data may already be hardware-registered to the RGB image for that camera, in which case you're good to go; if it's not, go ahead and do the calibration.
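If you want a quick sanity check that the cloud is actually registered before wiring everything up, something like this (just a sketch, assuming the same camera/depth_registered/points topic name used in the node below) will tell you whether the cloud is organized and which optical frame it lives in:

#!/usr/bin/env python
# Sanity-check sketch: a registered, organized cloud has width/height matching
# the RGB image and is published in the color optical frame, which is what makes
# indexing it by the (u, v) pixel of a bounding box valid.
import rospy
from sensor_msgs.msg import PointCloud2

rospy.init_node("check_registration")
cloud = rospy.wait_for_message('camera/depth_registered/points', PointCloud2)
print("frame_id: %s" % cloud.header.frame_id)
print("width=%d height=%d organized=%s" % (cloud.width, cloud.height, cloud.height > 1))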
I found this code online and was wondering if it does the job? https://github.com/PeteHeine/image_boundingbox_to_3d
The description for use is in this paper: https://ebooks.au.dk/aul/catalog/download/277/192/831-2?inline=1
Here is what I've done to accomplish what you're asking for, and what's used in the video you saw. I am working on a GitHub repo that will act as a full tutorial, with code, on how to use YOLOv3 in SLAM robotics (2D to 3D object recognition), but since my time is very limited at the moment, I felt I'd at least share the rough, working code that I do have for a ROS node that takes the bounding boxes from YOLOv3 and uses them to generate PoseStamped messages for a pan/tilt turret:
#!/usr/bin/env python
from pprint import pprint
import rospy
from roslib import message
import cv2
from cv_bridge import CvBridge, CvBridgeError
from image_geometry import PinholeCameraModel
from sensor_msgs import point_cloud2
from sensor_msgs.msg import PointCloud2, Image, CameraInfo
from geometry_msgs.msg import Point, PoseStamped, PointStamped
import numpy as np
from darknet_ros_msgs.msg import BoundingBoxes
from darknet_ros_msgs.msg import BoundingBox
import darknet_ros_msgs.msg
import darknet_ros_msgs
import threading
import message_filters
import tf
import image_geometry
import struct


class NearestCloud():
    def __init__(self):
        rospy.init_node("nearest_cloud")

        self.X = 0.0
        self.Y = 0.0
        self.Z = 0.0
        self.bridge = CvBridge()

        # Publisher for the 3D pose built from the YOLO detection
        self.target_pub = rospy.Publisher('target_pose', PoseStamped, queue_size=5)

        # Subscribe to the registered pointcloud and the YOLO bounding boxes,
        # and synchronize them so each callback receives a matching pair
        self.point_cloud_sub = message_filters.Subscriber('camera/depth_registered/points', PointCloud2)
        self.yolo_sub = message_filters.Subscriber("darknet_ros/bounding_boxes", BoundingBoxes)
        self.ts = message_filters.ApproximateTimeSynchronizer([self.point_cloud_sub, self.yolo_sub], 10, 0.3)
        self.ts.registerCallback(self.callback)

        # Block until both topics are actually publishing
        rospy.wait_for_message("darknet_ros/bounding_boxes", BoundingBoxes)
        rospy.wait_for_message('camera/depth_registered/points', PointCloud2)

    def callback(self, pointcloud_data, yolo_data):
        points = list()

        # Center pixel (u, v) of the first bounding box
        xmin = yolo_data.bounding_boxes[0].xmin
        xmax = yolo_data.bounding_boxes[0].xmax
        ymin = yolo_data.bounding_boxes[0].ymin
        ymax = yolo_data.bounding_boxes[0].ymax
        u = ((xmax + xmin) / 2)
        v = ((ymax + ymin) / 2)
        print("yolo x,y")
        print(u, v)

        # Index into the organized pointcloud at (u, v) and slice out the raw
        # bytes of the x, y, z float32 fields.
        # Note: the ord() handling below assumes Python 2, where .data is a str.
        width = pointcloud_data.width
        height = pointcloud_data.height
        point_step = pointcloud_data.point_step
        row_step = pointcloud_data.row_step
        array_pos = v * row_step + u * point_step
        bytesX = [ord(x) for x in pointcloud_data.data[array_pos:array_pos + 4]]
        bytesY = [ord(x) for x in pointcloud_data.data[array_pos + 4:array_pos + 8]]
        bytesZ = [ord(x) for x in pointcloud_data.data[array_pos + 8:array_pos + 12]]
        print("xyz before byte processing")
        print(bytesX, bytesY, bytesZ)

        # Repack the bytes and unpack them as float32 values
        byte_format = struct.pack('4B', *bytesX)
        self.X = struct.unpack('f', byte_format)[0]
        byte_format = struct.pack('4B', *bytesY)
        self.Y = struct.unpack('f', byte_format)[0]
        byte_format = struct.pack('4B', *bytesZ)
        self.Z = struct.unpack('f', byte_format)[0]
        print("XYZ after byte format")
        print(self.X, self.Y, self.Z)

        # Skip this frame if the depth at the box center is invalid
        if np.isnan(self.X):
            return
        if np.isnan(self.Y):
            return
        if np.isnan(self.Z):
            return

        print("XYZ that goes to the head")
        print(self.X, self.Y, self.Z)

        cog_point = Point()
        cog_point.x = self.X
        cog_point.y = self.Y
        cog_point.z = self.Z

        target = PoseStamped()
        target.header.stamp = rospy.Time.now()
        target.header.frame_id = pointcloud_data.header.frame_id
        target.pose.position = cog_point
        target.pose.orientation.w = 1.0

        # Publish the PoseStamped message on the /target_pose topic
        self.target_pub.publish(target)


if __name__ == '__main__':
    try:
        NearestCloud()
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("Nearest cloud node terminated.")
^That code has a whole bunch of imports that aren't needed, makes use of a lot of other ROS packages and code, and wasn't cleaned up at all after it was confirmed as working. It's ugly, but for the specific purpose of taking the x,y center of a 2D bounding box from YOLO and grabbing the 3D depth value from the pointcloud at the middle of that box to create PoseStamped messages, this is it.
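For what it's worth, the manual byte slicing in the callback can also be done with sensor_msgs.point_cloud2.read_points, which takes a uvs argument so it only deserializes the one pixel you care about. Just a sketch of an equivalent helper (same assumptions about the organized, registered cloud as above), in case you prefer that to the struct unpacking:

import math
from sensor_msgs import point_cloud2

def xyz_at_pixel(cloud, u, v):
    # Read only the single (u, v) point out of the organized cloud.
    # Returns (x, y, z) in the cloud's frame, or None if the depth there is invalid.
    gen = point_cloud2.read_points(cloud, field_names=("x", "y", "z"),
                                   skip_nans=False, uvs=[(u, v)])
    x, y, z = next(gen)
    if math.isnan(x) or math.isnan(y) or math.isnan(z):
        return None
    return (x, y, z)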
The node above is implemented in a heavily modified version of the "head only" dynamixel example from Pirobot's "ROS By Example, Vol. 2", which you can find here:
https://github.com/pirobot/rbx2
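And just so you can see how the PoseStamped gets consumed on the other end, the basic idea is to transform the target into the pan servo's frame and turn it into two angles. This is only a rough sketch of the geometry, not the actual rbx2-based head node, and "head_pan_link" is a placeholder frame name:

#!/usr/bin/env python
import math
import rospy
import tf
from geometry_msgs.msg import PoseStamped

rospy.init_node("head_tracker_sketch")
listener = tf.TransformListener()

def target_cb(msg):
    # Bring the target into the pan servo's frame, then derive pan/tilt angles
    listener.waitForTransform("head_pan_link", msg.header.frame_id,
                              msg.header.stamp, rospy.Duration(0.5))
    p = listener.transformPose("head_pan_link", msg).pose.position
    pan = math.atan2(p.y, p.x)                    # rotate left/right toward the target
    tilt = math.atan2(p.z, math.hypot(p.x, p.y))  # tilt up/down toward the target
    # ...command the dynamixel pan and tilt joints with these angles

rospy.Subscriber("target_pose", PoseStamped, target_cb)
rospy.spin()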
Hi there, I was reading your Reddit post and am interested in this part.
I am working on a project involving human-aware navigation, which requires me to detect humans and get their locations. I have a realsense R200 camera, which provides RGB and depth data. I have set up YOLO darknet, but I have no idea how to turn a pixel coordinate into a 3D coordinate (PoseStamped msg). I was wondering if you could share the code that outputs the PoseStamped msg from YOLO. Thanks!