IntelligentRoboticsLabs / gb_visual_detection_3d


No 3D output #38

Closed dnlwbr closed 3 years ago

dnlwbr commented 3 years ago

Hello, interesting repository. Unfortunately, I do not get any output. The 2D boxes are displayed, but nothing is published on /darknet_ros_3d/bounding_boxes. The classes listed in interested_classes are also present in the image. Does the point cloud have to use a special coordinate system? This was hinted at in issue #10, if I understood it correctly. I use an Azure Kinect, i.e. the positive x-axis points to the right, the positive y-axis points down and the positive z-axis points forward. Does the camera have to be aligned so that the z-axis is parallel to the ground, or are oblique bounding boxes also possible? How exactly is the 3D box calculated from the 2D box and the depth data?
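For reference, I check for output with a minimal listener like the following sketch (I am assuming the BoundingBoxes3d message from gb_visual_detection_3d_msgs and its field names here; adjust if they differ):

```python
#!/usr/bin/env python
# Minimal check for output on /darknet_ros_3d/bounding_boxes.
# Assumes the BoundingBoxes3d message from gb_visual_detection_3d_msgs
# and its fields (Class, probability, xmin..zmax).
import rospy
from gb_visual_detection_3d_msgs.msg import BoundingBoxes3d

def callback(msg):
    for box in msg.bounding_boxes:
        rospy.loginfo("%s (p=%.2f): x[%.2f, %.2f] y[%.2f, %.2f] z[%.2f, %.2f]",
                      box.Class, box.probability,
                      box.xmin, box.xmax, box.ymin, box.ymax,
                      box.zmin, box.zmax)

if __name__ == '__main__':
    rospy.init_node('bbx3d_listener')
    rospy.Subscriber('/darknet_ros_3d/bounding_boxes', BoundingBoxes3d, callback)
    rospy.spin()
```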

Thanks in advance.

fmrico commented 3 years ago

Hi @dnlwbr

Maybe @fgonzalezr1998 could help you here.

We are also moving our efforts to yolact_ros_3d. I can show you some experiments with it this week: (screenshot: yolact detection demo, 2020-11-13)

Best!!

fgonzalezr1998 commented 3 years ago

Hi @dnlwbr The working frame that you specify in the config file has to follow these coordinate conventions: the x-axis points to the front, the y-axis points to the left, and the z-axis points up. So the frame you are using is not correct. Also, take care to use a point cloud that is depth_registered.
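If your camera driver only publishes optical frames (z forward, x right, y down, as in your Azure Kinect), one option is to add a correctly-oriented frame yourself with a static transform and set it as the working frame in the config file. A sketch, with placeholder frame names:

```python
#!/usr/bin/env python
# Publish a working frame with x forward / y left / z up, hanging off the
# camera's optical frame (z forward, x right, y down). Frame names are
# placeholders; use the ones your Azure Kinect driver publishes.
import math
import rospy
import tf2_ros
import tf.transformations as tft
from geometry_msgs.msg import TransformStamped

if __name__ == '__main__':
    rospy.init_node('working_frame_broadcaster')

    t = TransformStamped()
    t.header.stamp = rospy.Time.now()
    t.header.frame_id = 'rgb_camera_link'      # optical frame (parent)
    t.child_frame_id = 'camera_working_frame'  # x fwd / y left / z up (child)
    # body -> optical is the classic yaw = -90 deg, roll = -90 deg rotation;
    # we need its inverse here, because the optical frame is the parent.
    q = tft.quaternion_inverse(
        tft.quaternion_from_euler(-math.pi / 2, 0.0, -math.pi / 2))
    (t.transform.rotation.x, t.transform.rotation.y,
     t.transform.rotation.z, t.transform.rotation.w) = q
    # Translation stays at zero: both frames share the camera's origin.

    br = tf2_ros.StaticTransformBroadcaster()
    br.sendTransform(t)
    rospy.spin()
```

Then point the working frame entry of the config file at camera_working_frame.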

As @fmrico said, yolact_ros_3d is an improvement on it. It uses YOLACT instead of YOLO and applies a fast thinning algorithm to compute the 3D bounding boxes optimally.

dnlwbr commented 3 years ago

Thanks for your quick response. I will try the coordinate frame you described.

About yolact_ros_3d: I would like to try it, but unfortunately I don't use ROS 2 yet. Is there a ROS 1 version of it?

Do you have any experience with how accurate the 3D bounding boxes calculated by darknet_ros_3d and yolact_ros_3d are compared to those from a 3D object detection NN?

fgonzalezr1998 commented 3 years ago

@dnlwbr yolact_ros_3d's bboxes are more accurate than darknet_ros_3d's because it uses YOLACT instead of YOLO.

We have not compared it with a 3D object detection NN. However, yolact_ros_3d and darknet_ros_3d provide an advantage: you can get 3D bounding boxes without depending on a particular 3D object detection NN.

Unfortunately, yolact_ros_3d is not available for ROS 1 at the moment. Once this package has been finalized, I will develop a version of it for Melodic/Noetic.

germal commented 3 years ago

Hello @fgonzalezr1998 This project is simply amazing. Any chance you could add object path-tracking capability to it? Thanks a lot

fgonzalezr1998 commented 3 years ago

There is a repo (still in development) that calculates the octomap of each detected object and puts it into a 2D map. In this way you can do semantic mapping and object tracking.

This package is being developed on ROS 2, but when it is finished I will migrate it to ROS 1. If you are interested, you can take a look here @germal

germal commented 3 years ago

Hello @fgonzalezr1998 Thank you for your reply! I see that in darknet_ros the reference frame for the detection is the camera, and the detected objects can be visualized on the map by using the markers as a trick. Is the tracking system of yolact_ros_3d different? Thanks a lot! germal

fgonzalezr1998 commented 3 years ago

Hi @germal yolact_ros_3d publishes 3D bounding boxes and markers, but it also publishes an octomap that can be overlaid on a map (in the map frame). In this way you can do semantic mapping, so you will be able to say something like "come to the table" or "go near the refrigerator". Here you have a simple demo of how the octomaps work.
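Roughly, the semantic part works because the detections live in the map frame, so you can keep a memory of where each labelled object was last seen. This is only a sketch of the idea; the topic name and message type below are placeholders, not the final interface of yolact_ros_3d:

```python
#!/usr/bin/env python
# Sketch of the semantic-mapping idea: remember where each labelled object
# was last seen in the map frame. The topic name and the MarkerArray type
# are placeholders, not the actual interface of yolact_ros_3d.
import rospy
from visualization_msgs.msg import MarkerArray

last_seen = {}  # label -> (x, y, z) in the map frame

def callback(msg):
    for m in msg.markers:
        label = m.ns or str(m.id)
        p = m.pose.position
        last_seen[label] = (p.x, p.y, p.z)
        rospy.loginfo("'%s' last seen at (%.2f, %.2f, %.2f)",
                      label, p.x, p.y, p.z)

if __name__ == '__main__':
    rospy.init_node('semantic_memory')
    rospy.Subscriber('/yolact_ros_3d/markers', MarkerArray, callback)
    rospy.spin()
```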

A great part of this work has been done by @fmrico

However, I insist, this work is still in progress. Thank you for your interest!

dnlwbr commented 3 years ago

Hi @fgonzalezr1998 what exactly do you mean by depth_registered? Do you mean the point cloud has to be organized with the same height/width as the RGB image, and every pixel in the image has a corresponding point in the point cloud with the same index?
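In other words, I would expect a direct pixel-to-point lookup like this to work (a sketch of my understanding; the point cloud topic here is the one from my camera):

```python
#!/usr/bin/env python
# Check of my understanding of "depth_registered": in an organized cloud
# the point for image pixel (u, v) sits at index v * width + u, so a 2D
# box can be looked up directly. Topic name is from my Azure Kinect setup.
import rospy
import sensor_msgs.point_cloud2 as pc2
from sensor_msgs.msg import PointCloud2

def callback(cloud):
    if cloud.height <= 1:
        rospy.logwarn("cloud is unorganized (height == 1)")
        return
    u, v = cloud.width // 2, cloud.height // 2   # e.g. the image center
    x, y, z = next(pc2.read_points(cloud, field_names=("x", "y", "z"),
                                   skip_nans=False, uvs=[(u, v)]))
    rospy.loginfo("pixel (%d, %d) -> point (%.3f, %.3f, %.3f)", u, v, x, y, z)

if __name__ == '__main__':
    rospy.init_node('registered_cloud_check')
    rospy.Subscriber('/points2', PointCloud2, callback)
    rospy.spin()
```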

Also, does the working frame have to be the local frame of the camera, or is it possible to use a global frame like odom or map if a transformation is provided via /tf?

Thank you!