Implementing BEVDet in Autoware

cyn-liu commented 7 months ago

Checklist

[x] I've read the contribution guidelines.
[x] I've searched other issues and no duplicate issues were found.
[x] I've agreed with the maintainers that I can plan this task.

Description

BEVDet is a bird's-eye-view (BEV) perception algorithm based on surround-view (panoramic) cameras. It transforms multi-view images into a unified BEV representation for 3D object detection, which differs from Autoware's current 3D perception features. See the BEVDet code repos.

Purpose

Integrate BEVDet into Autoware for 3D object detection based on multi-view images. This task is related to the Sensing & Perception task.

Possible approaches

BEVDet is a 3D object detection model trained on the nuScenes dataset using images from 6 surround-view cameras. Together, the 6 cameras cover a 360-degree field of view, with overlap between adjacent cameras. Mapping from 2D to 3D requires the intrinsic parameters of each camera and the extrinsic parameters between each camera and the ego vehicle. Integrating BEVDet into Autoware therefore involves placing and calibrating the 6 cameras, and converting the BEVDet model into ONNX format for deployment in Autoware.
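To illustrate what the intrinsic and extrinsic parameters are used for, here is a minimal pinhole-camera sketch, not BEVDet's actual view-transformer code. It assumes a 3x3 intrinsic matrix `K` with zero skew, a camera-to-ego rotation `R_cam2ego` and translation `t_cam2ego` in metres, and the usual optical-frame convention (x right, y down, z forward). All function names are hypothetical.

```python
# Minimal sketch (pure Python): map points between the ego frame and one camera.
# Assumptions: pinhole model, zero-skew K, extrinsic given as camera-to-ego (R, t),
# camera optical frame with x right, y down, z forward.

def mat_vec(m, v):
    """Multiply a 3x3 matrix (row-major lists) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def ego_to_pixel(p_ego, K, R_cam2ego, t_cam2ego):
    """Project an ego-frame 3D point to (u, v, depth); None if behind the camera."""
    # Invert the extrinsic: p_cam = R^T (p_ego - t), since R is a rotation.
    diff = [p_ego[i] - t_cam2ego[i] for i in range(3)]
    p_cam = mat_vec(transpose(R_cam2ego), diff)
    depth = p_cam[2]
    if depth <= 0:
        return None
    uvw = mat_vec(K, p_cam)
    return uvw[0] / depth, uvw[1] / depth, depth

def pixel_to_ego(u, v, depth, K, R_cam2ego, t_cam2ego):
    """Lift a pixel with known depth back into the ego frame (the 2D-to-3D step)."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    p_cam = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    rotated = mat_vec(R_cam2ego, p_cam)
    return [rotated[i] + t_cam2ego[i] for i in range(3)]
```

For example, consider a hypothetical front camera mounted 1.5 m ahead of and 1.6 m above the ego origin, looking along ego +x. Its rotation is `[[0, 0, 1], [-1, 0, 0], [0, -1, 0]]`. A point 10 m in front of this camera and 1 m to the right of and below its optical axis then projects to pixel (900, 550) when `K` has focal length 1000 and principal point (800, 450). In BEVDet, the depth is not known in advance; the network predicts a depth distribution per pixel, and the same calibration is used to place those features in BEV space.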

Definition of done

[ ] Placement and calibration of the 6 cameras
[x] Convert BEVDet model into ONNX format
[x] Deploying BEVDet model on device using TensorRT
[x] BEVDet output result adaptation to Autoware topics

liuXinGangChina commented 7 months ago

Great, maybe you can make a to-do task list first and see which parts others can take on.

cyn-liu commented 6 months ago

We referred to this project and successfully ran it on our own machine, using an RTX 3080 GPU and TensorRT FP16 inference with the BEVDet-R50-4DLongterm-Depth model. For the mAP and inference speed of the TensorRT version of BEVDet-R50-4DLongterm-Depth, refer to this project link. The following are the results on our machine:

https://github.com/autowarefoundation/autoware/assets/104069308/af71df5b-7776-425e-8720-0d7244847a54

The following is the inference speed on our machine:

https://github.com/autowarefoundation/autoware/assets/104069308/ec14066c-86a3-4a08-accd-b9690cf2d692

Next, we will port the ROS1 node from this project to a ROS2 node and then test it on TIER IV's dataset. We hope this dataset can be provided in ROS2 bag format.

Our plan to integrate the BEVDet ROS2 node into Autoware:

define a bevdet_node in the Autoware perception module
organize the 3D box results into the autoware_perception_msgs::msg::DetectedObjects type
feed the output of bevdet_node into the object_merger node and fuse it with the detection results of other models
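The second step of the plan converts raw boxes into Autoware's object type. The following sketch deliberately avoids the real ROS 2 message classes. `SimpleDetectedObject` is a hypothetical stand-in that mirrors only a few DetectedObject concepts: existence probability, class label, pose, and box dimensions. The box layout `(x, y, z, w, l, h, yaw)`, the `CLASS_NAMES` list, the score threshold, and the `z_is_bottom` flag are all assumptions that must be checked against the actual BEVDet export.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical label order; the real list must come from the BEVDet config.
CLASS_NAMES = ["car", "truck", "bus", "bicycle", "pedestrian"]

@dataclass
class Quaternion:
    x: float
    y: float
    z: float
    w: float

@dataclass
class SimpleDetectedObject:
    """Hypothetical stand-in for a few fields of a DetectedObject message."""
    existence_probability: float
    label: str
    position: tuple        # box centre (x, y, z) in metres
    orientation: Quaternion
    dimensions: tuple      # (length, width, height) in metres

def yaw_to_quaternion(yaw: float) -> Quaternion:
    # Pure rotation about the z axis.
    return Quaternion(0.0, 0.0, math.sin(yaw / 2.0), math.cos(yaw / 2.0))

def boxes_to_objects(boxes: Sequence[Sequence[float]], scores: Sequence[float],
                     labels: Sequence[int], score_threshold: float = 0.3,
                     z_is_bottom: bool = False) -> List[SimpleDetectedObject]:
    """Drop low-score boxes and convert the rest (assumed layout x, y, z, w, l, h, yaw)."""
    objects = []
    for box, score, label in zip(boxes, scores, labels):
        if score < score_threshold:
            continue
        x, y, z, w, l, h, yaw = box[:7]
        # Some exports give the bottom centre; the pose should be the box centre.
        cz = z + h / 2.0 if z_is_bottom else z
        objects.append(SimpleDetectedObject(
            existence_probability=float(score),
            label=CLASS_NAMES[label],
            position=(x, y, cz),
            orientation=yaw_to_quaternion(yaw),
            dimensions=(l, w, h),
        ))
    return objects
```

In a real bevdet_node, the same values would be written into the generated message types, and the header frame would need to match the frame used by downstream nodes.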

cyn-liu commented 6 months ago

Environment:
CUDA 11.3.1
cudnn-linux-x86_64-8.8.1.3_cuda11
TensorRT-8.5.1.7.Linux.x86_64-gnu

liuXinGangChina commented 5 months ago

Maybe try with AWSIM data

liuXinGangChina commented 5 months ago

List the CUDA environment here.

cyn-liu commented 5 months ago

When we ran BEVDet inference on the TIER4 dataset, we found that the model generalizes poorly to it.

Visualization results on TIER4 data:

(1) [images: concat_img, tier4_1_bevdet]

(2) [images: concat_img, tier4_2_bevdet]

Visualization results on nuScenes data:

[images: concat_img, nusbevdet]

liuXinGangChina commented 5 months ago

It looks like the original pre-trained model (trained on the nuScenes dataset) does not generalize to the TIER4 dataset as well as we expected. The obstacles' directions are almost right, but their depth ge

We plan to close this task once the node has been tested, and then create a new task, "retrain the model", to see whether the retrained model performs better on the TIER4 dataset.

cyn-liu commented 4 months ago

Our plan to integrate the BEVDet ROS2 node into Autoware:

define a bevdet_node in the Autoware perception module

organize the 3D box results into the autoware_perception_msgs::msg::DetectedObjects type

feed the output of bevdet_node into the object_merger node and fuse it with the detection results of other models

Because running the multi-camera BEV 3D detection algorithm and the LiDAR-based 3D detection algorithm at the same time puts too heavy a load on the system, we have decided not to merge the BEVDet results with the LiDAR detection results. Instead, we will add a new perception_mode: when perception_mode = camera, bevdet_node is launched.
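The mode-based decision can be sketched as a simple lookup. In Autoware itself this selection would be expressed in the perception launch files, not in a Python table. Here `DETECTORS_BY_MODE` and the `lidar_detector` name are hypothetical, and only `bevdet_node` and `perception_mode = camera` come from this thread.

```python
from typing import Dict, List

# Hypothetical mode-to-detector table: "camera" runs BEVDet alone, so its results
# are never merged with a LiDAR detector and only one heavy model runs at a time.
DETECTORS_BY_MODE: Dict[str, List[str]] = {
    "lidar": ["lidar_detector"],   # hypothetical placeholder name
    "camera": ["bevdet_node"],
}

def detectors_for_mode(perception_mode: str) -> List[str]:
    """Return the detection nodes to launch for a given perception_mode."""
    try:
        return list(DETECTORS_BY_MODE[perception_mode])
    except KeyError:
        raise ValueError(f"unsupported perception_mode: {perception_mode!r}") from None
```

Keeping the modes mutually exclusive means the GPU never has to serve the camera and LiDAR models simultaneously, which is the load concern raised above.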

cyn-liu commented 4 months ago

@xmfcx The PR related to this issue has been successfully tested in the newer Autoware Docker image. Environment information for this image:

CUDA==12.3
libnvinfer==8.6.1.6

Note: Outside Docker (on the host), I had to upgrade my NVIDIA GPU driver so that the maximum CUDA version it supports is >= 12.3.
[image: nvidia-driver-version]
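A quick way to check this requirement on the host is to read the "CUDA Version" field that `nvidia-smi` prints in its header. That field is the highest CUDA version the installed driver supports. The sketch below parses that text. The helper names and the default `(12, 3)` threshold are assumptions taken from the note above, and the parser will return `None` if the output format differs.

```python
import re
import subprocess
from typing import Optional, Tuple

def max_cuda_from_smi(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

def driver_supports_cuda(text: str, required: Tuple[int, int] = (12, 3)) -> bool:
    """True if the driver's maximum supported CUDA version is >= required."""
    found = max_cuda_from_smi(text)
    return found is not None and found >= required

def read_nvidia_smi() -> str:
    # Run on the host, outside Docker; raises FileNotFoundError if nvidia-smi is missing.
    return subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
```

Usage on the host: `driver_supports_cuda(read_nvidia_smi())`. If the result is `False`, the driver must be upgraded before the CUDA 12.3 image can run.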
