autowarefoundation / autoware.universe

https://autowarefoundation.github.io/autoware.universe/
Apache License 2.0

Evaluate performance of object detection pipeline #565

Closed mitsudome-r closed 1 year ago

mitsudome-r commented 2 years ago

Checklist

Description

Evaluate the performance of the perception pipeline.

Purpose

Evaluate the performance of the perception pipeline to confirm whether the current perception stack has enough capability for BGus.

Possible approaches

Find available open datasets for evaluation and do benchmark testing. Possible datasets:

Definition of done

kaancolak commented 2 years ago

In the Autoware.Auto, we have 2D and 3D detection benchmark tools for the KITTI dataset. We can test our 2D/3D detections by integrating these tools into Autoware.Universe.

In addition, I think we need to evaluate the entire perception stack of Autoware.Universe, from the sensor output (point cloud / camera frame) to the object tracking output, because all perception modules work together: for example, the deterministic and deep-learning-based 3D bounding boxes are merged, and tracking results are merged with the detected objects. For object tracking, the KITTI dataset only offers a 2D bounding-box-based evaluation. For this reason, we are planning to use the Waymo dataset for 3D detection & tracking evaluation.

kaancolak commented 2 years ago

[Diagram: high-level architecture of the planned perception benchmark pipeline]

I shared the high-level architecture for the planned perception benchmark pipeline. If you have any comments, feel free to share them with us.

Limitations when benchmarking with the Waymo dataset:

Current situation:


There is a small jitter, which I think is caused by localization, i.e. the base_link to global frame transformation.
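One way to check that hypothesis, outside the benchmark pipeline itself, would be to sample the map to base_link transform that localization publishes and log how much it moves between updates. A minimal sketch, assuming the usual Autoware frame names (`map`, `base_link`) and that TF is available; it is not part of the tool:

```python
# Illustrative probe only (not part of the benchmark tool): samples the
# map <- base_link transform from TF and logs the ego displacement between
# consecutive samples. Large frame-to-frame deltas while the vehicle is
# (nearly) stationary would point at localization jitter, which propagates
# directly into object positions reported in the global frame.
import rclpy
from rclpy.duration import Duration
from rclpy.node import Node
from tf2_ros import Buffer, TransformListener


class EgoJitterProbe(Node):
    def __init__(self):
        super().__init__('ego_jitter_probe')
        self.tf_buffer = Buffer()
        self.tf_listener = TransformListener(self.tf_buffer, self)
        self.last_xy = None
        self.create_timer(0.1, self.sample)  # 10 Hz sampling, arbitrary choice

    def sample(self):
        try:
            t = self.tf_buffer.lookup_transform(
                'map', 'base_link', rclpy.time.Time(),
                timeout=Duration(seconds=0.1))
        except Exception:
            return  # transform not available yet
        xy = (t.transform.translation.x, t.transform.translation.y)
        if self.last_xy is not None:
            dx = xy[0] - self.last_xy[0]
            dy = xy[1] - self.last_xy[1]
            self.get_logger().info(f'ego moved ({dx:.3f}, {dy:.3f}) m since last sample')
        self.last_xy = xy


def main():
    rclpy.init()
    rclpy.spin(EgoJitterProbe())


if __name__ == '__main__':
    main()
```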

kaancolak commented 2 years ago

I shared the initial 3D tracking benchmark results in the README file of the PR. It contains only the results of the lidar-only pipeline.

For vehicles everything works fine, but for pedestrians we assign a constant length and width of 1 meter to the pedestrian bounding boxes in Autoware.Universe. The Waymo dataset uses strict IoU thresholds for matching tracked ground-truth objects with predictions (vehicle: 0.7, pedestrian and cyclist: 0.5), and with a fixed pedestrian size the IoU falls below the cutoff.
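To illustrate why the fixed size hurts, here is a quick bird's-eye-view IoU calculation. The 0.5 m x 0.5 m ground-truth footprint is an assumed typical pedestrian size, not a value from the Waymo evaluation, and the IoU is simplified to axis-aligned boxes sharing the same center:

```python
# Simplified, illustrative BEV IoU: both boxes axis-aligned and perfectly centered.
def centered_bev_iou(pred_lw, gt_lw):
    inter = min(pred_lw[0], gt_lw[0]) * min(pred_lw[1], gt_lw[1])
    union = pred_lw[0] * pred_lw[1] + gt_lw[0] * gt_lw[1] - inter
    return inter / union


# Fixed 1.0 m x 1.0 m prediction vs. an assumed 0.5 m x 0.5 m pedestrian footprint:
print(centered_bev_iou((1.0, 1.0), (0.5, 0.5)))  # 0.25 -> below the 0.5 cutoff
# Same prediction vs. a larger (assumed) 0.9 m x 0.8 m footprint:
print(centered_bev_iou((1.0, 1.0), (0.9, 0.8)))  # 0.72 -> would pass
```

So even with a perfect center estimate, the fixed footprint alone can push a pedestrian match below the 0.5 threshold.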

I will write a detailed explanation under the PR. If you have any suggestions or advice please share them.

WJaworskiRobotec commented 2 years ago

@kaancolak I'm wondering whether using the Waymo Open Dataset Toolkit for evaluation is a good idea. I understand that the best solution would be to make it possible to use the same metric calculation software for Waymo, KITTI (and other open datasets), as well as for synthetic data generated in simulators. In that case, as Autoware is based on ROS2, it would be perfect to have metric calculation based directly on ROS topics, optionally on rosbags (like we develop in this issue). What is your opinion? I'm not really familiar with the Waymo Open Dataset toolkit, so I might misunderstand something.

kaancolak commented 2 years ago

@WJaworskiRobotec Thanks for your feedback.

The current benchmarking tool scripts subscribe to ROS2 topics and convert the tracked objects to the proto format required by the Waymo dataset. If we want to compare our tracking results with other submissions in the Waymo 3D Tracking Challenge, I think the Waymo Open Dataset Toolkit is the best way to do it; it contains a lot of special configurations for metric calculation. I chose the Waymo dataset for the 3D tracking benchmark because most of the popular datasets, like KITTI, Berkeley DeepDrive, and Lyft, don't provide a 3D tracking benchmark.
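For context, a minimal sketch of that kind of subscriber is shown below. The topic name and message fields are assumptions based on common Autoware.Universe defaults, and the actual conversion to the Waymo metrics proto is only indicated by a comment, not implemented:

```python
# Minimal sketch (not the actual benchmark scripts): subscribe to Autoware's
# tracking output and buffer one record per object per frame, ready to be
# converted to the Waymo proto format or fed to another metric calculator.
import rclpy
from rclpy.node import Node
from autoware_auto_perception_msgs.msg import TrackedObjects


class TrackingRecorder(Node):
    def __init__(self):
        super().__init__('tracking_recorder')
        self.frames = []  # (timestamp_ns, [object records]) per received message
        self.create_subscription(
            TrackedObjects,
            '/perception/object_recognition/tracking/objects',  # assumed default topic
            self.on_tracked_objects, 10)

    def on_tracked_objects(self, msg):
        stamp_ns = msg.header.stamp.sec * 10**9 + msg.header.stamp.nanosec
        records = []
        for obj in msg.objects:
            pose = obj.kinematics.pose_with_covariance.pose
            records.append({
                'id': bytes(obj.object_id.uuid).hex(),
                'x': pose.position.x,
                'y': pose.position.y,
                'z': pose.position.z,
                'length': obj.shape.dimensions.x,
                'width': obj.shape.dimensions.y,
                'height': obj.shape.dimensions.z,
            })
        self.frames.append((stamp_ns, records))
        # A Waymo converter would map these records (plus ground truth) onto the
        # metrics proto before invoking the official evaluation tooling.


def main():
    rclpy.init()
    rclpy.spin(TrackingRecorder())


if __name__ == '__main__':
    main()
```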

If we want to evaluate our 2D and 3D detection results against other open datasets and against synthetic data generated in simulators, real-time evaluation directly over ROS2 topics could be very useful; we just need to implement the proper metrics. I can easily extend the functionality of this tool.
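As a rough idea of the kind of metric that could run directly on those topics, here is a hedged sketch of per-frame precision/recall with greedy BEV IoU matching. It assumes axis-aligned boxes given as (x, y, length, width) in a common frame; a real evaluator would also need class-aware thresholds, headings, and track IDs:

```python
# Illustrative sketch only; not part of the existing benchmark tool.
def bev_iou(a, b):
    """Axis-aligned BEV IoU of boxes given as (x, y, length, width)."""
    ax0, ax1 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay0, ay1 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx0, bx1 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by0, by1 = b[1] - b[3] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0.0 else 0.0


def frame_precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes."""
    unmatched_gt = list(ground_truths)
    true_positives = 0
    for pred in predictions:
        if not unmatched_gt:
            break
        ious = [bev_iou(pred, gt) for gt in unmatched_gt]
        best = max(range(len(ious)), key=ious.__getitem__)
        if ious[best] >= iou_threshold:
            true_positives += 1
            unmatched_gt.pop(best)
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truths) if ground_truths else 0.0
    return precision, recall
```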

WJaworskiRobotec commented 2 years ago

Thanks a lot for the explanation. The pipeline that you created is perfect for comparing Autoware results with others, and it will be very easy to connect the metric calculation node created in the task I mentioned directly to the same topics that your "data converter" (converting to the Waymo format) subscribes to. Looks perfect to me; once we have proper metrics we will connect them with your code.

kaancolak commented 2 years ago

@WJaworskiRobotec Instead of working on different benchmarking pipelines, I think we can extend the metric calculation node. A more generic evaluator that covers the entire stack (planning, control, detection, etc.) makes sense. I would like to implement the perception (2D/3D) part on top of your metric calculation nodes; the base algorithm will be very similar to this tool, but it must follow the same code format as your nodes.

WJaworskiRobotec commented 2 years ago

@kaancolak Sounds great. @djargot is currently working on the last node that we wanted to create as an example and it is related to perception (segmentation algorithm evaluation). Once it's done we will assign you as a reviewer, and you can continue with adding your nodes for 2D/3D Object Detection.

kaancolak commented 2 years ago

This PR is waiting for review.

xmfcx commented 2 years ago

@kaancolak can you update the current status of this issue?

kaancolak commented 2 years ago

I have made some updates to the code base. Currently, it's waiting for review.

epris commented 4 months ago

Does the perception benchmark tool still work? It seems that there are some problems with the Python version.