autowarefoundation / autoware.universe

https://autowarefoundation.github.io/autoware.universe/

Statistics of latest Autoware's Whole Perception Pipeline Latency (Lidar only mode) #8752

Open · cyn-liu opened 2 months ago

cyn-liu commented 2 months ago

Description

Measure the running time of the perception module (lidar-only mode) during real-vehicle operation.

Environment:
- Device: Dell Precision 3581
- OS: Ubuntu 22.04
- GPU: NVIDIA RTX A1000 6GB Laptop
- CPU: Intel Core i7

Purpose

To get a detailed breakdown of each node's running time and of the overall time consumption of the perception pipeline, and to help developers improve the performance of the perception module.

Possible approaches

We will investigate a reasonable method to calculate the running time of the perception pipeline. @mitsudome-r @xmfcx Do you have any suggestions for statistical tools and methods?

Definition of done

TODO

xmfcx commented 2 months ago

Pipeline timings (Recommended)

Easily learn about the pipeline latency up to any node you like by subscribing to its node_name/pipeline_latency_ms topic.
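
For reference, a minimal sketch of such a subscriber might look like the following. It assumes the debug topic is published as tier4_debug_msgs/msg/Float64Stamped with the latency in milliseconds in its data field (verify the actual type with ros2 topic info); the topic name is just an example and can be replaced with any node's pipeline_latency_ms topic.

    #include <rclcpp/rclcpp.hpp>
    #include <tier4_debug_msgs/msg/float64_stamped.hpp>

    #include <string>

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);
      auto node = rclcpp::Node::make_shared("pipeline_latency_echo");

      // Example topic; replace with the node you are interested in.
      const std::string topic =
        "/perception/object_recognition/detection/centerpoint/lidar_centerpoint/debug/pipeline_latency_ms";

      auto sub = node->create_subscription<tier4_debug_msgs::msg::Float64Stamped>(
        topic, rclcpp::QoS{10},
        [node](const tier4_debug_msgs::msg::Float64Stamped::ConstSharedPtr msg) {
          // data holds the latency (ms) from the start of the pipeline up to this node.
          RCLCPP_INFO(node->get_logger(), "pipeline latency: %.2f ms", msg->data);
        });

      rclcpp::spin(node);
      rclcpp::shutdown();
      return 0;
    }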

CARET

https://github.com/orgs/autowarefoundation/discussions/2822

Takes more time to prepare and requires post-processing to analyze the timings.

End-to-end latency for the whole system

For checking the latency of the whole system.

An object is spawned at a predetermined time, and the time at which the velocity changes is measured.

@cyn-liu I updated this post.

cyn-liu commented 1 month ago

Node Relationship Graph of Autoware's Lidar-Mode Perception Pipeline

[Figure: perception_node_graph]


We have calculated the average running time of each node in the lower half of the node relationship graph above, as shown in the following figure:

[Figure: perception_node_running_time]


We have calculated the average running time of each node in the upper half of the node relationship graph above, as shown in the following figure:

[Figure: 2-perception_node_running_time]

cyn-liu commented 1 month ago

We used the "Pipeline timings" method recommended by @xmfcx above to collect latency statistics for Autoware's perception pipeline (lidar-only mode).

The Autoware perception module publishes the following pipeline_latency_ms debug topics. We subscribe to these topics and analyze how the values change through rqt to evaluate the time consumption of the perception pipeline (a minimal subscriber sketch follows the list).

/perception/obstacle_segmentation/crop_box_filter/debug/pipeline_latency_ms
/perception/occupancy_grid_map/raw_pc_downsample_filter/debug/pipeline_latency_ms
/perception/occupancy_grid_map/obstacle_pc_downsample_filter/debug/pipeline_latency_ms
/perception/occupancy_grid_map/pointcloud_based_occupancy_grid_map/debug/pipeline_latency_ms
/perception/object_recognition/detection/clustering/voxel_grid_based_euclidean_cluster/debug/pipeline_latency_ms
/perception/object_recognition/detection/centerpoint/lidar_centerpoint/debug/pipeline_latency_ms
/perception/object_recognition/detection/obstacle_pointcloud_based_validator/debug/pipeline_latency_ms
/perception/object_recognition/detection/object_lanelet_filter/debug/pipeline_latency_ms
/perception/object_recognition/tracking/multi_object_tracker/debug/pipeline_latency_ms
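
A rough sketch of how per-topic statistics can be collected in one node is shown below. It again assumes the debug topics are tier4_debug_msgs/msg/Float64Stamped (check with ros2 topic info) and only lists a few of the topics above; rqt_plot works equally well for visual inspection.

    #include <rclcpp/rclcpp.hpp>
    #include <tier4_debug_msgs/msg/float64_stamped.hpp>

    #include <cstddef>
    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    class LatencyStats : public rclcpp::Node
    {
    public:
      LatencyStats() : Node("latency_stats")
      {
        // A subset of the debug topics listed above; extend as needed.
        const std::vector<std::string> topics = {
          "/perception/obstacle_segmentation/crop_box_filter/debug/pipeline_latency_ms",
          "/perception/object_recognition/detection/centerpoint/lidar_centerpoint/debug/pipeline_latency_ms",
          "/perception/object_recognition/tracking/multi_object_tracker/debug/pipeline_latency_ms",
        };
        using tier4_debug_msgs::msg::Float64Stamped;
        for (const auto & topic : topics) {
          subs_.push_back(create_subscription<Float64Stamped>(
            topic, rclcpp::QoS{10},
            [this, topic](const Float64Stamped::ConstSharedPtr msg) {
              // Accumulate a running mean per topic.
              auto & acc = stats_[topic];
              acc.sum += msg->data;
              ++acc.count;
              RCLCPP_INFO(
                get_logger(), "%s: latest %.1f ms, mean %.1f ms over %zu samples",
                topic.c_str(), msg->data, acc.sum / static_cast<double>(acc.count), acc.count);
            }));
        }
      }

    private:
      struct Accumulator
      {
        double sum{0.0};
        std::size_t count{0};
      };

      std::map<std::string, Accumulator> stats_;
      std::vector<rclcpp::SubscriptionBase::SharedPtr> subs_;
    };

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);
      rclcpp::spin(std::make_shared<LatencyStats>());
      rclcpp::shutdown();
      return 0;
    }
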
cyn-liu commented 3 weeks ago

We found that the multi_object_tracker node is very time-consuming, so we analyzed the various modules inside the node. A lot of debug information is published within the node to help us analyze the time consumption:

  1. /perception/object_recognition/tracking/multi_object_tracker/debug/input_latency_ms Indicates the time delay from receiving the data to the start of data processing.

  2. /perception/object_recognition/tracking/multi_object_tracker/debug/processing_latency_ms Indicates the time delay from the start of data processing to data publishing.

  3. /perception/object_recognition/tracking/multi_object_tracker/debug/processing_time_ms The time from the start of publishing to the completion of publishing plus the time from the start of processing to the end of processing. However, I don't know why the value of this topic is equal to 0.

  4. /perception/object_recognition/tracking/multi_object_tracker/debug/meas_to_tracked_object_ms Indicates the time delay from receiving the data to the timer being triggered (just before data publishing).

  5. /perception/object_recognition/tracking/multi_object_tracker/debug/pipeline_latency_ms Indicates the time delay from receiving the data to publishing.

  6. /perception/object_recognition/tracking/multi_object_tracker/debug/cyclic_time_ms Indicates the time interval between two consecutive publications.

[Figures: Figure_1, Figure_2, Figure_3]

From the figures above, it can be seen that /perception/object_recognition/tracking/multi_object_tracker/debug/pipeline_latency_ms takes approximately 125 ms, and it can be roughly assumed that pipeline_latency_ms = input_latency_ms + processing_latency_ms. Of these, processing_latency_ms is relatively stable at about 75 ms, while input_latency_ms fluctuates greatly, usually around 50 ms.
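
As a rough cross-check of this decomposition (not part of the original analysis), the sketch below subscribes to the three debug topics and logs input_latency_ms + processing_latency_ms next to pipeline_latency_ms. It assumes all three topics are tier4_debug_msgs/msg/Float64Stamped, and the comparison is only approximate because the latest samples of the three topics are not strictly time-synchronized.

    #include <rclcpp/rclcpp.hpp>
    #include <tier4_debug_msgs/msg/float64_stamped.hpp>

    #include <memory>
    #include <string>

    class LatencyDecompositionCheck : public rclcpp::Node
    {
    public:
      LatencyDecompositionCheck() : Node("latency_decomposition_check")
      {
        const std::string ns = "/perception/object_recognition/tracking/multi_object_tracker/debug/";
        input_sub_ = make_sub(ns + "input_latency_ms", &input_ms_);
        processing_sub_ = make_sub(ns + "processing_latency_ms", &processing_ms_);
        pipeline_sub_ = create_subscription<Float64Stamped>(
          ns + "pipeline_latency_ms", rclcpp::QoS{10},
          [this](const Float64Stamped::ConstSharedPtr msg) {
            // Compare the reported pipeline latency with the sum of its two parts.
            RCLCPP_INFO(
              get_logger(), "pipeline %.1f ms vs input + processing %.1f ms",
              msg->data, input_ms_ + processing_ms_);
          });
      }

    private:
      using Float64Stamped = tier4_debug_msgs::msg::Float64Stamped;

      rclcpp::Subscription<Float64Stamped>::SharedPtr make_sub(const std::string & topic, double * dst)
      {
        // Store the latest value of the given topic into *dst.
        return create_subscription<Float64Stamped>(
          topic, rclcpp::QoS{10},
          [dst](const Float64Stamped::ConstSharedPtr msg) { *dst = msg->data; });
      }

      double input_ms_{0.0};
      double processing_ms_{0.0};
      rclcpp::Subscription<Float64Stamped>::SharedPtr input_sub_;
      rclcpp::Subscription<Float64Stamped>::SharedPtr processing_sub_;
      rclcpp::Subscription<Float64Stamped>::SharedPtr pipeline_sub_;
    };

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);
      rclcpp::spin(std::make_shared<LatencyDecompositionCheck>());
      rclcpp::shutdown();
      return 0;
    }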

https://github.com/user-attachments/assets/d3fffa90-ec82-42cf-8df6-8b8b1076c2c8

cyn-liu commented 2 weeks ago

After analysis, the reason for the long time consumption of the multi_object_tracker node has been identified. As shown above, /perception/object_recognition/tracking/multi_object_tracker/debug/pipeline_latency_ms takes approximately 125 ms, which is mainly composed of input_latency_ms and processing_latency_ms.

  1. input_latency_ms is intended to capture the time spent in data preprocessing, and is calculated as follows:
    double input_latency_ms = (now - last_input_stamp_).seconds() * 1e3;

    Here, now is the time after data preprocessing, and last_input_stamp_ is the timestamp of the earliest received data in the input queue.

After analysis, it was found that the data preprocessing function itself is not time-consuming; the reason input_latency_ms is so large is that the data in the input queue is too old.

  2. processing_latency_ms refers to the time taken between starting the object tracking algorithm and publishing the results, and is calculated as follows:
    double processing_latency_ms = ((now - stamp_process_start_).seconds()) * 1e3;

    Here, now is the time after publishing the results, and stamp_process_start_ is the time at which the tracking algorithm starts executing.

After analysis, it was found that the time consumption of each function in the tracking algorithm is very small. The main source of time consumption for processing_latency_ms is that the publishing thread is triggered by a timer, while the tracking algorithm thread is triggered by incoming data. Because the two are not synchronized, the measured processing latency is high.
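
To illustrate the mechanism, the simplified sketch below (not the actual multi_object_tracker code) produces a result in a data-triggered callback but publishes it from an independent timer; the finished result simply waits for the next timer tick, adding up to one timer period of apparent processing latency even though no computation happens during the wait.

    #include <rclcpp/rclcpp.hpp>
    #include <std_msgs/msg/string.hpp>

    #include <chrono>
    #include <memory>
    #include <optional>

    class TimerPublishDemo : public rclcpp::Node
    {
    public:
      TimerPublishDemo() : Node("timer_publish_demo")
      {
        // The "tracking result" is produced as soon as input data arrives...
        sub_ = create_subscription<std_msgs::msg::String>(
          "input", 10, [this](const std_msgs::msg::String::ConstSharedPtr msg) {
            pending_ = *msg;
            result_ready_time_ = now();
          });
        pub_ = create_publisher<std_msgs::msg::String>("output", 10);
        // ...but it is only published when this independent 100 ms timer fires.
        timer_ = create_wall_timer(std::chrono::milliseconds(100), [this]() {
          if (!pending_) {
            return;
          }
          // This wait inflates the measured latency although nothing is computed here.
          const double wait_ms = (now() - *result_ready_time_).seconds() * 1e3;
          RCLCPP_INFO(get_logger(), "result waited %.1f ms for the timer", wait_ms);
          pub_->publish(*pending_);
          pending_.reset();
        });
      }

    private:
      rclcpp::Subscription<std_msgs::msg::String>::SharedPtr sub_;
      rclcpp::Publisher<std_msgs::msg::String>::SharedPtr pub_;
      rclcpp::TimerBase::SharedPtr timer_;
      std::optional<std_msgs::msg::String> pending_;
      std::optional<rclcpp::Time> result_ready_time_;
    };

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);
      rclcpp::spin(std::make_shared<TimerPublishDemo>());
      rclcpp::shutdown();
      return 0;
    }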

cyn-liu commented 1 week ago

pipeline_latency_ms represents the time it takes for the entire pipeline from point cloud publishing to the completion of execution at the current node.

/perception/object_recognition/tracking/multi_object_tracker/debug/pipeline_latency_ms refers to the time from the earliest data in the input data queue to the completion of node execution.

From the statistical results above, it was found that the multi_object_tracker node's /perception/object_recognition/tracking/multi_object_tracker/debug/pipeline_latency_ms is very high and fluctuates greatly. After analysis, it was found that this is because the data received by the multi_object_tracker node is placed in a queue; when the bag has some dropped frames, the fluctuation of the multi_object_tracker node's latency becomes significant.

When a bag without frame loss was used for testing instead, the fluctuation of /perception/object_recognition/tracking/multi_object_tracker/debug/pipeline_latency_ms was very small. When there were two measurements in the input data queue of the multi_object_tracker node, /perception/object_recognition/tracking/multi_object_tracker/debug/pipeline_latency_ms was about 200 ms; when there was only one measurement in the queue, it was about 100 ms (consistent with each queued measurement adding roughly one point cloud frame period). Therefore, the entire perception pipeline takes about 100 ms from point cloud publishing to tracking completion.

[Figure: tracking_pipeline_plot]

mojomex commented 1 day ago

@cyn-liu Thank you for the in-depth analysis! :sparkles:

I cannot speak to the latency of the later stages in the pipeline, but we can soon get rid of CropBoxFilter completely as this is being implemented more efficiently in the LiDAR driver (Nebula):

CropBoxFilter will remain as a legacy option, as not all LiDAR vendors are supported by the new feature yet.