autowarefoundation / autoware.universe

https://autowarefoundation.github.io/autoware.universe/
Apache License 2.0

High latency of lidar pointcloud after launching occupancy grid map module #4070

Closed yaukaizhi closed 1 year ago

yaukaizhi commented 1 year ago

Description

When visualizing in Rviz, the raw pointcloud from the Lidar lags real time by a large margin (5-10 seconds). This only happens after launching the perception module. Without perception, localization with the Lidar works fine.

CPU, GPU, and memory usage are all only at 50%, but the Lidar pointcloud updates very slowly. What could be the cause of this?

Localization cannot work once perception is launched due to the huge delay of the pointcloud updating.

Expected behavior

Real time pointcloud updates

Actual behavior

Very delayed pointcloud updates

Steps to reproduce

Launch autoware.launch.xml with real Lidar

Versions

No response

Possible causes

Not using the concatenated/pointcloud for perception, using /points_raw instead

Additional context

No response

maxime-clem commented 1 year ago

Please clarify the problem (or maybe post a video). I am not sure if the issue is a delay (rviz always shows the pointcloud from 5s earlier), or a slow update (rviz only updates the pointcloud every 5s).

First, we should find out where this issue with the pointcloud topic starts to occur (when published by the lidar driver, when published by a pointcloud filter node, ...). You can check the rate of a pointcloud topic with ros2 topic hz. You can list all pointcloud topics with ros2 topic list -t | grep PointCloud.

Can you also please provide more details about your environment?
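
For reference, the rate check suggested above boils down to counting message intervals over the elapsed time. The following is a minimal sketch of that arithmetic only, not the implementation of ros2 topic hz; the function name and the sample arrival times are hypothetical.

```python
from typing import Sequence


def mean_rate_hz(arrival_times_s: Sequence[float]) -> float:
    """Average message rate, given message arrival times in seconds."""
    if len(arrival_times_s) < 2:
        raise ValueError("need at least two arrival times")
    span = arrival_times_s[-1] - arrival_times_s[0]
    if span <= 0:
        raise ValueError("arrival times must be increasing")
    # N messages delimit N - 1 intervals.
    return (len(arrival_times_s) - 1) / span


# Hypothetical arrival times: one message every 0.1 s, then one every 1.0 s.
fast = [0.1 * i for i in range(11)]
slow = [1.0 * i for i in range(6)]
print(mean_rate_hz(fast))
print(mean_rate_hz(slow))
```

Comparing this number at each stage of the pipeline (driver output, filter output, perception input) shows at which node the rate drops.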

yaukaizhi commented 1 year ago

These are some error msgs that are published: image

Here, I cover the LIDAR with my hand and it takes ~5-10 seconds for the change to show up in Rviz. https://github.com/autowarefoundation/autoware.universe/assets/112715209/97baf1bd-4a6a-4a7a-8e4e-805ede1ca618

ros2 topic hz results:
before launching perception: /points_raw = 10 Hz
after launching perception: /points_raw = 10 Hz
after launching perception: /perception/obstacle_segmentation/pointcloud = 1 Hz

This only occurs after launching perception. The localization module does not cause this. Let me know if there are any more environment details that would help you :smile:

maxime-clem commented 1 year ago

It looks like the issue comes from the time difference between the base_link -> map transform and the pointcloud message. Maybe localization is only publishing the transform every ~10 seconds. Can you check that?

maxime-clem commented 1 year ago

Sorry. My last comment does not make sense. I wonder if the timestamps of your pointclouds are correct or if they are delayed.

yaukaizhi commented 1 year ago

Is there a good way to check if the timestamps of my pointclouds are correct? Would localization work if they were incorrect?

maxime-clem commented 1 year ago

You can try this command:

ros2 topic delay -h
usage: ros2 topic delay [-h] [--window WINDOW] [--spin-time SPIN_TIME] [-s] topic

Display delay of topic from timestamp in header

positional arguments:
  topic                 Topic name to calculate the delay for

options:
  -h, --help            show this help message and exit
  --window WINDOW, -w WINDOW
                        window size, in # of messages, for calculating rate, string
                        to (default: 10000)
  --spin-time SPIN_TIME
                        Spin time in seconds to wait for discovery (only applies
                        when not using an already running daemon)
  -s, --use-sim-time    Enable ROS simulation time

yaukaizhi commented 1 year ago

After running ros2 topic delay /points_raw, it seems that you're right! There's a roughly 10s delay for the pointcloud timestamp. Would you happen to know what would cause this? Or even better yet, how to go about resolving this?

image
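
As a rough illustration of what the reported number means: the delay is the receive time minus the timestamp in the message header. The sketch below assumes the header stamp is given as the sec and nanosec fields of a ROS 2 time message; the function names and the sample values are hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def stamp_to_ns(sec: int, nanosec: int) -> int:
    """Convert a (sec, nanosec) header stamp to integer nanoseconds."""
    return sec * NANOSECONDS_PER_SECOND + nanosec


def message_delay_s(receive_time_ns: int, stamp_sec: int, stamp_nanosec: int) -> float:
    """Delay in seconds between the header stamp and the time of reception."""
    return (receive_time_ns - stamp_to_ns(stamp_sec, stamp_nanosec)) / NANOSECONDS_PER_SECOND


# Hypothetical values: stamped at 100.25 s, received at 110.25 s.
print(message_delay_s(110_250_000_000, 100, 250_000_000))
```

A large value therefore means either that the message really arrived late or that the stamp was already old when the driver published it.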

yaukaizhi commented 1 year ago

I've narrowed the source of this issue down to the detection package. It occurs when running: $(find-pkg-share tier4_perception_launch)/launch/object_recognition/prediction/prediction.launch.xml

maxime-clem commented 1 year ago

Do you mean there is no delay if you do not run the perception module?

yaukaizhi commented 1 year ago

Yes, that is correct. The delay only occurs when running the perception module.

maxime-clem commented 1 year ago

It may be a communication delay. Can you configure your LIDAR to reduce the message size or reduce the frequency? You can then check if this reduces the delay when perception is running. If yes, this is a bandwidth problem.
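
To get a feel for the numbers involved, the payload bandwidth of a pointcloud topic can be estimated as points per message times bytes per point times message rate. This is a back-of-the-envelope sketch; the point count and point size below are hypothetical, not measurements from this setup, and message headers are ignored.

```python
def pointcloud_bandwidth_mbit_s(num_points: int, point_step_bytes: int, rate_hz: float) -> float:
    """Approximate payload bandwidth of a pointcloud topic in Mbit/s."""
    bytes_per_second = num_points * point_step_bytes * rate_hz
    return bytes_per_second * 8 / 1e6


# Hypothetical sensor: 100,000 points per scan, 16 bytes per point.
for rate_hz in (10, 5):
    print(rate_hz, "Hz:", pointcloud_bandwidth_mbit_s(100_000, 16, rate_hz), "Mbit/s")
```

Halving the rate halves the estimate, so if the delay stays the same after such a change, bandwidth is less likely to be the bottleneck.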

yaukaizhi commented 1 year ago

I have reduced the Lidar's frequency to half its original value (from 10Hz to 5Hz) and the problem still persists. Somehow the delay reported by ros2 topic delay always stabilizes at 10 seconds (never less, never more). Could this mean that some parameter inside the perception module is causing this?

yaukaizhi commented 1 year ago

Sorry, I've made a mistake. The issue occurs when including $(find-pkg-share tier4_perception_launch)/launch/occupancy_grid_map/probabilistic_occupancy_grid_map.launch.xml, and not the prediction package mentioned earlier.

yaukaizhi commented 1 year ago

The issue is solved by skipping the publishing of NaN values from the LIDAR driver. I'm not sure if this is a good way to solve it. I now get this warning message: image
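
For illustration, dropping NaN points before publishing amounts to a filter like the one below. This is a minimal pure-Python sketch assuming points are (x, y, z) tuples; it is not the driver's actual code, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def drop_nan_points(points: List[Point]) -> List[Point]:
    """Keep only the points whose coordinates are all valid numbers."""
    return [p for p in points if not any(math.isnan(c) for c in p)]


# Hypothetical scan: two valid returns and two beams with no return (NaN).
nan = float("nan")
scan = [(1.0, 2.0, 0.5), (nan, nan, nan), (3.0, -1.0, 0.2), (4.0, nan, 0.0)]
print(drop_nan_points(scan))
```

One consequence of filtering this way is that the resulting cloud no longer has a fixed number of points per scan, which may matter to downstream nodes that expect an organized cloud.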

maxime-clem commented 1 year ago

Is Autoware still able to use the LIDAR data? The error messages make it sound like the LIDAR input gets discarded.

yaukaizhi commented 1 year ago

Yes, most of the LIDAR inputs still get through. I assume it just discards the empty NaN messages sent by the driver.

However, the 10s latency now comes back when I use the crop_box_filter with its negative parameter set. 😅
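
For readers unfamiliar with the negative option: a crop box normally keeps the points inside the box, and the negative setting inverts that so only the points outside are kept. The sketch below illustrates the idea in pure Python; it is not the Autoware crop_box_filter implementation, and the function name and box dimensions are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def crop_box(points: List[Point], min_corner: Point, max_corner: Point,
             negative: bool = False) -> List[Point]:
    """Keep points inside the box, or outside it when negative is True."""
    def inside(p: Point) -> bool:
        # In this sketch a NaN coordinate fails the comparison,
        # so such a point counts as outside the box.
        return all(lo <= c <= hi for c, lo, hi in zip(p, min_corner, max_corner))

    return [p for p in points if inside(p) != negative]


# Hypothetical box around the vehicle body.
box_min, box_max = (-1.0, -1.0, 0.0), (3.0, 1.0, 2.0)
cloud = [(0.5, 0.0, 1.0), (10.0, 0.0, 1.0), (2.0, 0.5, 0.5), (-5.0, 2.0, 0.3)]
print(crop_box(cloud, box_min, box_max))                 # points on the vehicle
print(crop_box(cloud, box_min, box_max, negative=True))  # everything else
```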

maxime-clem commented 1 year ago

Sorry I could not be of any help. Since I cannot reproduce the issue it is hard to investigate it.

yaukaizhi commented 1 year ago

It's alright! I managed to avoid this issue altogether by using euclidean clustering instead! Thanks for all the hard work!