Alpaca-zip / ultralytics_ros

ROS/ROS 2 package for Ultralytics YOLOv8 real-time object detection and segmentation. https://github.com/ultralytics/ultralytics
GNU Affero General Public License v3.0
203 stars, 40 forks

Improvement of processing speed #64

Status: Open. Alpaca-zip opened this issue 9 months ago.

Alpaca-zip commented 9 months ago

Branch

noetic-devel

Description

The two most time-consuming functions are projectCloud() and downsampleCloudMsg(), which appear to take roughly the same amount of time. I can't think of a way to speed up downsampleCloudMsg(), but projectCloud() looks like it could benefit from parallel processing with OpenMP.
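A minimal sketch of the idea, assuming the expensive part of projectCloud() is a loop over detections whose iterations are independent of each other. ProjectedPoint, BBox, selectPointsInBox() and processAllDetections() are hypothetical names standing in for the node's real types and for processPointsWithBbox(); they are not taken from the actual implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: a 3D point plus its projected pixel coordinates,
// and a 2D bounding box in pixels.
struct ProjectedPoint { float x, y, z, u, v; };
struct BBox { float u_min, v_min, u_max, v_max; };

// Hypothetical per-detection work: keep the points whose projection falls
// inside the bounding box (a simplified stand-in for processPointsWithBbox()).
std::vector<ProjectedPoint> selectPointsInBox(const std::vector<ProjectedPoint>& cloud,
                                              const BBox& box) {
  std::vector<ProjectedPoint> out;
  for (const auto& p : cloud) {
    if (p.u >= box.u_min && p.u <= box.u_max &&
        p.v >= box.v_min && p.v <= box.v_max) {
      out.push_back(p);
    }
  }
  return out;
}

// Process all detections in parallel, one loop iteration per detection.
std::vector<std::vector<ProjectedPoint>> processAllDetections(
    const std::vector<ProjectedPoint>& cloud, const std::vector<BBox>& boxes) {
  // Pre-size the output so each iteration writes only to its own slot;
  // this avoids any critical section or lock.
  std::vector<std::vector<ProjectedPoint>> results(boxes.size());
  const long n = static_cast<long>(boxes.size());
  // schedule(dynamic) helps when objects contain very different point counts.
  #pragma omp parallel for schedule(dynamic)
  for (long i = 0; i < n; ++i) {
    const std::size_t idx = static_cast<std::size_t>(i);
    results[idx] = selectPointsInBox(cloud, boxes[idx]);
  }
  return results;
}
```

If the code is compiled without `-fopenmp`, the pragma is ignored and the loop simply runs serially, so the results are identical either way. Note that with this layout the achievable parallelism is limited by the number of detections in the frame.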

Additional

Within projectCloud(), most of the time is spent in processPointsWithBbox() and processPointsWithMask(). euclideanClusterExtraction() accounts for only about a fifth of the time, which is less than I had expected.
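One way to obtain a breakdown like this is to accumulate wall-clock time per labelled scope. The sketch below is a hypothetical helper (ScopedTimer and shareOf() are illustrative names, not part of ultralytics_ros); the labels simply mirror the function names mentioned above.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical profiling helper: on destruction, adds the wall-clock time
// spent in its enclosing scope to totals[label], in milliseconds.
class ScopedTimer {
 public:
  ScopedTimer(std::map<std::string, double>& totals, std::string label)
      : totals_(totals),
        label_(std::move(label)),
        start_(std::chrono::steady_clock::now()) {}
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;
  ~ScopedTimer() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    totals_[label_] += std::chrono::duration<double, std::milli>(elapsed).count();
  }

 private:
  std::map<std::string, double>& totals_;
  std::string label_;
  std::chrono::steady_clock::time_point start_;
};

// Fraction of the time recorded under `whole` that was spent in `part`,
// e.g. roughly 0.2 for "about a fifth".
double shareOf(const std::map<std::string, double>& totals,
               const std::string& part, const std::string& whole) {
  const double w = totals.at(whole);
  return w > 0.0 ? totals.at(part) / w : 0.0;
}
```

In use, an outer timer labelled "projectCloud" would wrap the whole function and inner timers would wrap each sub-step; steady_clock is used because it never jumps backwards, unlike the system clock.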

Are you willing to submit a PR?

Alpaca-zip commented 9 months ago

I've implemented parallel processing with OpenMP in the feature/omp_parallel branch. In tests on the KITTI dataset, the average processing time of syncCallback() dropped from 17.5 ms to 14.8 ms, an average improvement of 15.5%.
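For reference, comparing the two averages directly gives the relative reduction:

```latex
\frac{t_{\text{before}} - t_{\text{after}}}{t_{\text{before}}}
  = \frac{17.5\,\text{ms} - 14.8\,\text{ms}}{17.5\,\text{ms}}
  = \frac{2.7}{17.5} \approx 0.154
```

That is roughly 15.4%, or a speedup of 17.5 / 14.8 ≈ 1.18×. The reported 15.5% may have been computed by averaging per-frame improvements, which need not match the ratio of the averages exactly; this is an assumption, since the per-frame timings are not shown.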

My configuration is as follows:

@h-wata, if you have some time, could you please test it in your environment as well? If you notice a significant improvement and find this change beneficial, I'll consider merging it into the noetic-devel branch. Thank you for your cooperation.

h-wata commented 9 months ago

Thank you for the implementation. With voxel_leaf_size:=0.01, projectCloud() takes 2.0 seconds on the noetic-devel branch, compared with 0.6 to 0.7 seconds on the feature/omp_parallel branch. At the same time, CPU usage is three times higher than on noetic-devel when three objects are in view.
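The reported timings imply a speedup of projectCloud() of roughly three:

```latex
S = \frac{t_{\text{serial}}}{t_{\text{OpenMP}}}, \qquad
\frac{2.0\,\text{s}}{0.7\,\text{s}} \approx 2.9 \;\le\; S \;\le\; \frac{2.0\,\text{s}}{0.6\,\text{s}} \approx 3.3
```

If the parallel loop runs one iteration per detected object (an assumption; the branch's code is not shown here), then with three objects at most three threads do useful work in that loop. That would put the ideal speedup near 3 and fits the roughly threefold CPU usage; the upper end of the range slightly exceeding 3 may be measurement noise.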


That said, this setting is quite aggressive. With the default value of 0.1, computation time should not be a problem on either branch.
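Why the leaf size matters so much can be seen from how voxel-grid downsampling works: every occupied voxel of edge length `leaf` is replaced by the centroid of its points, so shrinking the leaf keeps many more points for the later per-point steps. The sketch below is a self-contained illustration of that principle (the same centroid-per-voxel idea used by PCL's VoxelGrid filter); Point3 and voxelDownsample() are hypothetical names, and the node's actual downsampling code is not reproduced here.

```cpp
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

struct Point3 { double x, y, z; };

// Minimal voxel-grid downsampling: points are binned into cubic voxels of
// edge `leaf`, and each occupied voxel is replaced by the centroid of its points.
std::vector<Point3> voxelDownsample(const std::vector<Point3>& cloud, double leaf) {
  using Key = std::tuple<long, long, long>;       // integer voxel index
  struct Acc { double x = 0, y = 0, z = 0; long n = 0; };
  std::map<Key, Acc> voxels;
  for (const auto& p : cloud) {
    Key k{static_cast<long>(std::floor(p.x / leaf)),
          static_cast<long>(std::floor(p.y / leaf)),
          static_cast<long>(std::floor(p.z / leaf))};
    Acc& a = voxels[k];
    a.x += p.x; a.y += p.y; a.z += p.z; ++a.n;
  }
  std::vector<Point3> out;
  out.reserve(voxels.size());
  for (const auto& kv : voxels) {
    const Acc& a = kv.second;
    out.push_back({a.x / a.n, a.y / a.n, a.z / a.n});  // voxel centroid
  }
  return out;
}
```

For a flat 1 m × 1 m patch sampled every 2 cm (2,500 points), a leaf of 0.1 keeps 100 points while a leaf of 0.01 keeps all 2,500. On surface-like data, a 10× smaller leaf can therefore mean up to about 100× more points, capped by the raw point density, which is consistent with 0.01 being far more expensive than the default 0.1.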