Closed YoshuaNava closed 4 years ago
It doesn't come as a big surprise, as that code was originally intended for my personal research projects. The ROS layer was never really optimized, just patched over time for different needs.
This opens another topic: I discontinued support for that code a while ago.
We moved the ROS <-> libpointmatcher connection to a separate repo: https://github.com/norlab-ulaval/libpointmatcher_ros
But, when looking around, there are a couple of those repos around...
Hi @pomerlef,
Thank you for your very fast response :slightly_smiling_face:
I understand that it was used for your personal projects. I'm reporting it to motivate optimizing it, and also as part of the tests I mentioned I would be doing mid-year.
Would you like me to move this issue to the new repo? I was about to write a short proposal for optimizing this repo.
On a more general note: I'm extracting similar information from other filters. Would it be helpful if I open similar issues describing the performance of each?
If you don't mind, I would prefer to carry on development over there. I'll give you the proper access.
Of course! Every bit of data and analysis is super useful! It's already good to know that the binder is a huge bottleneck.
Will do! Thank you.
Hi, as part of my efforts to benchmark libpointmatcher, I implemented a ROS node that uses libpointmatcher_ros/point_cloud to serialize and deserialize point cloud data. The node receives a point cloud message, deserializes it, applies a few filters, and finally publishes the resulting point cloud. I ran it for 100+ seconds.
I found head-first that the most expensive method called in my program (even more than a surface normal data points filter run every iteration) was
rosMsgToPointMatcherCloud(sensor_msgs::PointCloud2, bool)
from point_cloud.cpp. I used Intel VTune community edition to find hotspots and Intel Advisor for vectorization advice. In the following lines I describe my search for hotspots and give a short analysis.
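For readers unfamiliar with this kind of conversion, here is a minimal sketch of what deserializing a packed point cloud buffer into a dense matrix involves. It is not the code from point_cloud.cpp: `PackedCloud` and `unpackToColumnMajor` are hypothetical names, the struct is a simplified stand-in for the relevant parts of sensor_msgs::PointCloud2, and all fields are assumed to be float32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for a packed point cloud message.
struct PackedCloud {
    std::vector<std::uint8_t> data;         // raw bytes, one record per point
    std::size_t pointStep = 0;              // bytes per point record
    std::size_t numPoints = 0;              // number of records in data
    std::vector<std::size_t> fieldOffsets;  // byte offset of each float32 field in a record
};

// Copies every listed field of every point into a column-major matrix
// (one column per point, one row per field), stored as a flat vector.
std::vector<float> unpackToColumnMajor(const PackedCloud& cloud) {
    assert(cloud.data.size() >= cloud.pointStep * cloud.numPoints);
    const std::size_t rows = cloud.fieldOffsets.size();
    std::vector<float> features(rows * cloud.numPoints);

    // Points in the outer loop, fields in the inner loop: both the source
    // buffer and the destination matrix are then traversed sequentially.
    for (std::size_t p = 0; p < cloud.numPoints; ++p) {
        const std::uint8_t* record = cloud.data.data() + p * cloud.pointStep;
        for (std::size_t r = 0; r < rows; ++r) {
            // memcpy avoids unaligned or type-punned reads from the byte buffer.
            std::memcpy(&features[p * rows + r],
                        record + cloud.fieldOffsets[r],
                        sizeof(float));
        }
    }
    return features;
}
```

The loop order is the design choice worth noting: iterating fields in the outer loop instead would stride through the whole buffer once per field, which is the kind of access pattern a memory-access profile tends to flag.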
Hotspots
CPU
Memory access
Memory writing
Vectorization advice
Analysis
I found 3 main CPU-time hotspots:
In terms of memory access, number 1 from the above list is also a strong hotspot. When it comes to memory writing, all paged memory is cleared by the function, and the allocations are neither large nor numerous (compared to other methods, e.g. ROS TCP).
Intel Advisor recommends optimizing the "RGB loop" first of all (the quadruple loop described in point 1 of the CPU hotspots), as well as a loop in libnabo.
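To make the "RGB loop" point more concrete, here is a rough sketch of a flat, single-pass color unpacking loop. It assumes that colors arrive as one float32 per point whose bits hold a packed 0x00RRGGBB value (a common convention for PCL-style clouds) and that channels are normalized to [0, 1]. `RgbChannels` and `unpackRgb` are hypothetical names, and this is not the loop from point_cloud.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical container: one contiguous array per color channel.
struct RgbChannels {
    std::vector<float> r, g, b;
};

// Unpacks colors stored as 0x00RRGGBB in the bits of a float32 (assumed layout).
RgbChannels unpackRgb(const std::vector<float>& packed) {
    const std::size_t n = packed.size();
    RgbChannels out{std::vector<float>(n), std::vector<float>(n), std::vector<float>(n)};

    // A single flat loop with no branches and no per-channel inner loop,
    // which gives the compiler a simple shape to vectorize.
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &packed[i], sizeof(bits));  // reinterpret the float's bits
        out.r[i] = static_cast<float>((bits >> 16) & 0xFFu) / 255.0f;
        out.g[i] = static_cast<float>((bits >> 8) & 0xFFu) / 255.0f;
        out.b[i] = static_cast<float>(bits & 0xFFu) / 255.0f;
    }
    return out;
}
```

Whether this shape actually helps would need to be confirmed with the same VTune and Advisor runs; it is only meant to show what a flattened alternative to a nested loop could look like.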