Closed BarzelS closed 3 years ago
I'd recommend profiling it to figure out where the exact time is spent. I'm going to guess it relates to PCL versions since that's been a cause for some users to experience higher CPU loads in the past, but it would be good to know where the issue is to fix it.
Are you comparing with the exact same parameters in both?
stvl_layer/width: 0.5
stvl_layer/height: 0.5
stvl_layer/resolution: 0.15
These aren't in the namespace
Thanks
The way you'd profile any other C++ code: with valgrind / callgrind. It should produce output files you can visualize to see where the CPU time is spent, and you can compare ROS1 to ROS2 to see where the difference in time is coming from.
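For a quick first look before a full callgrind run, a scoped wall-clock timer around the suspect sections can show which part dominates. This is only a minimal sketch using the standard library; `SectionTimer`, `Scope`, and the section labels are hypothetical names, not part of STVL or ROS, and the numbers it gives are much coarser than a callgrind profile.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not part of STVL): accumulates wall-clock time per label.
class SectionTimer
{
public:
  using Clock = std::chrono::steady_clock;

  struct Stats
  {
    double total_ms = 0.0;    // accumulated wall-clock time in milliseconds
    unsigned long calls = 0;  // number of times the section was entered
  };

  // RAII guard: starts timing on construction, records on destruction.
  class Scope
  {
  public:
    Scope(SectionTimer & owner, std::string label)
    : owner_(owner), label_(std::move(label)), start_(Clock::now())
    {
      assert(!label_.empty());  // an empty label would be useless in a report
    }

    ~Scope()
    {
      const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
      Stats & s = owner_.stats_[label_];
      s.total_ms += elapsed.count();
      ++s.calls;
    }

    Scope(const Scope &) = delete;
    Scope & operator=(const Scope &) = delete;

  private:
    SectionTimer & owner_;
    std::string label_;
    Clock::time_point start_;
  };

  const std::map<std::string, Stats> & stats() const { return stats_; }

private:
  std::map<std::string, Stats> stats_;
};
```

Wrapping, say, the filtering step and the publishing step each in a `SectionTimer::Scope` and printing `stats()` periodically gives a rough split that can be compared between the ROS1 and ROS2 builds.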
I don't have a particular PCL version to recommend beyond whatever is shipping with your ROS distribution. In general it's very hard to have multiple next to each other without issues.
The fact that your costmap is only 0.5x0.5 makes me really curious how this is working for you. That's a very peculiar configuration.
Thanks @SteveMacenski
If I understand the output correctly, the bottleneck seems to be some DDS handling of pointcloud message conversion, right? BTW, this output was with RTI's RMW, but I have also tried FASTRTPS, which produces even higher CPU usage.
Thanks
I agree based on that entry, but what about the ones above it? How about trying Cyclone just to round off the available options? I don't think that will change anything, but just to verify.
I'd also be curious how much it would help if you adjusted the packet sizes in your DDS configuration. Pointclouds are heavy to publish even in ROS1. ROS2 on DDS actually makes that a bit worse out of the box without some tweaking of the message fragmentation size.
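To put the fragmentation point in rough numbers, here is a small back-of-the-envelope sketch. Everything in it is an assumption for illustration, not something measured in this thread: a 640x480 cloud, 16 bytes per point, a 30 Hz publish rate, and a 65500-byte fragment size. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Size of the point data in one cloud message (header overhead ignored).
std::uint64_t cloud_payload_bytes(
  std::uint64_t width, std::uint64_t height, std::uint64_t point_step)
{
  return width * height * point_step;
}

// Number of fragments needed to send one payload, rounding up.
std::uint64_t fragment_count(std::uint64_t payload_bytes, std::uint64_t fragment_bytes)
{
  assert(fragment_bytes > 0);
  return (payload_bytes + fragment_bytes - 1) / fragment_bytes;
}

// Sustained data rate for a given publish frequency.
std::uint64_t bytes_per_second(std::uint64_t payload_bytes, std::uint64_t rate_hz)
{
  return payload_bytes * rate_hz;
}
```

Under those assumed values, one cloud is 4,915,200 bytes, which splits into 76 fragments of at most 65500 bytes, and 30 Hz works out to roughly 147 MB/s per subscriber. Plugging in your real resolution, point step, and fragment size shows how much a decimation or fragment-size change would buy.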
@EduPonz @JaimeMartin this is definitely an issue, can you give us some feedback on how FastDDS can be configured to make this reasonable? Publishing a pointcloud shouldn't take this kind of time. @SBarzz there's almost no chance we can get any support from RTI on this so I suggest you work with eProsima or Cyclone.
Do you have ROS 2 security enabled on that topic?
I'm not sure how I can check if I enabled security on that topic:
Basically I'm just using the intel ros realsense ros wrapper to publish the point cloud to the stvl(nav2):
These are the configurations I have now at the base_realsense_node.cpp:
rclcpp::QoS m_qos(rclcpp::QoSInitialization::from_rmw(rmw_qos_profile_sensor_data));
_pointcloud_publisher = _node.create_publisher
Are you referring to the security described here: https://design.ros2.org/articles/ros2_dds_security.html ?
Thanks
If you don't know how, you didn't do it :smile: so no worries. Just asking in case you did, as that would have a particularly powerful impact on CPU load on large topics.
Hi @SteveMacenski, Thanks for the support!
Hi @SBarzz ,
From your capture I can see that you're using Fast DDS v1.9.3. Are you using Eloquent? Have you tried to reproduce the issue on Foxy or Rolling, maybe?
That will be problematic for me because I'm using a Jetson TX2, which only supports Ubuntu 18.04 at the moment.
Try docker, unfortunately Eloquent is EOL so neither Eduardo nor I will be able to reproduce for you in Eloquent. There were a bunch of updates into Foxy that might actually just resolve this, I'm not sure though.
Don't worry about the QoS at this point, as you mention, changing that didn't seem to impact your results. More info can be found here.
I might know what your problem is: there was a configuration change between the current branch heads. For Melodic:
voxel_filter: false # default off, apply voxel filter to sensor, recommend on
became, for Noetic and ROS2:
filter: "passthrough" # default passthrough, apply "voxel", "passthrough", or no filter to sensor data, recommend on
So if you use "voxel_filter: true" on ROS2, it will be ignored and no voxel filter will be applied, which leads to very high CPU load (in my case move_base jumps from 80% to 140% with STVL active in the local and global costmaps, on a Core i7 Gen9 laptop with a single ZED Mini camera). Instead you want filter: "voxel" on ROS2. I just started using STVL myself.
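The key rename described above can be sketched as a small lookup that maps the old boolean onto the new string value and flags a stale key. This is a hypothetical helper working on a plain string map, not STVL's actual parameter handling; `ParamMap`, `resolve_filter`, and `has_legacy_filter_key` are made-up names, and only the two key names and their values come from the comment above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed stand-in for a layer's parameters: key -> value, both as strings.
using ParamMap = std::map<std::string, std::string>;

// True if the config still carries the Melodic-era key.
bool has_legacy_filter_key(const ParamMap & params)
{
  return params.count("voxel_filter") > 0;
}

// Returns the value to use for `filter`:
// the new key wins, otherwise the old boolean is translated,
// otherwise the "passthrough" default quoted above is used.
std::string resolve_filter(const ParamMap & params)
{
  const auto current = params.find("filter");
  if (current != params.end()) {
    return current->second;
  }
  const auto legacy = params.find("voxel_filter");
  if (legacy != params.end()) {
    return legacy->second == "true" ? "voxel" : "passthrough";
  }
  return "passthrough";
}
```

Running a check like `has_legacy_filter_key` against a config ported from ROS1 would catch the silent fallback to passthrough before it shows up as CPU load.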
- Is 80% load with 2 costmaps reasonable and expected? I'm worried that if I add more cameras the CPU load will shoot up. Same if I switch from the laptop to an Nvidia Xavier NX (although I have not tested that setup yet).
- How do I keep the load down? So far it seems to me that, aside from disabling STVL on the global costmap (which I obviously don't want to do), the other settings that I've tried (voxel_decay, decay_acceleration, voxel_size, publish_voxel_map:false) have minimal impact on the CPU load.
Hi @vanem Thanks for your answer, I'm using the eloquent version of STVL and I think you are talking about the foxy version, right? Are you talking about this part of the code:
if (_voxel_filter) {
  pcl::VoxelGrid<pcl::PCLPointCloud2> sor;
  sor.setInputCloud(cloud_pcl);
  sor.setFilterFieldName("z");
  sor.setFilterLimits(_min_obstacle_height, _max_obstacle_height);
  sor.setDownsampleAllData(false);
  float v_s = static_cast<float>(_voxel_size);
  sor.setLeafSize(v_s, v_s, v_s);
  sor.setMinimumPointsNumberPerVoxel(static_cast<unsigned int>(_voxel_min_points));
  sor.filter(*cloud_filtered);
} else {
  pcl::PassThrough<pcl::PCLPointCloud2> pass_through_filter;
  pass_through_filter.setInputCloud(cloud_pcl);
  pass_through_filter.setKeepOrganized(false);
  pass_through_filter.setFilterFieldName("z");
  pass_through_filter.setFilterLimits(
    _min_obstacle_height, _max_obstacle_height);
  pass_through_filter.filter(*cloud_filtered);
}
If you are talking about this, then in the Eloquent version just passing voxel_filter: true sets the flag "_voxel_filter" to true. But thanks for trying to help. Actually, I've just figured out that on my laptop the CPU usage is very low compared to the usage I presented (that was on my Jetson TX2). Are you not encountering any CPU problems on your Xavier?
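To see why the two branches above differ so much in downstream cost, here is a standard-library stand-in for both filters. It is not PCL and not STVL's code, just a minimal sketch of the same idea: the passthrough keeps every point within the z limits, while the voxel grid collapses all points that share a cell into one centroid. `Point`, `pass_through_z`, and `voxel_downsample` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

struct Point { float x, y, z; };

// Keep only points whose z lies in [min_z, max_z]; the point count is otherwise unchanged.
std::vector<Point> pass_through_z(const std::vector<Point> & cloud, float min_z, float max_z)
{
  std::vector<Point> out;
  for (const Point & p : cloud) {
    if (p.z >= min_z && p.z <= max_z) {
      out.push_back(p);
    }
  }
  return out;
}

// Replace all points in each cubic cell of edge `leaf` with their centroid,
// dropping cells that hold fewer than `min_points` points.
std::vector<Point> voxel_downsample(
  const std::vector<Point> & cloud, float leaf, std::size_t min_points)
{
  assert(leaf > 0.0f);
  struct Acc { double x = 0.0, y = 0.0, z = 0.0; std::size_t n = 0; };
  std::map<std::tuple<long, long, long>, Acc> cells;
  for (const Point & p : cloud) {
    const auto key = std::make_tuple(
      static_cast<long>(std::floor(p.x / leaf)),
      static_cast<long>(std::floor(p.y / leaf)),
      static_cast<long>(std::floor(p.z / leaf)));
    Acc & a = cells[key];
    a.x += p.x; a.y += p.y; a.z += p.z;
    ++a.n;
  }
  std::vector<Point> out;
  for (const auto & kv : cells) {
    const Acc & a = kv.second;
    if (a.n >= min_points) {
      out.push_back(Point{
        static_cast<float>(a.x / a.n),
        static_cast<float>(a.y / a.n),
        static_cast<float>(a.z / a.n)});
    }
  }
  return out;
}
```

With a dense depth camera, many points fall into each cell, so the voxel path hands far fewer points to the rest of the layer than the passthrough path does.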
You're aware the Jetson boards have much weaker CPUs than your computer. Are you comparing CPU % on the same CPUs?
Yes, the comparison between ROS1 and ROS2 was performed on the same Jetson in both cases.
How are you installing STVL on Eloquent? Could you try the Foxy branch with the changes in config that @vanem suggests? I think from your profiling we've identified it as probably being a DDS related jump, but @vanem is saying that he's getting things to work fine (what ROS version are you comparing @vanem?) after that change.
If it is indeed DDS related, there's not much I can suggest here specifically. You'll need to dig into your DDS configs to optimize for the larger pointcloud movements. But like both @vanem and I suggest, try moving to Foxy. Substantial DDS improvements have been made to all the Tier 1 DDS vendors and this might be moot because they were solved.
We can't offer you any support then. Eloquent is EOL.
My understanding of this ticket is that it's DDS performance related, which is not in the scope of this project. Improving the RMWs / reporting poor performance to DDS vendors seems like the more appropriate action, since this isn't a problem with STVL but with working with pointclouds in ROS2 DDS.
Closing under that understanding, if that's not accurate and there's something in STVL that's significantly heavier, then we can reopen and discuss
Hi,
I am using the ROS2 version of this package. The CPU consumption is more than 160%, whereas in ROS1 with the same configuration the CPU usage was about 40%. I'm using the voxel map publishing for collision detection, but even when I turn it off the CPU usage is still very high compared to ROS1 (~140%).
The sensor is: Realsense D435i
settings:
BTW, I have posted multiple questions on ROS Answers but got no answers; that's the reason I'm posting it here.