I am using an R2300 with an Intel i5-6500TE. In htop I can see the pf driver consuming almost 80-90% CPU. Is anyone else facing the same? I haven't modified the driver. Please guide me on how to debug the cause. Just launching r2300.launch causes this much CPU usage, without even launching the data processing node. The high CPU usage is causing computational bottlenecks for my use case, so I am looking to reduce it to reasonable levels. System details: Ubuntu 18.04, ROS Melodic, driver v1.1.1.

Also, since I am for now using only one of the four planes, is there any way to optimize in terms of reducing load and data?
Hello Arthav, I have an i5-8350U here. Even when I start the driver in a virtual machine, I don't get such high CPU usage. Is RViz running? If so, how does the system behave when RViz is closed?
If you are only using one layer, you could disable the unused ones. This can be done via ROS dynamic_reconfigure, or directly via HTTP (example): http://
Hey, no, I am not running RViz in normal operation, just sometimes for visualizing. RViz doesn't seem to affect the CPU usage. Initially I did the same: I switched off the unused planes and also commented out the static tf publishers for the unused planes in the launch file, but that actually increased the usage by about 10%. Reverting brought it back down to the 80s%.
@Arthav24 are you using the binaries or have you built the repo from source? If you are building from source, please build it in debug mode (`catkin build -DCMAKE_BUILD_TYPE=Debug`) and use valgrind via `launch-prefix="valgrind"` to report if there are any memory leaks. The node in the launch file would look something like:

`<node pkg="pf_driver" type="ros_main" name="r2300_node" output="screen" launch-prefix="valgrind --tool=memcheck --leak-check=yes --log-file=memleak.log">`

Please report here what the log output from valgrind is.
Please find the attached logs:
- memleak_old_4tf.log: tag 1.1.1, all tfs enabled in the launch file (ring1/2/3/4)
- memleak_old_1tf.log: tag 1.1.1, only the one plane I use enabled
- memleak_latest_4tf.log: main branch, all tfs enabled in the launch file (ring1/2/3/4)
- memleak_latest_1tf.log: main branch, only the one plane I use enabled
The logs don't seem to be alarming. I will do a test with `launch-prefix="valgrind --tool=callgrind --callgrind-out-file=callgrind.log"` and then maybe we can compare the logs.
Hey @ipa-hsd, any update from your test?
@Arthav24 when I compile the latest `main` branch with `catkin build -DCMAKE_BUILD_TYPE=Release`, the CPU usage fluctuates between 30-60%. Not as high as you reported, but it can maybe be improved.

Compiling with the `Debug` option and using `callgrind`, the log file is callgrind.log. I will need to go through the log in detail, but a quick look suggests most of the time-consuming calls are ROS-specific.

How long did you run the driver before you noticed the high CPU usage? Or has it been the case since the beginning?

I just realized you mentioned that you are working with version 1.1.1, but we fixed some issues with the low frequency after that release: https://github.com/PepperlFuchs/pf_lidar_ros_driver/commits/main. Could you try pulling the latest `main` branch and compiling with the `Release` option?
Yeah, but for me the CPU usage was always >60%. When I was using the driver in my stack I had BUILD_TYPE=Release but saw similar results. I have run the driver for very long periods, and the usage is high from the beginning, as soon as the node starts.
Yes, I did check the latest commits in the main branch for the frequency and accumulation fixes and tried them, but saw similar results. This too was built with the Release option.
Can you try with `launch-prefix="valgrind --tool=callgrind --callgrind-out-file=callgrind.log"` and share the log?
I had another look at the callgrind.log, and it looks like this while loop is the culprit. It is a fast while loop without any sleep, and it tries to read data even when there isn't any. For 25 Hz data, this is quite a lot of wasted work.
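For illustration, here is a minimal sketch of that pattern (hypothetical names, not the driver's actual code): with a non-blocking read and no sleep, the loop iterates as fast as the CPU allows even when no scan data has arrived.

```cpp
#include <atomic>

std::atomic<bool> running{true};

// Hypothetical stand-ins for the driver's non-blocking socket read and
// packet parsing.
bool try_read_packet() { return false; }
void handle_packet() {}

void read_loop()
{
    while (running)
    {
        // try_read_packet() returns immediately when no data is available,
        // so this branch spins at full speed and pins a CPU core.
        if (!try_read_packet())
            continue;
        handle_packet();
    }
}
```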
@Arthav24 could you please try https://github.com/PepperlFuchs/pf_lidar_ros_driver/pull/65? For me the consumption has come down from 30-60% to less than 10%. This is the log file after the fix: callgrind_fix.log. I have marked the PR as [WIP] because there might be some room to improve it slightly more.
Sorry for the delayed response. I see you have come up with an amazing solution, which I will definitely give a try and report back on.
Definitely, I think it could be improved, because I have also used the R2000 with a different driver, and there I never saw usage >5%. Thanks to @konu-droid for providing logs and checking things out.
@Arthav24 any updates?
Again sorry for the delayed response. These are the observations from 2 days of testing:
1) CPU usage dropped to 13% with all the layers enabled. 👍
2) CPU usage jumped to 80% (peak of 98%) if we disable 3 layers from both the sensor and the launch file.
3) Reverting 2) brought it back to 13%. 👎
4) The usage in 2) seems to creep up slowly: initially after launching it was around 13-15%, but after some time it went up to that level. 👎
5) The fix in the while loop seems to affect the data frequency: we noticed that in 50Hz mode the actual rate was 6Hz and in 100Hz mode the rate was 12Hz, just half of the expected in both cases.
6) While debugging we tried cutting the wait time you introduced in the fix in half, to 5 ms, which brought back the expected frequencies in both modes.
Once again thanks @konu-droid for running all the tests.
Thanks for the detailed analysis!
> While debugging we tried cutting the wait time you introduced in the fix in half, to 5 ms, which brought back the expected frequencies in both modes.
Maybe the sleep was not required in that loop in the first place. I will revert it.
> CPU usage jumped to 80% (peak of 98%) if we disable 3 layers from both the sensor and the launch file.
That seems strange. But in any case, I should introduce a condition variable which wakes up the reader thread only when there is data available.
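A minimal sketch of that approach, assuming a receiving thread that pushes packets into a queue (hypothetical structure, not the actual PR code): the reader blocks on a `std::condition_variable` and consumes no CPU until it is notified that data is available.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>

std::mutex mtx;
std::condition_variable cv;
std::queue<std::vector<char>> packets;  // filled by the receiving thread
bool shutting_down = false;

// Called by the network thread whenever a packet arrives.
void on_packet_received(std::vector<char> pkt)
{
    {
        std::lock_guard<std::mutex> lock(mtx);
        packets.push(std::move(pkt));
    }
    cv.notify_one();  // wake the reader only when there is data
}

// The reader thread: blocks (0% CPU) instead of polling in a tight loop.
void reader_thread()
{
    std::unique_lock<std::mutex> lock(mtx);
    while (!shutting_down)
    {
        // The predicate guards against spurious wakeups.
        cv.wait(lock, [] { return !packets.empty() || shutting_down; });
        while (!packets.empty())
        {
            std::vector<char> pkt = std::move(packets.front());
            packets.pop();
            // ... parse and publish pkt here ...
        }
    }
}
```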
> The usage in 2) seems to creep up slowly: initially after launching it was around 13-15%, but after some time it went up to that level.
That sounds more like a memory leak (?)
Exactly this:

`std::this_thread::sleep_for(std::chrono::milliseconds(5));`

I also thought the sleep was not required, but cutting it in half brought the frequencies back to expected, so I didn't try removing it completely. It would be perfect if threads were spawned only for the enabled layers. Yes, it looks and sounds like a memory leak, like something piling up, but we couldn't debug it, so for now we are using the driver with all layers enabled.
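A rough model of why halving the sleep restored the rate (an assumption about the loop structure, not a measurement of the driver): if each iteration handles at most one packet and then sleeps for a fixed interval, packet throughput is capped near 1/sleep, so a 10 ms sleep allows at most ~100 packets/s while a 5 ms sleep allows ~200 packets/s.

```cpp
#include <cstdio>
#include <initializer_list>

int main()
{
    // Upper bound on loop iterations (and thus packets handled) per second
    // when each iteration ends with a fixed sleep.
    for (double sleep_ms : {10.0, 5.0})
        std::printf("sleep %4.1f ms -> at most %5.0f packets/s\n",
                    sleep_ms, 1000.0 / sleep_ms);
    return 0;
}
```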
> CPU usage jumped to 80% (peak of 98%) if we disable 3 layers from both the sensor and the launch file. Reverting 2) brought it back to 13%. 👎

Below is the section that is most likely the culprit of this.
Pushed new changes to https://github.com/PepperlFuchs/pf_lidar_ros_driver/pull/65 which maintain the frequency as expected. Also, the consumption is around 10-15% with all layers enabled and 5-8% with only one layer enabled. The memory leak is currently fixed by resetting the cloud message when the scan parameters change (e.g. if `layers_enabled` is changed before all layers have been published for the current scan).
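For illustration, a sketch of that reset logic (`CloudAccumulator`, `ScanParams`, and the surrounding structure are assumed names, not the actual code from the PR): when the parameters change before a scan is complete, the partially accumulated cloud is dropped so stale points cannot pile up.

```cpp
#include <sensor_msgs/PointCloud2.h>

// Hypothetical scan configuration; only the field relevant to the fix.
struct ScanParams
{
    int layers_enabled = 4;
    bool operator!=(const ScanParams& other) const
    {
        return layers_enabled != other.layers_enabled;
    }
};

class CloudAccumulator
{
public:
    void add_layer(const sensor_msgs::PointCloud2& layer, const ScanParams& current)
    {
        if (current != params_)
        {
            // Parameters changed before all layers of the current scan were
            // published: reset the accumulated message so it cannot keep
            // growing (the suspected leak).
            cloud_ = sensor_msgs::PointCloud2();
            layers_received_ = 0;
            params_ = current;
        }
        // ... append `layer` to cloud_, then publish and reset once
        // layers_received_ == params_.layers_enabled ...
        ++layers_received_;
    }

private:
    sensor_msgs::PointCloud2 cloud_;
    ScanParams params_;
    int layers_received_ = 0;
};
```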