Performance using UCLIENT_PROFILE_MULTITHREAD

ted-miller commented 2 years ago

Micro-XRCE-DDS-Client: v2.1.0 Platform: Yaskawa Motoman YRC1000

I am running into an issue with performance when using multiple threads to publish concurrently. Have you guys done any performance benchmarks comparing single-threaded and multi-threaded communication?

I've got one task that is running an executor. It has a timer to publish a bunch of topics (joint_states, tf, etc), and also contains an action server (FollowJointTrajectory). With only that task is publishing, the executor works great. The timer triggers the publishers at a nice consistent interval of 40 ms.

However, when I receive an action goal, I start another task to publish the action-feedback messages. When this task sends a feedback message at the same time as one of the other publishers, the tasks "lock up" for about 1 second. I have traced it down to the uxr_lock function in multithread.c.

I am using a recursive mutex semaphore for the lock. I can see that one task is sitting inside uxr_lock until the other task calls uxr_unlock. So, it appears that the lock is working to protect data from multiple threads (as intended), but it's causing my publish rates to plummet.

So, I'm wondering if this is the expected behavior. Is there something I can do to improve upon this? Or do I need to rethink my approach to get all communication onto the same thread? (Or implement my own lock to prevent concurrent calls to publish. ) If you've got some benchmark data from your own testing, I'd be interested to see it.

Thanks!

pablogs9 commented 2 years ago

Hello @ted-miller, this second seems to be related to the time that the micro-ROS RWM waits for confirmation from the agent when it publishes a topic.

In summary, when the micro-ROS RMW publishes a reliable topic it waits for RMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT milliseconds to retrieve the ACKNACK from the micro-ROS Agent. We can see the action feedback publishing in the same way. This RMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT is 1000 by default.

So this can be:

Some kind of interlock
A lack of confirmation from the agent side

Some ideas:

If you compile micro-ROS with RMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT set to 500 does the lock time is reduced?
You also can use the Timing APIs for setting this timeout at runtime.
What happens if you set the publisher to best effort?
Which transport are you using?
Can you share your micro-ROS application so we can reproduce the issue?

ted-miller commented 2 years ago

Hi @pablogs9. Thanks for the reply.

This second seems to be related to the time that the micro-ROS RWM waits for confirmation from the agent when it publishes a topic. If you compile micro-ROS with RMW_UXRCE_PUBLISH_RELIABLE_TIMEOUT set to 500 does the lock time is reduced?

Yep. You're right about that.

Which transport are you using?

UDP

What happens if you set the publisher to best effort?

The delay goes away entirely. It also dramatically improved the amount of time it takes for me to publish all the topics. (Which makes sense if it's not waiting for confirmation.) I'm really liking this performance gain. Now I'm wondering if I can combine the best-effort policy with a TCP transport for reliability. I want to see if that can give me a good balance.

While changing my publishers over to best-effort, I didn't find a way to adjust the qos settings for the action-feedback-publisher. I see that the only qos setting is for the allocator. Is there a way to update update qos settings after initialization? Or would I need to make a custom fork of rcl to add a new rclc_action_server_init function?

Can you share your micro-ROS application so we can reproduce the issue?

Unfortunately my application is very heavily tied to specific hardware. I haven't taken the time to make it runnable on a PC.

But, given some previous issues that I had, I'm pretty sure that it occurs any time you have multiple threads publishing at the same time. So, I should be able to create a small test app to reproduce.

gavanderhoorn commented 2 years ago

I would be interested to know why the Agent takes so long to ack the reliable transmission (edit: in this particular case).

My assumption would be that should not be necessary, and perhaps points to something we're doing incorrectly, or something we've configured incorrectly.

pablogs9 commented 2 years ago

I'll wait for a reproducible use case.

gavanderhoorn commented 2 years ago

I was not implying this was a task for you @pablogs9. I just made a 'mental note' -- but as a comment.

ted-miller commented 2 years ago

I'm not entirely confident that this code is reproducing the issue exactly. But, the symptom is similar enough that I'm hoping to blame the difference on the platform differences. (The 'difference' being the frequency of occurrence and the amount of time that the thread is blocked.)

I modified the complex_msg_publisher application from micro-ROS-demos/rclc. The main function creates a node and then starts a thread. That thread will create 3 publishers that publish in an infinite loop, with 10 ms between publishes. The main thread waits 6 seconds and then creates a single publisher with 20 ms between publishes.

(BTW: Semi-off-topic... If you attempt to create publishers at the same time on different threads, they will fail. This is the reason for the 6 second delay. The number 6 was an arbitrary choice; I did not measure.)

Before and after each publish, I broadcast a UDP text message. I am measuring execution time by monitoring the broadcast messages.

Open the wireshark capture
Add a display column for data.text
Set filter to data contains "Thread 1"
Change Time Display format to Seconds since previously displayed packet
Scroll through the list looking for abnormally large numbers
Repeat with filter data contains "Thread 2"

The attached capture was started at an arbitrary point after starting the demo application. The first anomaly occurs between packet 56 - 124. On "Thread 1", the publish routine takes nearly half a second.

multithread publisher.zip

pablogs9 commented 2 years ago

Hello, regarding the multithread entity creation: it is not safe to create entities concurrently as stated in the API documentation.

@Acuadros95 can you reproduce this issue?

Acuadros95 commented 2 years ago

I could not replicate this, had to modify a few things to compile the code though: multithread.publisher.zip (Updated)

Create publishers on main thread, then start publish loops.
Use internal uxr_millis() and printf() to debug publish elapsed time instead of UDP debug mesages.
You can also check the publish rate with ros2 commands: ros2 topic hz This will give you the max/min elapsed time between each msg on a topic:
```
root@mydevice:/uros_ws# ros2 topic hz /complex_message_1_1
average rate: 85.340
    min: 0.006s max: 0.020s std dev: 0.00068s window: 2839
```
Had to modify some flags on colcon.meta to compile the workspace, but nothing relevant.

Are you using the released v2.1.0 client commit? If thats the case you are missing relevant updates:

Fix for a bug with the message confirmation timeouts: https://github.com/eProsima/Micro-XRCE-DDS-Client/pull/299
Fix interlock issues: https://github.com/eProsima/Micro-XRCE-DDS-Client/pull/303

Both commits could help to solve your problem, please try again with the latest commit.

If the problem persist, share the agent output on debug mode (Launch with flag -v6) so we can discard problems on the Agent side.

gavanderhoorn commented 2 years ago

thanks @Acuadros95, we can try with post v2.1.0.

btw:

You can also check the publish rate with ros2 commands: ros2 topic hz

ros2 topic hz is really inconsistent in the rates it reports.

So much so that we have multiple Q&As on ROS Answers about this, and people writing a simple node to subscribe and calculate the message rate themselves.

gavanderhoorn commented 2 years ago

Just to clarify: you're basically suggesting to test with develop, correct?

Acuadros95 commented 2 years ago

Didn't know about that

ros2 topic hz is really inconsistent in the rates it reports.

And yes, you can use the current develop branch

Just to clarify: you're basically suggesting to test with develop, correct?

ted-miller commented 2 years ago

Both commits could help to solve your problem, please try again with the latest commit.

The initial testing looks good. My small test app (using my hardware) is working great with these update.

If the problem persist, share the agent output on debug mode (Launch with flag -v6) so we can discard problems on the Agent side.

If I run into any problems, I'll be sure to get some verbose output.

Thanks for the quick support!

gavanderhoorn commented 2 years ago

Just to get it clear: @ted-miller: what was the solution in the end?

Did you change everything to best-effort, or did the changes on the develop branch (with regular guaranteed QoS) solve the problems you were seeing?

pablogs9 commented 2 years ago

FYI: We will release v2.1.1 incorporating those fixes to the stable version in the next weeks.

ted-miller commented 2 years ago

I used the changes on the develop branch with default qos to resolve the issue.

I will certainly use some best-effort publishers going forward. But, they were not used for this test.

eProsima / Micro-XRCE-DDS-Client

Performance using UCLIENT_PROFILE_MULTITHREAD #311