ZettaScaleLabs / rmw_zenoh

RMW for ROS 2 using Zenoh as the middleware
Apache License 2.0
0 stars 0 forks source link

`rmw_zenoh_cpp` starts slowly due to the router connection during initialization #12

Open YuanYuYuan opened 3 weeks ago

YuanYuYuan commented 3 weeks ago

Taking _rcl/test/testgraph as an example,

YuanYuYuan commented 2 weeks ago

By reducing the retry interval from 1s to 100ms, the total time of rmw_zenoh_cpp becomes 15s.

I think the remaining questions are

  1. Is 100ms a proper default value?
  2. rmw_zenoh_cpp usually complains "Unable to connect to a Zenoh router" even though the router has been alive for a while. A quick workaround is adding a short sleep like 100ms before we begin checking any connected routers.
[ RUN      ] NodeGraphMultiNodeFixture.test_node_info_clients
2024-08-29T07:46:06.402460Z  INFO ThreadId(02) zenoh::net::runtime: Using ZID: 3d6a93f9c6e11ecd03c40baae8eac912
2024-08-29T07:46:06.403313Z  INFO ThreadId(02) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[::1]:35069
2024-08-29T07:46:06.405611Z  WARN ThreadId(02) zenoh::net::runtime::orchestrator: Scouting delay elapsed before start conditions are met.
[INFO] [1724917566.405698472] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id fa7db30090db9561f56a0c63cb77f13f.
2024-08-29T07:46:06.514480Z  INFO ThreadId(02) zenoh::net::runtime: Using ZID: efd3a5891210cbe8f4f266dccac1ed48
2024-08-29T07:46:06.515690Z  INFO ThreadId(02) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[::1]:39389
2024-08-29T07:46:06.517990Z  WARN ThreadId(02) zenoh::net::runtime::orchestrator: Scouting delay elapsed before start conditions are met.
[INFO] [1724917566.518090345] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id fa7db30090db9561f56a0c63cb77f13f.
2024-08-29T07:46:06.651655Z  INFO ThreadId(02) zenoh::net::runtime: Using ZID: a147e25ec3ac45235be75d4014f9a059
2024-08-29T07:46:06.654292Z  INFO ThreadId(02) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[::1]:37989
2024-08-29T07:46:06.657196Z  WARN ThreadId(02) zenoh::net::runtime::orchestrator: Scouting delay elapsed before start conditions are met.
[ERROR] [1724917566.657438032] [rmw_zenoh_cpp]: Unable to connect to a Zenoh router. Have you started a router with `ros2 run rmw_zenoh_cpp rmw_zenohd`?
[INFO] [1724917566.757634370] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id fa7db30090db9561f56a0c63cb77f13f.
[ERROR] [1724917567.865398794] [rmw_zenoh_cpp]: topic name /test_service not found in topic_map. Report this.
[ERROR] [1724917567.865425930] [rmw_zenoh_cpp]: topic name /test_service not found in topic_map. Report this.
[WARN] [1724917567.865651105] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /old_node_name from the graph. Ignoring...
[WARN] [1724917567.865866689] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /old_node_name from the graph. Ignoring...
2024-08-29T07:46:07.867399Z ERROR rx-2 ThreadId(10) zenoh::net::routing::dispatcher::resource: Face{2, 3d6a93f9c6e11ecd03c40baae8eac912} Resource 1 remapped. Remapping unsupported!
[ERROR] [1724917567.870917627] [rmw_zenoh_cpp]: topic name /rosout not found in topic_map. Report this.
[WARN] [1724917567.871392931] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /test_graph_node from the graph. Ignoring...
evshary commented 1 week ago

I tried this patch: https://github.com/eclipse-zenoh/zenoh/commit/926208cd2bd9a7ef59e6151ff17a99ce6b78edbd

In the complete test, most of the complain "Unable to connect to a Zenoh router" disappear now. However, it still takes time to run the test.

Here is the log:

rmw_zenoh_cpp(with patch):

[ RUN      ] TestGraphFixture.test_rcl_get_and_destroy_topic_names_and_types                                                                                                 
[INFO] [1725346848.889051199] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 5369938a20e811c424128c784a33b0db.                                            
[INFO] [1725346849.892166778] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 5369938a20e811c424128c784a33b0db.                                            
[WARN] [1725346850.893782847] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /old_node_name from the graph. Ignoring...                                   
[       OK ] TestGraphFixture.test_rcl_get_and_destroy_topic_names_and_types (2050 ms)                                                                                       
[ RUN      ] TestGraphFixture.test_rcl_get_service_names_and_types
[INFO] [1725346850.936945583] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 5369938a20e811c424128c784a33b0db.
[INFO] [1725346851.939833963] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 5369938a20e811c424128c784a33b0db.
[WARN] [1725346852.941108863] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /old_node_name from the graph. Ignoring...
[       OK ] TestGraphFixture.test_rcl_get_service_names_and_types (2005 ms)
[ RUN      ] TestGraphFixture.test_rcl_names_and_types_init
[INFO] [1725346852.942666472] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 5369938a20e811c424128c784a33b0db.
[INFO] [1725346853.945273296] [rmw_zenoh_cpp]: Successfully connected to a Zenoh router with id 5369938a20e811c424128c784a33b0db.
[WARN] [1725346854.946429386] [rmw_zenoh_cpp]: Received liveliness token to remove unknown node /old_node_name from the graph. Ignoring...
[       OK ] TestGraphFixture.test_rcl_names_and_types_init (2005 ms)

rmw_cyclonedds_cpp:

[ RUN      ] TestGraphFixture.test_rcl_get_and_destroy_topic_names_and_types                                                                                                 
[       OK ] TestGraphFixture.test_rcl_get_and_destroy_topic_names_and_types (4 ms)                                                                                          
[ RUN      ] TestGraphFixture.test_rcl_get_service_names_and_types                                                                                                           
[       OK ] TestGraphFixture.test_rcl_get_service_names_and_types (4 ms)                                                                                                    
[ RUN      ] TestGraphFixture.test_rcl_names_and_types_init                                                                                                                  
[       OK ] TestGraphFixture.test_rcl_names_and_types_init (5 ms) 

If update the sleep time here https://github.com/ZettaScaleLabs/rmw_zenoh/blob/7dba6746caf17e01deb4301c9955e5631339f40e/rmw_zenoh_cpp/src/rmw_init.cpp#L216

I thought the sleep should only occur if unable to connect to Zenoh router. However, some tests fail if doing so. Still investigating it.

evshary commented 1 week ago

There are 3 issues here:

evshary commented 1 week ago

With this patch https://github.com/eclipse-zenoh/zenoh-c/tree/dev/liveliness_subscriber_history, using liveliness subscriber with history works well and we can move the unnecessary sleep into if-condition. This will solve the remaining two issues.