ZettaScaleLabs / rmw_zenoh

RMW for ROS 2 using Zenoh as the middleware
Apache License 2.0

nav2 test analysis #38

Open evshary opened 1 week ago

evshary commented 1 week ago
| Package | # Failing tests - Rolling | # Failing tests - dev/1.0.0 | IDs of Zenoh failing tests: (Rolling) - (dev/1.0.0) |
| --- | --- | --- | --- |
| costmap_queue | 0 | 0 | () - () |
| dwb_core | 1 | 0 | (6) - (0) |
| dwb_critics | 0 | 0 | () - () |
| dwb_plugins | 0 | 0 | () - () |
| nav_2d_utils | 0 | 0 | () - () |
| nav2_amcl | 0 | 0 | () - () |
| nav2_behavior_tree | 4 | 11 | (7,17,18,28) - (17,18,27,34,38,39,40,44,45,46,47) |
| nav2_behaviors | 0 | 0 | () - () |
| nav2_bringup | 0 | 0 | () - () |
| nav2_bt_navigator | 0 | 0 | () - () |
| nav2_collision_monitor | 1 | 3 | (7) - (7,11,12) |
| nav2_constrained_smoother | 0 | 0 | () - () |
| nav2_controller | 3 | 1 | (6,7,8) - (6) |
| nav2_core | 0 | 0 | () - () |
| nav2_costmap_2d | 14 | 14 | (8,9,10,15,17,19,20,21,22,23,24,25,26,28) - (8,9,10,15,17,18,19,20,21,22,23,24,25,26) |
| nav2_graceful_controller | 1 | 0 | (6) - () |
| nav2_lifecycle_manager | 0 | 1 | () - (10) |
| nav2_loopback_sim | 0 | 0 | () - () |
| nav2_map_server | 5 | 4 | (8,9,10,11,12) - (8,9,10,11) |
| nav2_mppi_controller | 12 | 12 | (6,7,8,9,10,11,12,13,14,15,16,17) - (6,7,8,9,10,11,12,13,14,15,16,17) |
| nav2_navfn_planner | 1 | 1 | (7) - (7) |
| nav2_planner | 1 | 1 | (7) - (7) |
| nav2_regulated_pure_pursuit_controller | 1 | 1 | (6) - (6) |
| nav2_rotation_shim_controller | 1 | 1 | (6) - (6) |
| nav2_rviz_plugins | 0 | 0 | () - () |
| nav2_simple_commander | 0 | 0 | () - () |
| nav2_smac_planner | 11 | 11 | (10,11,12,13,14,15,16,17,18,19,20) - (10,11,12,13,14,15,16,17,18,19,20) |
| nav2_smoother | 2 | 2 | (7,8) - (7,8) |
| nav2_system_tests | 13 | 16 | (12,14,15,17,18,19,20,21,22,23,29,30,31) - (9,12,14,15,18,19,20,21,22,23,29,30,31) |
| nav2_theta_star_planner | 1 | 1 | (1) - (1) |
| nav2_util | 6 | 5 | (7,10,13,15,16,17) - (7,10,13,15,17) |
| nav2_velocity_smoother | 1 | 1 | (6) - (6) |
| nav2_voxel_grid | 0 | 0 | () - () |
| nav2_waypoint_follower | 2 | 2 | (7,8) - (7,8) |
| opennav_docking | 7 | 7 | (9,10,11,12,13,14,16) - (9,10,11,12,13,14,16) |
| opennav_docking_bt | 0 | 2 | () - (7,8) |
| opennav_docking_core | 0 | 0 | () - () |

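
To see which failures are shared between the two branches and which appear on only one, the ID lists in the last column can be compared as sets. The snippet below is a minimal sketch, not part of the test tooling: `parse_ids` and `compare` are hypothetical helper names, and it assumes a given ID refers to the same test on Rolling and dev/1.0.0, which the table does not state.

```python
def parse_ids(cell: str) -> set[int]:
    """Parse one parenthesised ID list such as "(7,17,18,28)" or "()"."""
    inner = cell.strip().strip("()")
    return {int(tok) for tok in inner.split(",") if tok.strip()}


def compare(rolling: str, dev: str) -> dict[str, list[int]]:
    """Split failing IDs into shared, Rolling-only and dev/1.0.0-only."""
    r, d = parse_ids(rolling), parse_ids(dev)
    return {
        "both": sorted(r & d),
        "rolling_only": sorted(r - d),
        "dev_only": sorted(d - r),
    }


# A few rows copied from the table above: (Rolling IDs, dev/1.0.0 IDs).
rows = {
    "nav2_behavior_tree": ("(7,17,18,28)", "(17,18,27,34,38,39,40,44,45,46,47)"),
    "nav2_collision_monitor": ("(7)", "(7,11,12)"),
    "nav2_controller": ("(6,7,8)", "(6)"),
}

for package, (rolling, dev) in rows.items():
    print(package, compare(rolling, dev))
```

For nav2_behavior_tree, for example, this reports IDs 17 and 18 as failing on both branches, 7 and 28 only on Rolling, and the remaining nine only on dev/1.0.0.
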

alireza-moayyedi commented 1 week ago

Hello @evshary , this is very helpful! May I ask what tests you are referring to?

I was trying to make a remote connection to my robot over Wi-Fi. I also tried both ends of the 1.0.0 PR (https://github.com/ros2/rmw_zenoh/pull/276), i.e., Rolling and dev/1.0.0, but both of them are unstable. With your results I can see clearly what's going wrong.

If I skip navigation, i.e., run only ROS 2 control for the motors and the lidar along with its filters, everything works fine. I can visualize the robot remotely very smoothly, and the odometry updates properly when I move the robot with a joystick. But as soon as I try to run the nav2 stack, things go wrong.

evshary commented 6 days ago

Hi @alireza-moayyedi. In fact, the table just shows the unit test results from the navigation2 repository. On our side, navigation2 works well even though it does not pass all the tests. Perhaps you could describe in more detail how you run it and what issues you face. BTW, you might need to ensure the version of nav2 you're using includes the fix here: https://github.com/ros-navigation/navigation2/pull/4725

alireza-moayyedi commented 6 days ago

Hi @evshary,

Well, that's a surprise, to be honest. This is the exact use case that I am trying to work out:

On the robot side:

On a separate computer:

Expected behavior:

Actual behavior in rolling:

Actual behavior in dev/1.0.0:

I am certain that this is an rmw issue, because if I connect the separate computer directly to the robot with an Ethernet cable and use CycloneDDS with an explicit peer address list and an explicit network interface, then everything works very smoothly and I can easily initialize and control the robot remotely. Of course, the downside is that I then have to follow the robot with my laptop in hand.

Regarding the release, I am using the latest apt release:

Package: ros-jazzy-nav2-bringup
Version: 1.3.2-1noble.20241015.123150

evshary commented 2 days ago

Hi @alireza-moayyedi

Thank you for the detailed steps. I didn't see anything weird. I would suggest doing some experiments (with rmw_zenoh) to narrow the issue down.

  1. Running the nav2 with simulation on the same host.
    • I believe this should run without any issues.
  2. Using Ethernet to connect your robot and computer (Just as you did with CycloneDDS).
    • See whether the issue comes from WiFi or not.

For the dev/1.0.0 version, perhaps you could share the logs with us. I think the fix I mentioned before hasn't been included in the apt binary, but it's more related to the Rviz plugin crash, which is not the same as your description.

alireza-moayyedi commented 1 day ago

Hi @evshary,

As suggested, I tried to narrow it down further, and here are my findings (everything run with dev/1.0.0):

  1. Running everything (nav2 + rviz) on the same host:
    • This works fine, whether in simulation or on the real hardware. It works as expected (as long as there is no remote connection interfering).
  2. Using direct Ethernet connection:
    • This also works fine, similar to CycloneDDS with Ethernet

So I guess at this point we can conclude that something is going wrong when communicating over Wi-Fi. Therefore, I tried to dig deeper. First, to rule out a faulty office Wi-Fi, I set up a separate router (2.4 GHz) to which only my computer and the robot were connected. The same issues I reported originally still occurred. Here are some logs that might be relevant:

Next, I connected a display to the robot to see whether I could run rviz simultaneously on both the robot and the remote computer and check for any difference in behavior. On the robot I managed to get the map loading in the robot's rviz while the remote computer was still not loading it (though not easily, as I explain below). Surprisingly, I noticed that after giving the initial pose in the robot's rviz, amcl started to work properly, and in the remote rviz I could also see topics such as the costmaps in the map frame (still no map). I drove around a bit and it seemed stable. Here is the remote rviz showing some topics in the map frame after initializing the localization in the robot's rviz:

[Screenshot from 2024-11-13 10-56-05]

So then I got more suspicious of the map server and started digging deeper into it. As I mentioned earlier, it was difficult to get the map showing in the robot's rviz when I was also trying to visualize it simultaneously in the remote rviz. I noticed some irregular behavior when I ran rviz on the remote computer first and then ran the nav2 stack on the robot. For some reason, this caused the map server not to load properly:

[Screenshot from 2024-11-13 11-05-48]

That more or less explained why I had to restart the launches so many times to get the simultaneous rviz loads working. Apparently the order of launching things (rviz remote -> rviz robot -> nav2 robot) was affecting the behavior.

So now, in order to make it work, I need to first run nav2 on the robot, initialize the localization in the robot's rviz, and only then run rviz on the remote computer.

This got me wondering whether the /map topic needs some further tuning in the Zenoh router's configuration to accommodate the topic's bandwidth. Or maybe this is actually related to the rviz plugin fix that you mentioned, in which case I should test building nav2 from source with that fix included.
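
A rough way to sanity-check the bandwidth idea: a nav_msgs/OccupancyGrid stores one int8 per cell, so the payload of a single /map message is roughly width x height bytes (ignoring the header and serialization overhead). The sketch below only does that arithmetic. The map dimensions and the effective Wi-Fi throughput are hypothetical placeholders, not measurements from this setup.

```python
def occupancy_grid_bytes(width_cells: int, height_cells: int) -> int:
    """Approximate payload size: one int8 (1 byte) per grid cell."""
    return width_cells * height_cells


def transfer_seconds(payload_bytes: int, throughput_mbit_s: float) -> float:
    """Time to send the payload once at the given effective throughput."""
    return payload_bytes * 8 / (throughput_mbit_s * 1e6)


# Hypothetical numbers for illustration only.
size = occupancy_grid_bytes(2000, 2000)   # 2000 x 2000 cell map
seconds = transfer_seconds(size, 20.0)    # assumed 20 Mbit/s effective Wi-Fi
print(f"{size / 1e6:.1f} MB per map message, about {seconds:.1f} s to transfer")
```

Plugging in the real map size and a measured link throughput gives a first indication of whether a single map message is large enough to be a plausible culprit.
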

Sorry for the long posts, and I much appreciate your patience. Unfortunately, I have not yet found anyone around me who has successfully set up the Zenoh rmw in combination with nav2 for a remote connection. Therefore, I decided to dig deeper into it myself and report it directly to you here.

evshary commented 16 hours ago

Hi @alireza-moayyedi, thank you for the detailed description. It helps a lot. I will investigate it. Feel free to share with us anything else you find.

JEnoch commented 15 hours ago

@alireza-moayyedi you can try to tune the /map topic when using the dev/1.0.0 branch via the downsampling configuration. See this guideline: https://github.com/ZettaScaleLabs/roscon2024_workshop/blob/main/exercises/ex-7.md

If you don't know the topic type name and hash, you can replace each with * characters in the key_expr. e.g.: key_expr: "0/map/*/*" (assuming ROS_DOMAIN_ID=0 and no namespace is set).
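
As an illustration of how such a key expression is put together and what it would select, here is a minimal Python sketch. `topic_key_expr` and `matches` are hypothetical helpers, not part of rmw_zenoh or Zenoh. The layout `<domain id>/<topic>/<type>/<hash>` follows the example above, while the placement of a namespace before the topic name is an assumption. The matcher only handles the single-chunk `*` wildcard (not `**` or `$*`), and the type name and hash in the usage lines are placeholders.

```python
def topic_key_expr(topic: str, domain_id: int = 0, namespace: str = "",
                   type_name: str = "*", type_hash: str = "*") -> str:
    """Build '<domain id>/<namespace>/<topic>/<type>/<hash>', wildcards by default."""
    parts = [str(domain_id)]
    parts += [chunk for chunk in namespace.split("/") if chunk]  # assumed placement
    parts += [chunk for chunk in topic.split("/") if chunk]
    parts += [type_name, type_hash]
    return "/".join(parts)


def matches(key_expr: str, key: str) -> bool:
    """Check a key against an expression where '*' stands for exactly one chunk."""
    pattern, chunks = key_expr.split("/"), key.split("/")
    return len(pattern) == len(chunks) and all(
        p == "*" or p == c for p, c in zip(pattern, chunks)
    )


expr = topic_key_expr("/map")  # ROS_DOMAIN_ID=0, no namespace
print(expr)

# Placeholder type name and hash, for illustration only.
print(matches(expr, "0/map/nav_msgs::msg::dds_::OccupancyGrid_/RIHS01_placeholder"))
print(matches(expr, "0/global_costmap/costmap/some_type/some_hash"))
```

Because each `*` covers exactly one chunk, the expression selects /map whatever its type and hash are, but not a topic with more path segments such as /global_costmap/costmap.
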