Improve / work-around 200 pt maximum trajectory length limit

gavanderhoorn commented 1 year ago

As far as we can tell, we currently have a maximum length of 200 pts per FollowJointTrajectory goal due to a limitation of micro-ROS (docs).

The queue_traj_point service (docs) was added as a work-around, but it would still be nice to support FollowJointTrajectory goals longer than 200 points, as it reduces the requirements on users (no need to run any additional nodes like #66) and would make MotoROS2 better usable for more complex / real-world use-cases.

gavanderhoorn commented 1 year ago

micro-ROS supports something called continuous serialization, which allows publishers on the Client side to exceed the default 64kB limit on maximum message size.

This would seem to be something we could use, if it were available for subscribers / action servers as well -- which it isn't.

Related posts / issues:

gavanderhoorn commented 1 year ago

Another option could be to migrate from rmw_microxrcedds (and the micro-ROS Agent) to a full DDS based RMW.

This could quite complex, as we'd need to be able to build that RMW for M+ and be able to deploy it on our supported controller(s).

We should be able to reuse the RCL and RCLC layers of the MotoROS2 stack, but even with a full DDS RMW there would still be limits to the maximum length of trajectories in action goals: the limit imposed by M+ on maximum memory usage. Especially for older generation controllers this could be quite severe.

marip8 commented 8 months ago

What is the failure mode when the driver receives a trajectory of over 200 points? I think I'm running into this issue now, and my robot just won't move after receiving the action and the driver seems to freeze up partially. It still publishes joint states and robot status, but it does not respond to any service or action requests. It also doesn't report any errors on the terminal. I end up having to restart the microROS node/container to resolve the issue. When I run a noticeably shorter trajectory (in terms of overall distance), everything seems to work normally. I haven't yet checked the size of the trajectories that I'm sending the robot, but I will implement that soon

ted-miller commented 8 months ago

I think you're running into that issue.

A message that exceeds a certain size will just disappear into the void. Then if an action goal is not properly received, I've seen the action server lock up and be unresponsive. Though, i would expect other services to continue working.

jmarsik commented 8 months ago

Another option could be to migrate from rmw_microxrcedds (and the micro-ROS Agent) to a full DDS based RMW.

This could quite complex, as we'd need to be able to build that RMW for M+ and be able to deploy it on our supported controller(s).

We should be able to reuse the RCL and RCLC layers of the MotoROS2 stack, but even with a full DDS RMW there would still be limits to the maximum length of trajectories in action goals: the limit imposed by M+ on maximum memory usage. Especially for older generation controllers this could be quite severe.

One thing that came to my mind while researching topics related to MotoROS(2), microROS agent and the trajectory size limitation was zenoh ... you definitely know it, I suppose. There is a middleware implementation for ROS2 using Zenoh communication and it has some interesting features in comparison with DDS. There is also zenoh-pico for low performance hardware, but that's just the Zenoh protocol, not the middleware layer for ROS2. Would porting that RMW instead of full DDS RMW be somehow useful? Probably there will still be a limit without some sort of continuous sending of the trajectory parts, right?

gavanderhoorn commented 8 months ago

Yes, we are aware and have looked into what it would take to build and use Zenoh.

I've looked at rmw_zenoh_pico_cpp specifically fi, and am following the work on rmw_zenoh closely (see ros2/rmw_zenoh#125 fi).

Main challenge: all available RMWs for Zenoh are C++ based, and very recent versions of C++ as well, which the M+ GCC doesn't (properly) support.

At least some DDS vendors provide binary distributions for our target OS, which removes the need for us to build that layer of the stack entirely.

Edit: and besides the technical challenges, Zenoh would only be usable for users on Jazzy and later. That doesn't mean we can't use it, but it does mean our installed base would mostly not be able to benefit from it, while still requiring quite some engineering effort.

gavanderhoorn commented 8 months ago

Having written that we'd be very interested in learning about possible alternatives.

While older controllers (like the FS100) might not benefit too much (due to resource constraints), newer generation controllers (YRC1000 and up) could potentially support (much) longer trajectories if the middleware wouldn't limit us.

(it's of course possible to switch to point queuing completely, as that theoretically doesn't suffer from any middleware imposed limits, but for a ROS robot driver to not have a FollowJointTrajectory action interface is a bit of a no-go)

jmarsik commented 8 months ago

Seeing all that there is probably no easy way to get rid of that limitation. Thank you for the comments about Zenoh and the links!

Regarding the FJT interface I think that a lot of users would be totally ok with that interface provided not directly by the controller ROS2 node, but by an external properly written and documented node running in the full ROS2 part of the environment, internally using point streaming towards the controller node (via microROS). Even now with the microROS agent you need something else (the agent) to actually use the controller in full ROS2 environment. I have seen your Python PoC code and I think that we will try to implement that in C++ with some additional features. Hopefully sometime around July or August, if someone else doesn't do it before us. We would contribute that if it works and it can then become a part of the solution, for people who have longer trajectories and can't use MotoROS2 at this moment.

There is another thing that complicates MotoROS2 usage for us, given that we have onboard computers on robot+controller combo that runs the ROS2 stack in production, but we are also developing the software in the same environment, when the full stack runs on a developer's laptop connected to the same network. The onboard computer is still there, just with the stack disabled or firewalled out. But MotoROS2 has the IP of the microROS agent (which will then run on the laptop) in its configuration and changing it is not that easy (also requires restart of the controller). Would it be possible to use a robot string variable for that IP address and manage reconnection to the microROS agent internally in MotoROS2 when this variable changes?

I will probably look into that just before we start to develop that FJT node, but maybe it's totally against some architectural decisions that you made?

Sorry for a longer text.

ted-miller commented 8 months ago

Would it be possible to use a robot string variable for that IP address and manage reconnection to the microROS agent internally in MotoROS2 when this variable changes?

I will probably look into that just before we start to develop that FJT node, but maybe it's totally against some architectural decisions that you made?

That is possible from a technical view. You could monitor the value of the S variable somewhere like here. When it changes, you can clear this flag to force a disconnect.

Historically though, I have had a lot of issues with using robot-user-variables for configuration data. There have been many instances of users using overlapping variable ranges, or issuing general "clear" commands which erase the configuration settings. So, as a general rule, I try not to use robot variables anymore.

For the general use-case, most "in production" users would just have a single target IP address. But I can certainly see how a development/debug system can have unique configurations. Perhaps it would be helpful to add something like #define DEBUG_MODE which changes the behavior to meet your needs.

There could also potentially be a custom build-configuration which has additional compiler flags for your debug environment. If this is of interest, let's open a separate discussion to go over how to add a new configuration.

Yaskawa-Global / motoros2

Improve / work-around 200 pt maximum trajectory length limit #85