persistent "Final position was outside tolerance"

rsaintobert commented 2 months ago

Anytime a trajectory is submitted I get the following error (with more or less joints incriminated), and the goal is aborted.

Final position was outside tolerance. Check robot safety-limits that could be inhibiting motion. [gp50_joint_1_s: 0.0000 deviation] [gp50_joint_6_t: 0.0000 deviation]

The log is done here but the 4-digits precision does not give enough info. Anyway the deviation is quite low here.

Click to expand:

https://github.com/Yaskawa-Global/motoros2/blob/9026a1d0b50b483f7c334bfd5ebbac8b328bfdb6/src/ActionServer_FJT.c#L563-L580

The problem is more in the check of the deviation:

Click to expand:

https://github.com/Yaskawa-Global/motoros2/blob/9026a1d0b50b483f7c334bfd5ebbac8b328bfdb6/src/ActionServer_FJT.c#L500-L520

Leaving the goal_tolerance parameter unset in the goal request should default to a tolerance of 0.01 which is not the case. Maybe the message buffer is badly initialized ?

Click to expand:

https://github.com/Yaskawa-Global/motoros2/blob/9026a1d0b50b483f7c334bfd5ebbac8b328bfdb6/src/ActionServer_FJT.c#L511-L512

Anyway, any value set in the goal request leads to the same result, so the user input is not taken properly into account.

gavanderhoorn commented 2 months ago

Could you please provide:

the motoros2_config.yaml as loaded on the controller
an example of a FollowJointTrajectoryAction goal which results in the problem you report (dumping the complete goal to stdout from your client program would be one way to do this)
a full debug log, from controller start up to and including a failed motion execution
information on which version of MotoROS2 this is, version of ROS 2, how you installed it, etc
information on your robot controller: type and robot configuration (a copy of the PANELBOX.LOG from your controller would be easiest)

(if easier: a minimal action client which submits a goal which results in the problem you report would also be ok -- and would make it easier to reproduce your problem)

Edit: and ideally: a Wireshark capture of the network traffic between your Agent and MotoROS2.

gavanderhoorn commented 2 months ago

I've edited the title of the issue to better reflect the symptoms observed.

We can't conclude yet what is the cause.

ted-miller commented 2 months ago

Leaving the goal_tolerance parameter unset in the goal request should default to a tolerance of 0.01 which is not the case

I just noticed the == being used on a double here:

https://github.com/Yaskawa-Global/motoros2/blob/d563a79543956159b35e462cf7c49f8e3f9d2ec8/src/ActionServer_FJT.c#L505-L507

I'm wondering if posTolerance is some super-small value and technically non-zero. So, the comparison fails and the tolerance doesn't get set to default.

We should probably be using this value for the comparison instead:

https://github.com/Yaskawa-Global/motoros2/blob/d563a79543956159b35e462cf7c49f8e3f9d2ec8/src/MathConstants.h#L24-L25

gavanderhoorn commented 2 months ago

I've seen infinitesimal values for fields which were supposed to be 0.0 working on #226.

It's possible the (de)serialisation causes the field to no longer be exactly 0.0.

gavanderhoorn commented 2 months ago

Although thinking about it more: I'm curious to understand what makes this come up now in these cases, as the code implicated hasn't seen changes since MotoROS2 was first released.

Illumo-Vincent commented 2 months ago

Could you please provide:

* the `motoros2_config.yaml` as loaded on the controller

* an example of a `FollowJointTrajectoryAction` goal which results in the problem you report (dumping the complete goal to `stdout` from your client program would be one way to do this)

* a [full debug log](https://github.com/Yaskawa-Global/motoros2/blob/b78ea310d6a5bb2b1c8f80ee0bf104c1986ca119/doc/troubleshooting.md#debug-log-client), from controller start up to and including a failed motion execution

* information on which version of MotoROS2 this is, version of ROS 2, how you installed it, etc

* information on your robot controller: type and robot configuration (a copy of the `PANELBOX.LOG` from your controller would be easiest)

(if easier: a minimal action client which submits a goal which results in the problem you report would also be ok -- and would make it easier to reproduce your problem)

Edit: and ideally: a Wireshark capture of the network traffic between your Agent and MotoROS2.

Here are the requested files: files.zip The version of ROS used is Humble.

ted-miller commented 2 months ago

I have opened #236 to fix the comparison-check.

Here is a build to see if it has any impact on this issue. pr236.zip

gavanderhoorn commented 2 months ago

As @yai-rosejo noted in https://github.com/Yaskawa-Global/motoros2/discussions/235#discussioncomment-9072700, the path_tolerance and goal_tolerance fields aren't (always) populated.

The trajectory shared by @Illumo-Vincent (@rsaintobert?) has this:

 path_tolerance: []
 component_path_tolerance: []
 goal_tolerance: []
 component_goal_tolerance: []

without a check here on the length of the array:

https://github.com/Yaskawa-Global/motoros2/blob/d563a79543956159b35e462cf7c49f8e3f9d2ec8/src/ActionServer_FJT.c#L505-L507

we're most likely indexing into uninitialised memory.

This is UB in any case, but for some reason tends to fail for @rsaintobert / @Illumo-Vincent (and @TE-2 in #235) more than for other users, leading to the issue reported.

The change made by @ted-miller in #236 is a good one, but if we're indeed not sufficiently checking for goal_tolerance array length, not enough to deal with this problem I believe.

Edit: I've tried reproducing the issue with MoveIt, but don't seem to be able to. That doesn't mean it doesn't exist. I might just 'be lucky' and the memory being read happens to evaluate to a larger double than the per-joint delta.

gavanderhoorn commented 2 months ago

Looking into this more, it seems the current code doesn't parse the goal_tolerance part of the FJT goal correctly:

https://github.com/Yaskawa-Global/motoros2/blob/d563a79543956159b35e462cf7c49f8e3f9d2ec8/src/ActionServer_FJT.c#L505

It assumes all joints have a tolerance provided and the elements are ordered in the same way as the joints in the JointTrajectory.

In reality, we can't assume len(goal.trajectory.joint_names) == len(goal.goal_tolerance), and the ordering may also be different from that of goal.trajectory.joint_names. Each element in goal.goal_tolerance (of type control_msgs/JointTolerance) has a name field (here) which should be checked and used for finding the correct joint state to compare against.

@ted-miller: my suggestion would be to disable the final goal_tolerance check for now, provide @Illumo-Vincent / @rsaintobert and @TE-2 with a fixed build and in parallel work on correcting the implementation.

We have to keep in mind that in almost all cases, the Yaskawa controller will already have aborted the motion in case tracking error prevents it from faithfully executing the trajectory. The additional goal_tolerance check in MotoROS2 is wholly extra.

gavanderhoorn commented 2 months ago

I've submitted #237 for us to merge the work-around of disabling the broken check for now. It will always compare against the default joint position tolerance instead of attempting to use a user-provided value.

ted-miller commented 2 months ago

@yai-rosejo has a fix which he's validating now. I don't think #237 is necessary.

yai-rosejo commented 2 months ago

I have been working on a potential fix which can be found on my branch here: https://github.com/yai-rosejo/motoros2/tree/issue_233_outside_goal_tolerance On this branch I check the joint names and assign the goal position tolerances based off the names instead of just array indexing. I also allow support of the -1.0 option available for tolerances. My testing has shown that the values are being used correctly, but the Ros_Debug_BroadcastMsg is not able to print these values in a way that we would be expecting. This picture shows that if I do a conditional print on a value of 2.0 or -1.0 that the value is correct. But that value is not able to be printed correctly, as shown by the multiple 0.000000s. I am including on here both the mr2_yrc1_h.out and the fjt_client.py that I have been using for testing. tolerance_issue_04122024.zip

To run these tests:

Upload the mr2_yrc1_h.out to the controller and then restart the controller
Run the microros client in a terminal : sudo docker run -it --net=host -v /dev:/dev --privileged microros/micro-ros-agent:humble udp4 -p 8888
Start the TrajMode: ros2 service call /start_traj_mode motoros2_interfaces/srv/StartTrajMode
Run the fjt client: python3 fjt_client.py
Optionally the debug_listener can also be ran to capture messages.

ted-miller commented 2 months ago

It seems we're fighting multiple issues.

1) The tolerance arrays are not being properly validated for size and order. The fix from yai-rosejo should resolve this.

2) The debug broadcasts are not printing doubles properly. We ran some tests that multiplied the values by 1000 and then cast as an int. The integer displays as expected. So we'll need to investigate vsnprintf behavior some more.

3) I believe that Ros_Controller_IsInMotion() is not strict enough. Specifically START_MAX_PULSE_DEVIATION should be decreased. (Or make a new constant for "END deviation".) (Maybe make this configurable in yaml?)

A brief test indicated that the fb pos trails the cmd pos by about .0009 radians. If my calculations are right, for some axes (such as GP12 T axis) that's only 24 pulse counts. So, once the increment-queue is empty, Ros_Controller_IsInMotion would immediately return false, even though the robot is still moving.

I'm also suspicious that the difference between fb and cmd could shrink as the robot is decelerating. This might make it easier for other axes to also exceed the threshold.

4) @yai-rosejo believes that feedback_FollowJointTrajectory.feedback.desired is not getting updated in the last cycle. This throws off the calculation for diff.

More testing is needed before we'll have an idea for PR.

gavanderhoorn commented 2 months ago

Could we open and merge PRs for each of the identified issues?

The fix for the incorrect parsing of the tolerances by @yai-rosejo seems like a good first step.

Regardless of whether any of the other issues get fixed, loading those tolerance entries would need to be corrected.

gavanderhoorn commented 2 weeks ago

@rsaintobert: with the merge of #241 this should now be fixed.

We'll have to release a new binary for this to be generally available though. Please let us know whether you'd like a prerelease build.

Yaskawa-Global / motoros2

persistent "Final position was outside tolerance" #233