PilzDE / pilz_robots

PILZ robot manipulator module PRBT 6 in ROS
https://wiki.ros.org/pilz_robots
52 stars 25 forks source link

Modbus Timeout kills launch process #81

Closed JonathanGruner closed 5 years ago

JonathanGruner commented 5 years ago

Commit

05d7a01

Steps to reproduce

Start launch file on low-end pc with a large robot model. If the startup process takes more than 5 seconds, the modbus node dies.

Expected behavior

Node waits until ROS is ready. There should not be a timeout at all, but just a warning to notify the user: "Service xxx not yet available, waiting...".

Observed behavior

Modbus node dies and takes down ROS.

Logfiles

[ INFO] [1552301651.010005889]: waitForService: Service [/prbt/manipulator_joint_trajectory_controller/hold] has not been advertised, waiting...
process[usb_cam-11]: started with pid [4544]
process[aruco_detect-12]: started with pid [4551]
process[ref_fid_publisher-13]: started with pid [4553]
process[fiducial_tf_converter-14]: started with pid [4567]
[ INFO] [1552301651.281068524]: Loading robot model 'prbt'...
[ WARN] [1552301651.281614825]: Skipping virtual joint 'FixedBase' because its child frame 'prbt_base_link' does not match the URDF frame 'base'
[ INFO] [1552301651.281992211]: No root/virtual joint specified in SRDF. Assuming fixed joint
[ INFO] [1552301651.980708962]: rviz version 1.12.16
[ INFO] [1552301651.980808438]: compiled against Qt version 5.5.1
[ INFO] [1552301651.980838067]: compiled against OGRE version 1.9.0 (Ghadamon)
[ INFO] [1552301652.355542478]: Using fixed control period: 0.010000000
[ INFO] [1552301652.587717879]: Initializing XXX
[ INFO] [1552301652.588118552]: Current state: 1 device error: system:0 internal_error: 0 (OK)
[ INFO] [1552301652.588401682]: Current state: 2 device error: system:0 internal_error: 0 (OK)
EMCY: 83#0090212500000044
EMCY: 84#0090212500000044
EMCY: 85#0090212500000044
[ INFO] [1552301652.812759987]: Stereo is NOT SUPPORTED
[ INFO] [1552301652.813402492]: OpenGl version: 3 (GLSL 1.3).
EMCY: 86#0090212500000044
EMCY: 87#0090212500000044
EMCY: 88#0090212500000044
ER: 33
[ INFO] [1552301653.775363515]: Current state: 2 device error: system:125 internal_error: 0 (OK)
[ INFO] [1552301653.775458465]: Current state: 1 device error: system:125 internal_error: 0 (OK)
[ INFO] [1552301653.775812399]: Current state: 1 device error: system:0 internal_error: 0 (OK)
[ INFO] [1552301653.775892800]: Current state: 0 device error: system:0 internal_error: 0 (OK)
[ INFO] [1552301653.775944583]: Current state: 0 device error: system:0 internal_error: 0 (OK)
[robot_init-6] process has finished cleanly
log file: /home/jgruner/.ros/log/f8ab4628-43eb-11e9-a364-8439be69721c/robot_init-6*.log
[ INFO] [1552301654.292010270]: Publishing maintained planning scene on 'monitored_planning_scene'
[ INFO] [1552301654.310662849]: MoveGroup debug mode is OFF
Starting context monitors...
[ INFO] [1552301654.310756724]: Starting scene monitor
[ INFO] [1552301654.333239512]: Listening to '/planning_scene'
[ INFO] [1552301654.333345179]: Starting world geometry monitor
[ INFO] [1552301654.356504353]: Listening to '/collision_object' using message notifier with target frame '/base '
[ INFO] [1552301654.373932880]: Listening to '/planning_scene_world' for planning scene world geometry
[ INFO] [1552301654.976778466]: Listening to '/attached_collision_object' for attached collision objects
Context monitors started.
[ INFO] [1552301654.993191516]: Reading limits from namespace /robot_description_planning
[ INFO] [1552301655.173408541]: Available plugins: pilz::PlanningContextLoaderCIRC pilz::PlanningContextLoaderLIN pilz::PlanningContextLoaderPTP 
[ INFO] [1552301655.173500376]: About to load: pilz::PlanningContextLoaderCIRC
[ INFO] [1552301655.177550915]: Registered Algorithm [CIRC]
[ INFO] [1552301655.178015249]: About to load: pilz::PlanningContextLoaderLIN
[ INFO] [1552301655.190567122]: Registered Algorithm [LIN]
[ INFO] [1552301655.190657450]: About to load: pilz::PlanningContextLoaderPTP
[ INFO] [1552301655.199674187]: Registered Algorithm [PTP]
[ INFO] [1552301655.200266448]: Using planning interface 'Simple Command Planner'
terminate called after throwing an instance of 'prbt_hardware_support::PilzStoModbusAdapterNodeException'
  what():  Controller service manipulator_joint_trajectory_controller/hold not available. STO adapter node can not function properly. Shutting down ROS ...
================================================================================REQUIRED process [prbt/sto_modbus_adapter_node-8] has died!
process has died [pid 4510, exit code -6, cmd /opt/ros/kinetic/lib/prbt_hardware_support/sto_modbus_adapter_node __name:=sto_modbus_adapter_node __log:=/home/jgruner/.ros/log/f8ab4628-43eb-11e9-a364-8439be69721c/prbt-sto_modbus_adapter_node-8.log].
log file: /home/jgruner/.ros/log/f8ab4628-43eb-11e9-a364-8439be69721c/prbt-sto_modbus_adapter_node-8*.log
Initiating shutdown!
================================================================================
agutenkunst commented 5 years ago

Seems that our controller does not come up. Maybe there is an error with the drives?

EMCY: 86#0090212500000044
EMCY: 87#0090212500000044
EMCY: 88#0090212500000044
ER: 33 // <-- this could be some indication...
[ INFO] [1552301653.775363515]: Current state: 2 device error: system:125 internal_error: 0 (OK)
[ INFO] [1552301653.775458465]: Current state: 1 device error: system:125 internal_error: 0 (OK)
[ INFO] [1552301653.775812399]: Current state: 1 device error: system:0 internal_error: 0 (OK)
jschleicher commented 5 years ago

@JonathanGruner Please report, if increasing the timeout helped.

JonathanGruner commented 5 years ago

@jschleicher I can't say it to 100%, but I feel it's more stable. But yesterday morning, I got this error every time I started the robot until I restarted my computer.

jschleicher commented 5 years ago

Okay, so this is more related to logging ros_canopen errors. But I'd still suggest to drop the timeout in sto_modbus_adapter_node in favor of the ros way "[INFO] ... not available yet, retrying..."

JonathanGruner commented 5 years ago

(@ct2034 was here) Note: timeout can be changed in prbt_hardware_support/include/prbt_hardware_support/pilz_modbus_read_client.h Idea: make configureable

jschleicher commented 5 years ago

Why not wait until ROS is ready? There should not be a timeout at all, but just a warning to notify the user: "Service xxx not yet available, waiting...". If you start rqt without a roscore it outputs the same: "Waiting...".

ct2034 commented 5 years ago

Note:

EMCY: 86#0090212500000044
EMCY: 87#0090212500000044
EMCY: 88#0090212500000044
ER: 33 // <-- this could be some indication...

Maybe we can improve the error handling here, too. the canopen python library seems to have info on the error codes: https://canopen.readthedocs.io/en/latest/_modules/canopen/emcy.html#EmcyError

JonathanGruner commented 5 years ago

I have now an Error: ER: 129

And I can't find the corresponding codes in the canopen python library

ct2034 commented 5 years ago

Yes you can. 129 is 0x80, when masked with 0xF0:

(0x8000, 0xF000, "Monitoring")

Not that this tells us anything more though ;-)

ct2034 commented 5 years ago

I am trying to implement this in canopen: https://github.com/ct2034/ros_canopen/commit/deba5d001e6784596a72bb9b3e35cef494788ed3

JonathanGruner commented 5 years ago

Now i got an empty error: '''ER:'''

JonathanGruner commented 5 years ago

The original issue could be solved by cloning pilz_robots in my workspace, adjusting the timeout to 10 seconds in

prbt_hardware_support/include/prbt_hardware/support/pilz_modbus_read_client.h

and than rebuilding it with catkin_make. Maybe you could insert 10 or 20 seconds too in the original files at the first connection?

I made a new issue with the ER: XY error.

agutenkunst commented 5 years ago

There seems to be a confusion regarding the timeouts.

During the launch MODBUS_CONNECTION_RETRY_TIMEOUT_S_DEFAULT defined in pilz_modbus_read_client_node.cpp should be relevant. While running

The definition of RESPONSE_TIMEOUT_IN_MS in pilz_modbus_read_client.h is related to the detection of a disconnect. Please reset this to the default and try to adjust MODBUS_CONNECTION_RETRY_TIMEOUT_S_DEFAULT if you experience issues during launch. Or it would be better to increase MODBUS_CONNECTION_RETRIES_DEFAULT if your launch takes some time...

So the chain of cause is:

  1. Error in Can
  2. Controller does not come up, therefore does not publish /hold topic
  3. ModbusAdapter starts, expects /hold topic to be available in order to function properly. Since the expectation is not met we implemented a total shutdown of ROS since in this state the handling of the STO is totally unclear.

@jschleicher Close? Other suggestion?

Maybe open an issue on

ER: 33

during launch?

jschleicher commented 5 years ago

I'll prepare a separate PR to continue the discussion. This issue is about the initial timeout during launch.

97 adresses the disconnect after the nodes are up and running.