Open stefanz-tri opened 4 years ago
Network: USB-LAN Adaptor [..]
please try and test things with a direct cable connection to the physical ethernet port in the controller cabinet.
If that works, I would heavily suspect the USB adapter.
connected to Switch in Robot Cabinet
do you mean the USB hub in the controller?
At first, I would also suspect the USB adapter...
The log entry "Stopping communication and control" is written when the URScript running on the controller stops. This only happens, when no keepalive signal is received from the ROS pc for a certain amount of time (Currently, this is set to 20 ms here).
So, this could in fact be either a networking issue or an issue with too much load on the ROS machine.
Thanks for your feedback so far!
This morning I tested with a different laptop and an internal LAN adaptor and was running into the same issues. Also I tested different cables with no changes. Probably this hardware was also not really up to the task but it is all I have available right now (in the future the system will run on more appropriate hardware ;-) )
Having to much load is quite possible as I am doing a lot of RViz visualizations on the same machine. I will try to play around with the keepalive timeout and check if I can observe an effect. The script in the recources linked above is the one sent and executed on the UR right? So the changes in that file should have an effect immediately without rebuilding the workspace? Do I need to restart extcontrol?
Yes, the linked script is being executed directly on the robot. It is sent to the robot, each time the program enters the "External Control" program node. In order to reload the altered file, simply restart the driver, as it is read initially.
A high load on the ROS machine can indeed be a huge problem, especially, when no realtime kernel is in place as recommended.
This morning I tested with a different laptop and an internal LAN adaptor and was running into the same issues. Also I tested different cables with no changes. Probably this hardware was also not really up to the task but it is all I have available right now (in the future the system will run on more appropriate hardware ;-) )
A test to see whether the load is the cause here could be to run just the driver, and then try to use rqt_joint_trajectory_controller
to move the robot.
If that does work, it would be a good indication that the load could have something to do with it.
If it is load, instead of changing the timeout values, you could also consider increasing the process priority of the driver node. This would seem to be a good idea anyway, as typically responsiveness (or liveness almost) of drivers would seem important to me.
You could try to see whether using a launch-prefix
on the driver node
to renice
it works. See How to launch nodes with realtime priority on ROS Answers for an example.
I tested some of your ideas:
So far I did not get around to test the driver with increased process priority but will hopefully be able to do it tomorrow.
It seems like everything indeed boils down to the load on the PC (which is considerable using RViz). The current plan for the hardware setup is to have one machine only for the core functionality (driver, planner) and activate the visualization on a different device on demand.
Thanks again for your input.
@fmauch: the driver currently tries to enable an RT priority at start, but if it fails, does not try to then request the next-highest priority iirc.
It may be worth considering adding that behaviour. Or at least make it possible to configure the driver to do that.
Hi!
I experience the following problems while moving a UR5 robot with MoveIt using the ur_robot_driver:
The system operates normally at first. After about 4 minutes (sometimes earlier) the robot starts reconnecting showing the following in the ROS log:
On the teach pendant the log shows "Stopping communication and control".
After the first reconnect the connection is unstable and more reconnects follow. While the system continues to operate after the first reconnect, it is much slower and more unreliable than before (e.g. positions are just not approached without any error message in the log). After a short while it is not really usable anymore.
This behaviour always occurs when running the system and always has the same timing more or less. I do not use the teach pendant during the operation or do any kind of manual robot move.
Does anyone have an Idea what could cause this? Do you think the hardware limitations described below are enough to cause these problems? (I have my doubts because the timing is always the same, the controller does not work normal after the reconnect and RTDE works normally.)
Setup
Robot: UR5 CB3 3.9.1.64192 ROS: Kinetic System: Ubuntu 16.04, Native (No VM) Hardware: Standard Laptop, no Realtime Network: USB-LAN Adaptor connected to Switch in Robot Cabinet URCap: ExternalControl 1.0.1 UR Driver: bf26ea5eca4f6c586da029abc91774481c470d28 (but I had the same problem with a version some weeks older)
Details
I analyzed the connectivity in Wireshark. It looks like the robot is stopping to send data and triggers a reset shortly after. RTDE (port 30004) is running normally and does not seem to be influenced. In the following Log 192.168.10.30 is the laptop and 192.168.10.11 is the robot: