Closed guiolpei closed 1 year ago
Since you mention #182 yourself: have you tried increasing the priority of the driver process and has that had any effect?
Can you explain exactly what the correct way to do this is?
Does the link to the ROS Answers Q&A in https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/issues/182#issuecomment-631329522 provide sufficient detail?
Can this be done from the command line when executing the roslaunch command?
roslaunch ur_robot_driver ur10e_bringup.launch robot_ip:=XXX.XXX.XXX.XXX kinematics_config:=/Y/ur10e-Z.yaml
No, you'll need to change the ur_control.launch file. Specifically, this line:
and add the launch-prefix to the node element.
Edit: oh wait, I see that launch-prefix is actually an arg of that .launch file. @fmauch: can we use that to pass the required nice command?
Edit2: oh wait again: that is already used for the debug arg. So that won't work.
@guiolpei: you'll want to remove what is currently there in launch-prefix and use the nice command described in the ROS Answers Q&A.
But this is all just a test.
However, it would be very interesting to see whether this test leads to an improvement.
Ok, sorry, I was too quick.
@guiolpei: as the ROS Answers Q&A also mentions, you'd need to run nice with sudo to give a process a higher priority.
So unless you've enabled passwordless sudo for your user and/or that command (i.e. nice), this won't work.
I'm also not sure whether process priorities are inherited by child processes spawned by roslaunch, so running the entire .launch file with a higher priority might also not work.
Edit: a quick Google for "nice without sudo" directs me to "How can I allow a user to prioritize a process to negative niceness?". That should not be too difficult to configure (essentially editing the mentioned configuration file in /etc), and would allow nice to set higher priorities without sudo for a specific user only.
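Whether such a limits change has taken effect can be checked from a shell before touching the launch file. The following is a minimal sketch, assuming Linux semantics (the nice ceiling is 20 minus the RLIMIT_NICE value) and bash, whose ulimit builtin accepts -e. The name lowest_allowed_nice is a hypothetical helper, not part of any existing tool.

```shell
# Hypothetical helper: convert an RLIMIT_NICE value (as printed by bash's
# `ulimit -e`) into the lowest niceness an unprivileged user may request.
# Linux computes the ceiling as 20 - RLIMIT_NICE.
lowest_allowed_nice() {
    rlimit="$1"
    if [ "$rlimit" = "unlimited" ]; then
        echo -20
        return 0
    fi
    floor=$(( 20 - rlimit ))
    # Niceness only ranges from -20 (highest priority) to 19 (lowest).
    if [ "$floor" -lt -20 ]; then floor=-20; fi
    if [ "$floor" -gt 19 ]; then floor=19; fi
    echo "$floor"
}
```

In a fresh login session (the limits file is applied at login), running lowest_allowed_nice "$(ulimit -e)" in bash should print -20 once the entry is active, and nice -n -20 true should no longer report a permission problem.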
Thank you both for your answers.
I have modified /etc/security/limits.conf to allow my user to assign higher priorities, and added a launch-prefix of nice -n -20 in the ur_control.launch file.
Now htop shows a value of PRI=0 and NI=-20 for the ur_driver node.
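The same NI value can be read without htop, which is handy for scripted checks. This is a minimal sketch, assuming a Linux /proc filesystem; process_nice is a hypothetical helper name and the PID of the driver node has to be looked up separately.

```shell
# Hypothetical helper: print the niceness (the NI column in htop) of a PID
# by reading field 19 of /proc/<pid>/stat (Linux only).
process_nice() {
    # Drop everything up to the closing ")" of the command name, which may
    # itself contain spaces. The state field (field 3) then comes first,
    # so niceness (field 19) is the 17th remaining field.
    sed 's/^.*) //' "/proc/$1/stat" | cut -d' ' -f17
}
```

Calling process_nice with the PID of the driver node should print -20 if the launch-prefix was applied; process_nice $$ prints the niceness of the current shell.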
I will check operation using this configuration to see if the problem persists.
@fmauch: if this works, we could consider making the hardware interface node request the next highest priority if configuring the RT priority fails. Renicing is nice, but not the same as a proper priority.
It would probably be a good idea to do that anyway.
@guiolpei: I've also updated the ROS Answers Q&A with this information.
Yes, I agree.
Still timing out:
[ INFO] [1592308489.822836696]: Robot requested program
[ INFO] [1592308489.822982410]: Sent program to robot
[ INFO] [1592308489.993261828]: Robot ready to receive control commands.
[ INFO] [1592308609.428015318]: Connection to robot dropped, waiting for new connection.
[ERROR] [1592308610.146706968]: Can't accept new action goals. Controller is not running.
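To compare runs, it helps to know how long the connection survived before it dropped. The sketch below subtracts the timestamps in a saved console log. It assumes the console format of the log above ([LEVEL] [epoch seconds]: message) with one message per line; seconds_until_drop is a hypothetical helper name.

```shell
# Hypothetical helper: read driver console output on stdin and print the
# seconds between "Robot ready..." and the next "Connection to robot dropped".
seconds_until_drop() {
    awk '
        # Return the bracketed epoch timestamp of the current line, or "".
        function stamp() {
            if (match($0, /\[[0-9]+\.[0-9]+\]/))
                return substr($0, RSTART + 1, RLENGTH - 2)
            return ""
        }
        /Robot ready to receive control commands/ { ready = stamp() }
        /Connection to robot dropped/ && ready != "" {
            printf "%.1f\n", stamp() - ready
            ready = ""
        }
    '
}
```

With the console output saved to a file, seconds_until_drop < driver.log prints one duration per drop. For the log above the result is 119.4, i.e. the connection lasted roughly two minutes.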
Maybe roscore should also run with higher priority?
roscore has nothing to do with this.
The roscore is not part of the communication between the robot and the driver.
Another way to find the source of package drop would be to
At the moment, we can't run a dedicated PC only for the driver and direct connection is not possible because the PC has to communicate with other nodes in the network.
Is there a way to control the value of this communication timeout (if it is indeed a timeout)?
At the moment, we can't run a dedicated PC only for the driver and direct connection is not possible because the PC has to communicate with other nodes in the network.
I would say what @fmauch suggests are ways to diagnose what the cause is.
Not suggestions for system configuration in a final/production environment.
I meant temporarily. If the timeout always occurs rather quickly (after 5 minutes of runtime or so), you could just let the driver run for 15 minutes without the rest of your application. If you encounter the problem there as well, it's likely a network issue (though in the other case it could also be a network issue). If I understand correctly, you have different PCs taking part in the application? Are they going over the same switch? That could be even worse than the control PC's load.
There are several PCs in the application, all over the same switch. Maybe the best thing would be connecting the robot to the control PC directly with a dedicated network card.
We will continue testing and I will report back with any news.
Thank you both for your help!
After some testing, it seems it is not a network issue. It is due to high load on the control PC: reducing the number of tasks on it significantly reduces the frequency of dropped connections.
Because you asked earlier: you could increase the number of missed packages here, but I would not recommend it. Robot motions will change, as linear extrapolation takes place in those cases. Increasing this will only hide the resource problem on your control PC, at the cost of nondeterministic robot motions.
It is due to high load on the control PC
So switching to an RT kernel would fix this (as the driver requests a sufficiently high priority, other processes should not be able to interfere any more), but it might introduce other issues (as some drivers, for instance, don't work with RT kernels).
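Before and after such a switch, it can be worth confirming which kind of kernel is actually running. This is a minimal sketch, assuming that PREEMPT_RT kernels advertise themselves in the uname -v string (as "PREEMPT_RT" or, on older patch sets, "PREEMPT RT") and expose /sys/kernel/realtime. Both function names are hypothetical.

```shell
# Hypothetical helper: succeed if a kernel version string (as printed by
# `uname -v`) advertises the PREEMPT_RT patch set.
is_rt_version_string() {
    case "$1" in
        *PREEMPT_RT*|*"PREEMPT RT"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the running kernel: RT-patched kernels are assumed to expose
# /sys/kernel/realtime containing "1"; otherwise fall back to `uname -v`.
running_rt_kernel() {
    if [ -r /sys/kernel/realtime ] && [ "$(cat /sys/kernel/realtime)" = "1" ]; then
        return 0
    fi
    is_rt_version_string "$(uname -v)"
}
```

For example, running_rt_kernel && echo "RT kernel" || echo "non-RT kernel" reports which case applies on the control PC.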
If it's really load, I would expect an increased priority for the driver process to help. I'm not sure I understand why the approach with nice doesn't work, unless you have other processes which have an equal or higher priority.
I did not really read through the whole discussion (yet), but the problem seems similar to what we experienced in a comparable setup.
Our solution (for now) was to increase the timeout in https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/blob/master/ur_robot_driver/resources/ros_control.urscript#L107 from 0.02 to 0.04.
I don't know all the details about it, but just want to mention it here...
Should have read the threads before posting things... :facepalm: I see this suggestion has been proposed in https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/issues/182#issuecomment-631237843 already
@fmessmer Thank you for your comment, I have not actually tried this.
I don't know if this can mitigate the problem even with a high load on the control PC. Maybe @fmauch or @gavanderhoorn could explain its implications and I could try it out to see if it makes a difference.
Basically, what I suggested and what @fmessmer suggested have the same implications.
While my change increases the number of allowed timeout reads, changing the timeout increases the maximum time a cycle could take.
This issue has not been updated for a long time. If no further updates are added, this will be closed automatically. Comment on the issue to prevent automatic closing.
This issue has been closed due to inactivity. Feel free to comment or reopen if this is still relevant.
Summary
Every once in a while, communication between the UR robot and the robot driver is closed. The main program execution on the robot side is stopped (without any notification) and it must be run again (press "Play" button on robot panel).
The program defined in the robot contains only the ExternalControl URCap.
On the PC side, the command executed is the following:
roslaunch ur_robot_driver ur10e_bringup.launch robot_ip:=XXX.XXX.XXX.XXX kinematics_config:=/Y/ur10e-Z.yaml
Versions
Issue details
We are running a non-realtime Linux kernel, which might be related. We have a single PC connected via Ethernet through a switch to the robot. The Linux box is running RViz and processes images obtained from other nodes in the network.
Will update with logs and messages the next time it happens.
Related issues
#182