Open DanielBilbao12 opened 2 years ago
First of all, thank you for posting this.
Basically, what I think is happening here:
ur_driver_->setKeepaliveCount(5);
after this line. You can change the number 5 to anything you like. The higher the number, the more control cycles without any feedback from the ROS machine the robot will allow. Note, that this is an assumption that this workaround should work, so please report back in any case.I hope, this explanation helps and we can address this properly, soon.
As mentioned on the forum, I also hit this issue when running the driver in a VM. On my side 20.04 and Noetic over a gigabit wired switch, no chance to get a stable connection in a VM (Parallels on Mac), but runs for days with zero hiccups on a real linux machine. I have not inquired further and suspected that it might be something to do with the VM not being responsive enough compared to native hardware, but this prevents me from developing with my laptop while travelling which is quite inconvenient, since I would ideally be able to run ROS in a VM alongside URSim, which is at the moment not possible (at least on my config).
I am not sure what is the error about the "maximum number of clients (1) already connected", could this point rather more to an issue with the network driver of the VM environment?
I have no time to dig further until the end of next week but will happily run more tests after to see if I can isolate the issue to something more specific.
Maybe a note, our robots are UR10 with CB3.0 running 3.15, the linux machines dont have a RT kernel.
Hello again!
First of all, thank you for your quick response!
Regarding the first point, I am using a virtual machine with ROS Melodic and Ubuntu 18.04. The connection between the robot controller and the PC is done via wired connection and the real time kernel, it is not enabled.
Regarding the solution proposed in the second point, by introducing a wait sentence of 0.5 s after the external_control node, the connection is re-established without the need to re-execute the program from the TP. The problem, as you say, is that in this way the robot stops (although it reconnects quickly).
Finally, we have tried to solve the problem by modifying the driver by introducing the sentence ur_driver_->setKeepaliveCount(5);
in the indicated place. We have tried changing the number 5 to higher numbers but the problem persists.
I think the problem is related to the number of poses being sent from the vision system node to the control system node because when the frequency of frames generated by the vision system increases, the connection drops earlier. Could it be because of the topic queue? I have set this to be 1 so that the pose sent by this node is the last one detected by the vision system and so the robot only moves to the final position and not to all the positions that the aruco passes through before being moved through the workspace.
Thanks in advanced, Daniel.
I would expect the problem to correlate with the system load actually going on on the system. I'll be in the lab tomorrow where I will have a look at a quick fix I did last year to a similar problem. I thought, I did the setKeepaliveCount
thing, but I might remember it wrong.
I have been doing more testing, we are currently deploying 2 UR10s, which we ideally want to drive off a single Jetson Nano for ease of deloyment, instead of having multiple machines.
I currently am able to run 2 nodes of the ur_robot_driver from the Jetson, and it seems to work well, with both processes hovering around 15% cpu load, but I can still see the occasional dropped connection, about once per hour:
[ INFO] [1644150281.232665034]: Robot requested program
[ INFO] [1644150281.232870502]: Sent program to robot
[ INFO] [1644150281.404626379]: Robot connected to reverse interface. Ready to receive control commands.
[ INFO] [1644150429.325002328]: Connection to reverse interface dropped.
[ERROR] [1644150429.325689976]: Sending data through socket failed.
[ INFO] [1644150430.317166557]: Robot connected to reverse interface. Ready to receive control commands.
[ERROR] [1644151231.750051161]: Unexpected error: No trajectory defined at current time. Please contact the package maintainer.
[ INFO] [1644151490.761816208]: Connection to reverse interface dropped.
[ INFO] [1644151491.754148939]: Robot connected to reverse interface. Ready to receive control commands.
I have a hunch that this is indeed system load-related, if I am doing other things like installing a package or loading a chrome tab i can see it happening more often. What is the setKeepaliveCount
trick you are mentionning? For my use-case I can allow a pretty loose timeout since what I care about most is stability over time.
Edit: might be the obvious thinking but, I have avoided having to mess with RT kernels up till now, wouldn't this be the solution? I have little time left to fiddle with this but if there is a dramatic difference in behavior with/without a RT kernel I might put the time needed to upgrade the machine.
Quick note, I went ahead and compiled a RT kernel for the Jetson Nano and have not seen any dropped connection since then, so it very likely is due to occasional jitter which just gets worse when system load goes up.
I still have these Unexpected error: No trajectory defined at current time. Please contact the package maintainer.
messages recurring but these are likely an unrelated issue, though I'd prefer to figure out what is causing them (I am publishing trajectory messages from Rosbridge websocket server to the Topic interface at about 10Hz).
Just wanted to add some further notes: running 20.04 with RT kernel on a Raspberry Pi 4, but connected to URSim is also very stable with only very occasional dropped connection, at least rare enough to be useable. So the issue is really about running the ur_robot_driver
node in a VM.
Hi,
I'm facing the exact same problem with UR3e right now. Has anyone found a good fix for this? I just pulled all the latest changes from the UR ros driver package and updated the polyscope on the teach pendant to the most recent version.
I installed all the dependencies correctly using rosdep install -r --from-paths . --ignore-src --rosdistro $ROS_DISTRO -y -i --verbose, did an update and upgrade after, recompiled my ws and rebooted both the robot and the PC.
I haven't faced this issue till now and it suddenly started happening today. I'm not sure what changed. I checked and made sure the robot is connected to the network on the teach pendant and my ros driver is connecting to the correct ip address.
Kindly help! @fmauch
It might not be your case, but I had forgotten to update the Universal_Robots_Client_Library
and had systematic connection drops. Was resolved after the update.
- As a workaround (I do see the problem here and this should be fixed properly), you could try modifying the driver by setting a higher keepalive counter. This can be done by adding
ur_driver_->setKeepaliveCount(5);
after this line
Since the file seems to have changed since this comment, could you please update the link or share after which line the setKeepAliveCount
is supposed to be set @fmauch ?
Or is there a new workaround for this issue ?
Sorry I didn't use a permalink. You can insert that code after this line: https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/blob/222d67aad0bfb4f35e7acc32d34b4fb74710c0bd/ur_robot_driver/src/hardware_interface.cpp#L314-L315
Thanks, much appreciated !
And where would this line be added for the ROS 2 drivers @fmauch ? I have the same problem with Foxy
@alexandrosnic Please see https://github.com/UniversalRobots/Universal_Robots_ROS2_Driver/issues/534
Sorry I didn't use a permalink. You can insert that code after this line:
It makes UR Simulator (Vmware / Physical Machine) work well ! I suggest adding a parameter like simulator: =True
or alive_count: =5
for the ros driver launch file.
@jsbyysheng A more descriptive name of what the parameter actually does, such as max_missed_packages
or something like that would maybe be more suitable. The keepalive_count
IMHO makes sense internally, but for driver users, this probably isn't self explanatory.
However, when we introduce such a parameter, this would have to be well-documented so users are aware of the implications following from that.
@jsbyysheng A more descriptive name of what the parameter actually does, such as
max_missed_packages
or something like that would maybe be more suitable. Thekeepalive_count
IMHO makes sense internally, but for driver users, this probably isn't self explanatory.However, when we introduce such a parameter, this would have to be well-documented so users are aware of the implications following from that.
In fact, this parameter makes more sense for URSim than a real robot. It is not necessary for URSim to keep RT communication. A simple launch parameter named using_ursim:=true
with ur_driver_->setKeepaliveCount(5);
fixed to 5
is totally enough. A count number of more than 5 may make no sense because the network environment is unstable for any communication.
one of the problem might be the hardware which is used to connect the system.
hello, i just meet such problem too.My computer is Ubuntu 20.04, ROS-Noetic, dual system, and 5.15.0-113Generic kernel. After I enabled urcap in the teach pendant, I established a connection with the real robot through the driver. I use the control cabinet io output to control the external air pump and drive ur5 to execute the trajectory planned by moveit at the same time, but the problem occurs. Once the io signal and the planned trajectory are running at the same time, urcap will be interrupted and the sending signal will prompt failure: [ INFO] [1720767234.769926758]: Completed trajectory execution with status PREEMPTED ... [ INFO] [1720767234.772389043]: Execution completed: PREEMPTED [ INFO] [1720767237.051394741]: Robot requested program [ INFO] [1720767237.051487070]: Sent program to robot [ INFO] [1720767237.401967035]: Robot connected to reverse interface. Ready to receive control commands. [ INFO] [1720767257.018619063]: Connection to reverse interface dropped. But I don't use io signals, and the execution of the track alone can be successful, but sometimes it will be interrupted. I don't know where the problem lies?
first thing: if you dont use a RT kernel you'll get constant drops of connection, so you should really fix that first.
second: i am now on ROS2 but I've seen similar issue, you mention executing a trajectory while toggling an output. I've observed the same: if using the set_io
service while the robot is executing a trajectory it would drop the connection half the time, forcing to reconnect. I'm thinking that the set_io
call somehow blocks the controller long-enough to timeout the connection.. didnt have time to dig into it any further, had to find a quick workaround.
first thing: if you dont use a RT kernel you'll get constant drops of connection, so you should really fix that first.
second: i am now on ROS2 but I've seen similar issue, you mention executing a trajectory while toggling an output. I've observed the same: if using the
set_io
service while the robot is executing a trajectory it would drop the connection half the time, forcing to reconnect. I'm thinking that theset_io
call somehow blocks the controller long-enough to timeout the connection.. didnt have time to dig into it any further, had to find a quick workaround.
hello, do you know which RL-kernel version should i choose for nvidia driver 550.54.14. i am now fix the kernel and the driver, but i meet new problem. Building module: cleaning build area... unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j16 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.15.76-rt53 IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.15.76-rt53/build LD=/usr/bin/ld.bfd modules...(bad exit status: 2) ERROR (dkms apport): kernel package linux-headers-5.15.76-rt53 is not supported Error! Bad return status for module build on kernel: 5.15.76-rt53 (x86_64) Consult /var/lib/dkms/nvidia/550.54.14/build/make.log for more information. can you give me some advice?
dont think I can help much.. i've been using a lot the prebuilt raspi ros2 images here https://github.com/ros-realtime/ros-realtime-rpi4-image but that wont be of much help for you i think
dont think I can help much.. i've been using a lot the prebuilt raspi ros2 images here https://github.com/ros-realtime/ros-realtime-rpi4-image but that wont be of much help for you i think
Thank you so much for your help!i will continue to think about it.
dont think I can help much.. i've been using a lot the prebuilt raspi ros2 images here https://github.com/ros-realtime/ros-realtime-rpi4-image but that wont be of much help for you i think i have installed 5.15.76-rt53, but it didn't work as expected. It still reported an error, but this time it was a little different: [ INFO] [1720873969.508932034]: Robot connected to reverse interface. Ready to receive control commands. [ INFO] [1720873975.622664768]: Received request to compute Cartesian path [ INFO] [1720873975.624387838]: Attempting to follow 31139 waypoints for link 'tool_ros' using a step of 0.010000 m and jump threshold 0.000000 (in global reference frame) [ INFO] [1720873976.552106491]: Computed Cartesian path with 31145 points (followed 100.000000% of requested trajectory) [ERROR] [1720873977.292264538]: Sending data through socket failed. [ INFO] [1720873977.292354495]: Connection to reverse interface dropped. [ INFO] [1720873977.372595596]: Execution request received [ERROR] [1720873977.400574305]: Can't accept new action goals. Controller is not running. [ WARN] [1720873977.400910620]: Controller 'scaled_pos_joint_traj_controller' failed with error INVALID_GOAL: [ WARN] [1720873977.401011718]: Controller handle scaled_pos_joint_traj_controller reports status FAILED [ INFO] [1720873977.401071288]: Completed trajectory execution with status FAILED ... [ INFO] [1720873977.402529277]: Execution completed: FAILED
i have installed 5.15.76-rt53, but it didn't work as expected. It still reported an error, but this time it was a little different: [ INFO] [1720873969.508932034]: Robot connected to reverse interface. Ready to receive control commands. [ INFO] [1720873975.622664768]: Received request to compute Cartesian path [ INFO] [1720873975.624387838]: Attempting to follow 31139 waypoints for link 'tool_ros' using a step of 0.010000 m and jump threshold 0.000000 (in global reference frame) [ INFO] [1720873976.552106491]: Computed Cartesian path with 31145 points (followed 100.000000% of requested trajectory) [ERROR] [1720873977.292264538]: Sending data through socket failed. [ INFO] [1720873977.292354495]: Connection to reverse interface dropped. [ INFO] [1720873977.372595596]: Execution request received [ERROR] [1720873977.400574305]: Can't accept new action goals. Controller is not running. [ WARN] [1720873977.400910620]: Controller 'scaled_pos_joint_traj_controller' failed with error INVALID_GOAL: [ WARN] [1720873977.401011718]: Controller handle scaled_pos_joint_traj_controller reports status FAILED [ INFO] [1720873977.401071288]: Completed trajectory execution with status FAILED ... [ INFO] [1720873977.402529277]: Execution completed: FAILED
i have installed 5.15.76-rt53, but it didn't work as expected. It still reported an error, but this time it was a little different: [ INFO] [1720873969.508932034]: Robot connected to reverse interface. Ready to receive control commands. [ INFO] [1720873975.622664768]: Received request to compute Cartesian path [ INFO] [1720873975.624387838]: Attempting to follow 31139 waypoints for link 'tool_ros' using a step of 0.010000 m and jump threshold 0.000000 (in global reference frame) [ INFO] [1720873976.552106491]: Computed Cartesian path with 31145 points (followed 100.000000% of requested trajectory) [ERROR] [1720873977.292264538]: Sending data through socket failed. [ INFO] [1720873977.292354495]: Connection to reverse interface dropped. [ INFO] [1720873977.372595596]: Execution request received [ERROR] [1720873977.400574305]: Can't accept new action goals. Controller is not running. [ WARN] [1720873977.400910620]: Controller 'scaled_pos_joint_traj_controller' failed with error INVALID_GOAL: [ WARN] [1720873977.401011718]: Controller handle scaled_pos_joint_traj_controller reports status FAILED [ INFO] [1720873977.401071288]: Completed trajectory execution with status FAILED ... [ INFO] [1720873977.402529277]: Execution completed: FAILED
20:40:36.536769 IP 192.168.0.10.60435 > zp-ros.50002: Flags [.], ack 14481, win 455, options [nop,nop,TS val 12400012 ecr 119005074], length 0 20:40:36.536773 IP 192.168.0.10.60435 > zp-ros.50002: Flags [.], ack 23169, win 591, options [nop,nop,TS val 12400012 ecr 119005074], length 0 20:40:36.536774 IP 192.168.0.10.60435 > zp-ros.50002: Flags [.], ack 25312, win 624, options [nop,nop,TS val 12400012 ecr 119005075], length 0 20:40:36.623828 IP 192.168.0.10.60435 > zp-ros.50002: Flags [F.], seq 17, ack 25312, win 624, options [nop,nop,TS val 12400099 ecr 119005075], length 0 20:40:36.623910 IP zp-ros.50002 > 192.168.0.10.60435: Flags [F.], seq 25312, ack 18, win 509, options [nop,nop,TS val 119005162 ecr 12400099], length 0 20:40:36.624020 IP 192.168.0.10.60435 > zp-ros.50002: Flags [.], ack 25313, win 624, options [nop,nop,TS val 12400099 ecr 119005162], length 0 When I listen to port 5002, I start urcap before 20:41, and it does not update the message after running the script
Summary
I have made a ROS program that allows me to send the position of an ArUco marker to the robot, so that it can be placed just above it at a defined distance (in Z). To do this, two nodes are used: One node for the vision system and the transmission of the ArUco’s pose and another node to receive this information and execute the robot’s movement. Everything works correctly, I mean, the robot positions itself satisfactorily until suddenly (I don’t know the cause) the robot stops accepting new poses and the connection is closed. If I run again the external control from the teachpedant, the program still works.
Versions
Impact
Due to this problem, it is not possible to smoothly use external control for guidance tasks based on visual markers.
Issue details
I have made a ROS program that allows me to send the position of an ArUco marker to the robot, so that it can be placed just above it at a defined distance (in Z). To do this, two nodes are used: One node for the vision system and the transmission of the ArUco’s pose and another node to receive this information and execute the robot’s movement. Everything works correctly, I mean, the robot positions itself satisfactorily until suddenly (I don’t know the cause) the robot stops accepting new poses and the connection is closed. If I run again the external control from the teachpedant, the program still works.
The controller used is the “fordward_cartesian_traj_controller”. The driver has been downloaded from : https://github.com/UniversalRobots/Universal_Robots_ROS_Driver
In addition, the message that appears on the console is attached in the following image:
Extra details
In case you require further information, please do not hesitate to ask for it.
Thanks in advanced!