UniversalRobots / Universal_Robots_ROS_Driver

Universal Robots ROS driver supporting CB3 and e-Series
Apache License 2.0
767 stars 405 forks source link

Connection to reverse interface dropped - UR10e #507

Open DanielBilbao12 opened 2 years ago

DanielBilbao12 commented 2 years ago

Summary

I have made a ROS program that allows me to send the position of an ArUco marker to the robot, so that it can be placed just above it at a defined distance (in Z). To do this, two nodes are used: One node for the vision system and the transmission of the ArUco’s pose and another node to receive this information and execute the robot’s movement. Everything works correctly, I mean, the robot positions itself satisfactorily until suddenly (I don’t know the cause) the robot stops accepting new poses and the connection is closed. If I run again the external control from the teachpedant, the program still works.

Versions

Impact

Due to this problem, it is not possible to smoothly use external control for guidance tasks based on visual markers.

Issue details

I have made a ROS program that allows me to send the position of an ArUco marker to the robot, so that it can be placed just above it at a defined distance (in Z). To do this, two nodes are used: One node for the vision system and the transmission of the ArUco’s pose and another node to receive this information and execute the robot’s movement. Everything works correctly, I mean, the robot positions itself satisfactorily until suddenly (I don’t know the cause) the robot stops accepting new poses and the connection is closed. If I run again the external control from the teachpedant, the program still works.

The controller used is the “fordward_cartesian_traj_controller”. The driver has been downloaded from : https://github.com/UniversalRobots/Universal_Robots_ROS_Driver

In addition, the message that appears on the console is attached in the following image: MicrosoftTeams-image

Extra details

In case you require further information, please do not hesitate to ask for it.

Thanks in advanced!

fmauch commented 2 years ago

First of all, thank you for posting this.

Basically, what I think is happening here:

I hope, this explanation helps and we can address this properly, soon.

aatb-ch commented 2 years ago

As mentioned on the forum, I also hit this issue when running the driver in a VM. On my side 20.04 and Noetic over a gigabit wired switch, no chance to get a stable connection in a VM (Parallels on Mac), but runs for days with zero hiccups on a real linux machine. I have not inquired further and suspected that it might be something to do with the VM not being responsive enough compared to native hardware, but this prevents me from developing with my laptop while travelling which is quite inconvenient, since I would ideally be able to run ROS in a VM alongside URSim, which is at the moment not possible (at least on my config).

I am not sure what is the error about the "maximum number of clients (1) already connected", could this point rather more to an issue with the network driver of the VM environment?

I have no time to dig further until the end of next week but will happily run more tests after to see if I can isolate the issue to something more specific.

aatb-ch commented 2 years ago

Maybe a note, our robots are UR10 with CB3.0 running 3.15, the linux machines dont have a RT kernel.

DanielBilbao12 commented 2 years ago

Hello again!

First of all, thank you for your quick response!

I think the problem is related to the number of poses being sent from the vision system node to the control system node because when the frequency of frames generated by the vision system increases, the connection drops earlier. Could it be because of the topic queue? I have set this to be 1 so that the pose sent by this node is the last one detected by the vision system and so the robot only moves to the final position and not to all the positions that the aruco passes through before being moved through the workspace.

Thanks in advanced, Daniel.

fmauch commented 2 years ago

I would expect the problem to correlate with the system load actually going on on the system. I'll be in the lab tomorrow where I will have a look at a quick fix I did last year to a similar problem. I thought, I did the setKeepaliveCount thing, but I might remember it wrong.

aatb-ch commented 2 years ago

I have been doing more testing, we are currently deploying 2 UR10s, which we ideally want to drive off a single Jetson Nano for ease of deloyment, instead of having multiple machines.

I currently am able to run 2 nodes of the ur_robot_driver from the Jetson, and it seems to work well, with both processes hovering around 15% cpu load, but I can still see the occasional dropped connection, about once per hour:

[ INFO] [1644150281.232665034]: Robot requested program
[ INFO] [1644150281.232870502]: Sent program to robot
[ INFO] [1644150281.404626379]: Robot connected to reverse interface. Ready to receive control commands.
[ INFO] [1644150429.325002328]: Connection to reverse interface dropped.
[ERROR] [1644150429.325689976]: Sending data through socket failed.
[ INFO] [1644150430.317166557]: Robot connected to reverse interface. Ready to receive control commands.
[ERROR] [1644151231.750051161]: Unexpected error: No trajectory defined at current time. Please contact the package maintainer.
[ INFO] [1644151490.761816208]: Connection to reverse interface dropped.
[ INFO] [1644151491.754148939]: Robot connected to reverse interface. Ready to receive control commands.

I have a hunch that this is indeed system load-related, if I am doing other things like installing a package or loading a chrome tab i can see it happening more often. What is the setKeepaliveCount trick you are mentionning? For my use-case I can allow a pretty loose timeout since what I care about most is stability over time.

Edit: might be the obvious thinking but, I have avoided having to mess with RT kernels up till now, wouldn't this be the solution? I have little time left to fiddle with this but if there is a dramatic difference in behavior with/without a RT kernel I might put the time needed to upgrade the machine.

aatb-ch commented 2 years ago

Quick note, I went ahead and compiled a RT kernel for the Jetson Nano and have not seen any dropped connection since then, so it very likely is due to occasional jitter which just gets worse when system load goes up.

I still have these Unexpected error: No trajectory defined at current time. Please contact the package maintainer. messages recurring but these are likely an unrelated issue, though I'd prefer to figure out what is causing them (I am publishing trajectory messages from Rosbridge websocket server to the Topic interface at about 10Hz).

aatb-ch commented 2 years ago

Just wanted to add some further notes: running 20.04 with RT kernel on a Raspberry Pi 4, but connected to URSim is also very stable with only very occasional dropped connection, at least rare enough to be useable. So the issue is really about running the ur_robot_driver node in a VM.

prasuchit commented 2 years ago

Hi,

I'm facing the exact same problem with UR3e right now. Has anyone found a good fix for this? I just pulled all the latest changes from the UR ros driver package and updated the polyscope on the teach pendant to the most recent version.

I installed all the dependencies correctly using rosdep install -r --from-paths . --ignore-src --rosdistro $ROS_DISTRO -y -i --verbose, did an update and upgrade after, recompiled my ws and rebooted both the robot and the PC.

I haven't faced this issue till now and it suddenly started happening today. I'm not sure what changed. I checked and made sure the robot is connected to the network on the teach pendant and my ros driver is connecting to the correct ip address.

Kindly help! @fmauch

captain-yoshi commented 2 years ago

It might not be your case, but I had forgotten to update the Universal_Robots_Client_Library and had systematic connection drops. Was resolved after the update.

niomate commented 2 years ago
  • As a workaround (I do see the problem here and this should be fixed properly), you could try modifying the driver by setting a higher keepalive counter. This can be done by adding ur_driver_->setKeepaliveCount(5); after this line

Since the file seems to have changed since this comment, could you please update the link or share after which line the setKeepAliveCount is supposed to be set @fmauch ?

Or is there a new workaround for this issue ?

fmauch commented 2 years ago

Sorry I didn't use a permalink. You can insert that code after this line: https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/blob/222d67aad0bfb4f35e7acc32d34b4fb74710c0bd/ur_robot_driver/src/hardware_interface.cpp#L314-L315

niomate commented 2 years ago

Thanks, much appreciated !

alexandrosnic commented 2 years ago

And where would this line be added for the ROS 2 drivers @fmauch ? I have the same problem with Foxy

fmauch commented 2 years ago

@alexandrosnic Please see https://github.com/UniversalRobots/Universal_Robots_ROS2_Driver/issues/534

jsbyysheng commented 2 years ago

Sorry I didn't use a permalink. You can insert that code after this line:

https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/blob/222d67aad0bfb4f35e7acc32d34b4fb74710c0bd/ur_robot_driver/src/hardware_interface.cpp#L314-L315

It makes UR Simulator (Vmware / Physical Machine) work well ! I suggest adding a parameter like simulator: =True or alive_count: =5 for the ros driver launch file.

fmauch commented 2 years ago

@jsbyysheng A more descriptive name of what the parameter actually does, such as max_missed_packages or something like that would maybe be more suitable. The keepalive_count IMHO makes sense internally, but for driver users, this probably isn't self explanatory.

However, when we introduce such a parameter, this would have to be well-documented so users are aware of the implications following from that.

jsbyysheng commented 2 years ago

@jsbyysheng A more descriptive name of what the parameter actually does, such as max_missed_packages or something like that would maybe be more suitable. The keepalive_count IMHO makes sense internally, but for driver users, this probably isn't self explanatory.

However, when we introduce such a parameter, this would have to be well-documented so users are aware of the implications following from that.

In fact, this parameter makes more sense for URSim than a real robot. It is not necessary for URSim to keep RT communication. A simple launch parameter named using_ursim:=true with ur_driver_->setKeepaliveCount(5); fixed to 5 is totally enough. A count number of more than 5 may make no sense because the network environment is unstable for any communication.

JaisonJose241 commented 10 months ago

one of the problem might be the hardware which is used to connect the system.

zp2546265641 commented 4 months ago

hello, i just meet such problem too.My computer is Ubuntu 20.04, ROS-Noetic, dual system, and 5.15.0-113Generic kernel. After I enabled urcap in the teach pendant, I established a connection with the real robot through the driver. I use the control cabinet io output to control the external air pump and drive ur5 to execute the trajectory planned by moveit at the same time, but the problem occurs. Once the io signal and the planned trajectory are running at the same time, urcap will be interrupted and the sending signal will prompt failure: [ INFO] [1720767234.769926758]: Completed trajectory execution with status PREEMPTED ... [ INFO] [1720767234.772389043]: Execution completed: PREEMPTED [ INFO] [1720767237.051394741]: Robot requested program [ INFO] [1720767237.051487070]: Sent program to robot [ INFO] [1720767237.401967035]: Robot connected to reverse interface. Ready to receive control commands. [ INFO] [1720767257.018619063]: Connection to reverse interface dropped. But I don't use io signals, and the execution of the track alone can be successful, but sometimes it will be interrupted. I don't know where the problem lies?

aatb-ch commented 4 months ago

first thing: if you dont use a RT kernel you'll get constant drops of connection, so you should really fix that first.

second: i am now on ROS2 but I've seen similar issue, you mention executing a trajectory while toggling an output. I've observed the same: if using the set_io service while the robot is executing a trajectory it would drop the connection half the time, forcing to reconnect. I'm thinking that the set_io call somehow blocks the controller long-enough to timeout the connection.. didnt have time to dig into it any further, had to find a quick workaround.

zp2546265641 commented 4 months ago

first thing: if you dont use a RT kernel you'll get constant drops of connection, so you should really fix that first.

second: i am now on ROS2 but I've seen similar issue, you mention executing a trajectory while toggling an output. I've observed the same: if using the set_io service while the robot is executing a trajectory it would drop the connection half the time, forcing to reconnect. I'm thinking that the set_io call somehow blocks the controller long-enough to timeout the connection.. didnt have time to dig into it any further, had to find a quick workaround.

hello, do you know which RL-kernel version should i choose for nvidia driver 550.54.14. i am now fix the kernel and the driver, but i meet new problem. Building module: cleaning build area... unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 'make' -j16 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.15.76-rt53 IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 SYSSRC=/lib/modules/5.15.76-rt53/build LD=/usr/bin/ld.bfd modules...(bad exit status: 2) ERROR (dkms apport): kernel package linux-headers-5.15.76-rt53 is not supported Error! Bad return status for module build on kernel: 5.15.76-rt53 (x86_64) Consult /var/lib/dkms/nvidia/550.54.14/build/make.log for more information. can you give me some advice?

aatb-ch commented 4 months ago

dont think I can help much.. i've been using a lot the prebuilt raspi ros2 images here https://github.com/ros-realtime/ros-realtime-rpi4-image but that wont be of much help for you i think

zp2546265641 commented 4 months ago

dont think I can help much.. i've been using a lot the prebuilt raspi ros2 images here https://github.com/ros-realtime/ros-realtime-rpi4-image but that wont be of much help for you i think

Thank you so much for your help!i will continue to think about it.

zp2546265641 commented 4 months ago

dont think I can help much.. i've been using a lot the prebuilt raspi ros2 images here https://github.com/ros-realtime/ros-realtime-rpi4-image but that wont be of much help for you i think i have installed 5.15.76-rt53, but it didn't work as expected. It still reported an error, but this time it was a little different: [ INFO] [1720873969.508932034]: Robot connected to reverse interface. Ready to receive control commands. [ INFO] [1720873975.622664768]: Received request to compute Cartesian path [ INFO] [1720873975.624387838]: Attempting to follow 31139 waypoints for link 'tool_ros' using a step of 0.010000 m and jump threshold 0.000000 (in global reference frame) [ INFO] [1720873976.552106491]: Computed Cartesian path with 31145 points (followed 100.000000% of requested trajectory) [ERROR] [1720873977.292264538]: Sending data through socket failed. [ INFO] [1720873977.292354495]: Connection to reverse interface dropped. [ INFO] [1720873977.372595596]: Execution request received [ERROR] [1720873977.400574305]: Can't accept new action goals. Controller is not running. [ WARN] [1720873977.400910620]: Controller 'scaled_pos_joint_traj_controller' failed with error INVALID_GOAL: [ WARN] [1720873977.401011718]: Controller handle scaled_pos_joint_traj_controller reports status FAILED [ INFO] [1720873977.401071288]: Completed trajectory execution with status FAILED ... [ INFO] [1720873977.402529277]: Execution completed: FAILED

zp2546265641 commented 4 months ago

i have installed 5.15.76-rt53, but it didn't work as expected. It still reported an error, but this time it was a little different: [ INFO] [1720873969.508932034]: Robot connected to reverse interface. Ready to receive control commands. [ INFO] [1720873975.622664768]: Received request to compute Cartesian path [ INFO] [1720873975.624387838]: Attempting to follow 31139 waypoints for link 'tool_ros' using a step of 0.010000 m and jump threshold 0.000000 (in global reference frame) [ INFO] [1720873976.552106491]: Computed Cartesian path with 31145 points (followed 100.000000% of requested trajectory) [ERROR] [1720873977.292264538]: Sending data through socket failed. [ INFO] [1720873977.292354495]: Connection to reverse interface dropped. [ INFO] [1720873977.372595596]: Execution request received [ERROR] [1720873977.400574305]: Can't accept new action goals. Controller is not running. [ WARN] [1720873977.400910620]: Controller 'scaled_pos_joint_traj_controller' failed with error INVALID_GOAL: [ WARN] [1720873977.401011718]: Controller handle scaled_pos_joint_traj_controller reports status FAILED [ INFO] [1720873977.401071288]: Completed trajectory execution with status FAILED ... [ INFO] [1720873977.402529277]: Execution completed: FAILED

i have installed 5.15.76-rt53, but it didn't work as expected. It still reported an error, but this time it was a little different: [ INFO] [1720873969.508932034]: Robot connected to reverse interface. Ready to receive control commands. [ INFO] [1720873975.622664768]: Received request to compute Cartesian path [ INFO] [1720873975.624387838]: Attempting to follow 31139 waypoints for link 'tool_ros' using a step of 0.010000 m and jump threshold 0.000000 (in global reference frame) [ INFO] [1720873976.552106491]: Computed Cartesian path with 31145 points (followed 100.000000% of requested trajectory) [ERROR] [1720873977.292264538]: Sending data through socket failed. [ INFO] [1720873977.292354495]: Connection to reverse interface dropped. [ INFO] [1720873977.372595596]: Execution request received [ERROR] [1720873977.400574305]: Can't accept new action goals. Controller is not running. [ WARN] [1720873977.400910620]: Controller 'scaled_pos_joint_traj_controller' failed with error INVALID_GOAL: [ WARN] [1720873977.401011718]: Controller handle scaled_pos_joint_traj_controller reports status FAILED [ INFO] [1720873977.401071288]: Completed trajectory execution with status FAILED ... [ INFO] [1720873977.402529277]: Execution completed: FAILED

zp2546265641 commented 4 months ago

20:40:36.536769 IP 192.168.0.10.60435 > zp-ros.50002: Flags [.], ack 14481, win 455, options [nop,nop,TS val 12400012 ecr 119005074], length 0 20:40:36.536773 IP 192.168.0.10.60435 > zp-ros.50002: Flags [.], ack 23169, win 591, options [nop,nop,TS val 12400012 ecr 119005074], length 0 20:40:36.536774 IP 192.168.0.10.60435 > zp-ros.50002: Flags [.], ack 25312, win 624, options [nop,nop,TS val 12400012 ecr 119005075], length 0 20:40:36.623828 IP 192.168.0.10.60435 > zp-ros.50002: Flags [F.], seq 17, ack 25312, win 624, options [nop,nop,TS val 12400099 ecr 119005075], length 0 20:40:36.623910 IP zp-ros.50002 > 192.168.0.10.60435: Flags [F.], seq 25312, ack 18, win 509, options [nop,nop,TS val 119005162 ecr 12400099], length 0 20:40:36.624020 IP 192.168.0.10.60435 > zp-ros.50002: Flags [.], ack 25313, win 624, options [nop,nop,TS val 12400099 ecr 119005162], length 0 When I listen to port 5002, I start urcap before 20:41, and it does not update the message after running the script