UniversalRobots / Universal_Robots_ROS_Driver

Universal Robots ROS driver supporting CB3 and e-Series
Apache License 2.0

Dashboard Service disconnects randomly #614

Open shuobh opened 1 year ago

shuobh commented 1 year ago

Summary

The robot runs normally, then the dashboard service suddenly gets disconnected. The UR driver topics keep working.

Versions

Impact

Stability.

Issue details

The robot was running properly and suddenly reports the error below and dashboard services are down.

Attempt to write on a non-connected socket
Jan 23 12:28:21 nuc-25-robot-20 bluehill-start[3892303]: [ERROR] [1674505701.872766034]: Exception thrown while processing service call: Did not receive answer from dashboard server in time. Disconnecting from dashboard server.(Configured timeout: 1 sec)
Jan 23 12:28:21 nuc-25-robot-20 bluehill-start[3892485]: [ERROR] [1674505701.872849612]: Service call failed: service [/ur_hardware_interface/dashboard/play] responded with an error: Did not receive answer from dashboard server in time. Disconnecting from dashboard server.(Configured timeout: 1 sec)
Jan 23 12:28:21 nuc-25-robot-20 bluehill-start[3892303]: [ERROR] [1674505701.873099790]: Exception thrown while processing service call: Failed to send request to dashboard server. Are you connected to the Dashboard Server?
Jan 23 12:28:21 nuc-25-robot-20 bluehill-start[3892493]: [ERROR] [1674505701.873174276]: Service call failed: service [/ur_hardware_interface/dashboard/get_robot_mode] responded with an error: Failed to send request to dashboard server. Are you connected to the Dashboard Server?

At the same time, we could still get topics published by the UR driver, such as /joint_states and /ur_hardware_interface/robot_mode. The robot status looks normal on the teach pendant. Restarting the driver makes the issue go away.
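To see when the failures happen and which services they hit, the "Service call failed" lines from the log excerpt above can be pulled out with a small script. This is only a sketch: the regular expression is derived from the lines shown in this report, and the function name `dashboard_failures` is made up for illustration.

```python
import re

# Pattern derived from the "Service call failed" lines in the log excerpt above.
# Other driver versions may word these messages differently (assumption).
PATTERN = re.compile(
    r"\[ERROR\] \[(?P<stamp>\d+\.\d+)\]: Service call failed: "
    r"service \[(?P<service>[^\]]+)\] responded with an error: (?P<reason>.*)"
)


def dashboard_failures(lines):
    """Yield (timestamp, service, reason) for each failed dashboard service call."""
    for line in lines:
        match = PATTERN.search(line)
        # Keep only services under a ".../dashboard/" namespace.
        if match and "/dashboard/" in match.group("service"):
            yield (
                float(match.group("stamp")),
                match.group("service"),
                match.group("reason").strip(),
            )
```

Feeding it a saved log file (for example `dashboard_failures(open("driver.log"))`, where the file name is a placeholder) gives a list of timestamps that can be compared against what the robot was doing at the time.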

Project status at point of discovery

This issue has been occurring randomly, roughly once a day. We discovered it right after merging with current master; it did not occur when we were running on commit c09807d79b5eb2e8b4d671995ec885ceeb48fc5d.

fmauch commented 1 year ago

Hi, thanks for reporting this. This seems like it could be hard to reproduce, but since you have a working commit version we can at least skim the source code changes since then. However, changes causing this (if it is a regression) could also come from the ur_client_library.

shuobh commented 1 year ago

After running it longer, we noticed that we still run into this issue even on commit https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/commit/c09807d79b5eb2e8b4d671995ec885ceeb48fc5d, although much less frequently. Could it be that we are using a low-latency kernel instead of a real-time one, and that the newly added port 50004 brought more traffic and made this more frequent? Are there any logs I could collect the next time this happens that would help with debugging?

fmauch commented 1 year ago

If you're using a lowlatency kernel, you might be interested in #615.

shuobh commented 1 year ago

I've followed the update guide and added the missing part, but the issue remains. I will try to set up an RT machine and test it there.

shuobh commented 1 year ago

We've also experienced this issue with the same setup.

Robot control is currently inactive. Starting controllers that claim resources is currently not possible. Not starting controller 'scaled_pos_joint_traj_controller'
Could not switch controllers. The hardware interface combination for the requested controllers is unfeasible.

shuobh commented 1 year ago

Interestingly, the error can be resolved by calling the dashboard "connect" service. So what actually happens is just that the driver disconnects from the dashboard server for some reason, nothing more; I was expecting it to be some other issue. This usually happens when we switch between freedrive (URScript) and remote control. For now, we will add a workaround that automatically calls the connect service when it disconnects.

Hytac commented 1 year ago

I've faced this issue too. The node disconnects itself after any failed service call, and in my case loading/playing programs fails really often. So what I do is catch that exception/failure, call the dashboard connect service, and retry.
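The reconnect-and-retry workaround described in the last two comments could look roughly like the sketch below. It is not code from the driver: `DashboardCallError` and `call_with_reconnect` are hypothetical names, and the two callables are assumed to be thin wrappers that the caller writes around the actual dashboard service calls.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DashboardCallError(Exception):
    """Hypothetical error raised by the caller's wrapper when a dashboard call fails."""


def call_with_reconnect(
    call: Callable[[], T],
    reconnect: Callable[[], None],
    retries: int = 3,
    delay_s: float = 0.5,
) -> T:
    """Run `call`; on failure, run `reconnect`, wait `delay_s`, and try again.

    `call` is tried at most `retries + 1` times. The last failure is re-raised.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return call()
        except DashboardCallError:
            if attempt == retries:
                raise  # out of retries, surface the last failure
            reconnect()  # e.g. call the dashboard "connect" service
            time.sleep(delay_s)
    raise AssertionError("unreachable")
```

In a ROS 1 node, `call` would presumably wrap a service proxy for something like /ur_hardware_interface/dashboard/play and raise `DashboardCallError` when the call throws or reports failure, while `reconnect` would wrap the dashboard connect service. Keeping those wrappers outside the helper means the retry logic can be tested without a robot.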