Kinovarobotics / ros2_kortex

ROS2 driver for the Gen3 Kinova robot arm

timeout detected: BaseCyclicClient::RefreshFeedback #166

Closed skpawar1305 closed 1 year ago

skpawar1305 commented 1 year ago

We use a Gen3 with Spot for the RoboCup Rescue League. Until now we used https://github.com/RRL-ALeRT/kinova_stuffs/tree/master/kortex_controller_py/kortex_controller_py (temporarily), and everything worked well. To integrate with MoveIt, we're trying to use your package. I managed to get MoveIt working, but about 60% of the time I get the error below while launching the control node. (I combined the URDF with Spot's; I don't think that should cause any issue, but let me know if I'm wrong.) About 90% of the time I get the error if I'm already visualising /robot_description in RViz on another computer. Surprisingly, starting the visualisation in RViz while the control node is running also crashes the driver. Can you please take a look?

[ros2_control_node-1] terminate called after throwing an instance of 'std::runtime_error'
[ros2_control_node-1] what(): timeout detected: BaseCyclicClient::RefreshFeedback
[ros2_control_node-1]
[ros2_control_node-1] Stack trace (most recent call last) in thread 1193445:
[ros2_control_node-1] #15 Object "", at 0xffffffffffffffff, in
[ros2_control_node-1] #14 Source "../sysdeps/unix/sysv/linux/x86_64/clone3.S", line 81, in __clone3 [0x7f90b51269ff]
[ros2_control_node-1] #13 Source "./nptl/pthread_create.c", line 442, in start_thread [0x7f90b5094b42]
[ros2_control_node-1] #12 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f90b54dc2b2, in
[ros2_control_node-1] #11 Source "./src/ros2_control_node.cpp", line 82, in operator() [0x5557fa874856]
[ros2_control_node-1] #10 Object "/opt/ros/humble/lib/libhardware_interface.so", at 0x7f90b5714f39, in hardware_interface::ResourceManager::write(rclcpp::Time const&, rclcpp::Duration const&)
[ros2_control_node-1] #9 Object "/opt/ros/humble/lib/libhardware_interface.so", at 0x7f90b573bb74, in hardware_interface::System::write(rclcpp::Time const&, rclcpp::Duration const&)
[ros2_control_node-1] #8 Object "/home/max1/spot_ws/build/kortex_driver/libkortex_driver.so", at 0x7f90ad3ff753, in kortex_driver::KortexMultiInterfaceHardware::write(rclcpp::Time const&, rclcpp::Duration const&)
[ros2_control_node-1] #7 Object "/home/max1/spot_ws/build/kortex_driver/libkortex_driver.so", at 0x7f90ad571443, in Kinova::Api::BaseCyclic::BaseCyclicClient::RefreshFeedback(unsigned int, Kinova::Api::RouterClientSendOptions const&)
[ros2_control_node-1] #6 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f90b54ae517, in __cxa_throw
[ros2_control_node-1] #5 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f90b54ae2b6, in std::terminate()
[ros2_control_node-1] #4 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f90b54ae24b, in
[ros2_control_node-1] #3 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f90b54a2bbd, in
[ros2_control_node-1] #2 Source "./stdlib/abort.c", line 79, in abort [0x7f90b50287f2]
[ros2_control_node-1] #1 Source "../sysdeps/posix/raise.c", line 26, in raise [0x7f90b5042475]
[ros2_control_node-1] #0 | Source "./nptl/pthread_kill.c", line 89, in pthread_kill_internal
[ros2_control_node-1]    | Source "./nptl/pthread_kill.c", line 78, in __pthread_kill_implementation
[ros2_control_node-1]      Source "./nptl/pthread_kill.c", line 44, in __pthread_kill [0x7f90b5096a7c]
[ros2_control_node-1] Aborted (Signal sent by tkill() 1193258 1000)

moriarty commented 1 year ago

Looks like you're using ROS 2 Humble?

Are you building this package from Source?

Are you building MoveIt 2 or ROS 2 Control from Source?

I know the version of this package in ROS "main" has a bug, but it is fixed and the fix is available in "ros-testing".

skpawar1305 commented 1 year ago

Yes, ROS 2 Humble, with everything installed from apt (stable). Forked repo for the 6-DoF arm: https://github.com/RRL-ALeRT/ros2_kortex. I'll build ros2_control from source (following your instructions) and report back. Thank you for your attention.

Edit: the frequency of the crash seems to be reduced, but it's not yet stable. Is the ros-testing branch of this repo private? Reducing this parameter from 1000 also improved it somewhat:

controller_manager:
  ros__parameters:
    update_rate: 200
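For readers following along, the arithmetic behind the update_rate change can be sketched as below. This is only an illustration: the helper names and the 2 ms round-trip figure are hypothetical, and the sketch does not model the Kortex API's own timeout, which the thread does not specify.

```python
def cycle_period_ms(update_rate_hz: float) -> float:
    """Time available for one read/update/write cycle, in milliseconds."""
    if update_rate_hz <= 0:
        raise ValueError("update_rate must be positive")
    return 1000.0 / update_rate_hz


def fits_in_cycle(round_trip_ms: float, update_rate_hz: float) -> bool:
    """True if one round trip to the arm fits inside a single control cycle."""
    return round_trip_ms <= cycle_period_ms(update_rate_hz)


if __name__ == "__main__":
    # Hypothetical round-trip time; measure your own network instead.
    assumed_round_trip_ms = 2.0
    for rate in (1000, 200):
        period = cycle_period_ms(rate)
        ok = fits_in_cycle(assumed_round_trip_ms, rate)
        print(f"update_rate {rate} Hz: {period:.1f} ms per cycle, fits: {ok}")
```

At 1000 Hz each cycle has 1 ms available, while 200 Hz leaves 5 ms, so a slow or jittery link has more headroom at the lower rate.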

moriarty commented 1 year ago

No, ros-testing is an apt repo for testing pre-sync'd binaries. It's public, but it's use-at-your-own-risk.

https://docs.ros.org/en/rolling/Installation/Testing.html
https://docs.ros.org/en/iron/Installation/Testing.html
https://docs.ros.org/en/humble/Installation/Testing.html

It's for package maintainers and early adopters to do testing.


0.2.0 is available from ros; 0.2.1 is available from ros-testing.

0.2.0 was the first public release of this package, and it had a bug which is fixed in 0.2.1

(but if you build from source you won't have the bug, which is why I didn't notice it)

When they announce package syncs here: https://discourse.ros.org/c/release/16 that is when the packages from "ros-testing" are synced and they become available on "ros"

moriarty commented 1 year ago

I'm not sure you really need to build ros2_control or MoveIt 2 from source. I was just curious, because you're using Humble, which is pretty far behind the latest developments going into Rolling.

I did as much testing as possible with different mixes of software (Humble, Iron, Rolling, from apt and from source, etc.), but the MoveIt 2 maintainers recommend running from source, which is why I suggested it: https://github.com/orgs/ros-planning/discussions/2190

We did see crashes but were unable to reproduce them. You're able to reproduce this crash 60% of the time, which is very interesting.

How is your system load? When running only the kortex_driver, what rate does ros2 topic hz /joint_states report?
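As a rough illustration of what such a rate check computes, the sketch below summarises message arrival times into an average rate and period statistics. The function name and the sample timestamps are made up for the example; it is not the implementation of the ros2 topic hz tool.

```python
from statistics import mean, pstdev


def rate_summary(stamps):
    """Summarise message arrival times (seconds, in ascending order)."""
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    # Period between each pair of consecutive messages.
    periods = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    average_period = mean(periods)
    if average_period <= 0:
        raise ValueError("timestamps must be increasing")
    return {
        "average_rate_hz": 1.0 / average_period,
        "min_period_s": min(periods),
        "max_period_s": max(periods),
        "std_dev_s": pstdev(periods),
    }


if __name__ == "__main__":
    # Made-up arrival times with one late message.
    summary = rate_summary([0.0, 0.25, 0.5, 1.5])
    for key, value in summary.items():
        print(f"{key}: {value:.3f}")
```

A large max period next to a healthy average is the pattern to look for when a link occasionally stalls.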

skpawar1305 commented 1 year ago

> I'm not sure you really need to build ros2_control or MoveIt 2 from source. I was just curious, because you're using Humble, which is pretty far behind the latest developments going into Rolling.
>
> I did as much testing as possible with different mixes of software (Humble, Iron, Rolling, from apt and from source, etc.), but the MoveIt 2 maintainers recommend running from source, which is why I suggested it: https://github.com/orgs/ros-planning/discussions/2190
>
> We did see crashes but were unable to reproduce them. You're able to reproduce this crash 60% of the time, which is very interesting.

spot_driver also publishes to /joint_states, at around 30 Hz. Can this be an issue? I'll try to merge the joint states and publish them at a lower rate, since the topic is subscribed to from a remote computer. Considering only the manipulator: with update_rate: 1000, the average rate on /joint_states is 1000+ messages per second.
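The merging idea mentioned above could look roughly like the following sketch. It is plain Python with no ROS dependency; the class name, joint names, and the 10 Hz output rate are all hypothetical, and in a real node the update calls would come from subscriptions and the output would go to a separate topic.

```python
class JointStateMerger:
    """Keep the latest position per joint from several sources and rate-limit output."""

    def __init__(self, publish_rate_hz: float):
        if publish_rate_hz <= 0:
            raise ValueError("publish_rate_hz must be positive")
        self._min_interval_s = 1.0 / publish_rate_hz
        self._positions = {}
        self._last_publish_s = None

    def update(self, names, positions):
        """Record the newest positions reported by any source."""
        self._positions.update(zip(names, positions))

    def maybe_publish(self, now_s: float):
        """Return a merged snapshot if enough time has passed, else None."""
        if (self._last_publish_s is not None
                and now_s - self._last_publish_s < self._min_interval_s):
            return None
        self._last_publish_s = now_s
        return dict(self._positions)


if __name__ == "__main__":
    merger = JointStateMerger(publish_rate_hz=10.0)
    merger.update(["arm_joint_1"], [0.1])    # hypothetical arm joint
    merger.update(["spot_leg_1"], [0.2])     # hypothetical Spot joint
    print(merger.maybe_publish(now_s=0.0))   # merged snapshot
    print(merger.maybe_publish(now_s=0.05))  # None, too soon
```

Keeping only the latest value per joint means the remote subscriber sees one combined message stream instead of two sources at different rates.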

With ros-testing, there is no crash on launch, but it crashes almost every time I try to visualise the robot_description. However, it doesn't crash if RViz is already running (the reverse of what happens when I use the combined URDF).

[ros2_control_node-1] terminate called after throwing an instance of 'std::runtime_error'
[ros2_control_node-1] what(): timeout detected: BaseCyclicClient::RefreshFeedback
[ros2_control_node-1]
[ros2_control_node-1] Stack trace (most recent call last) in thread 74988:
[ros2_control_node-1] #15 Object "", at 0xffffffffffffffff, in
[ros2_control_node-1] #14 Source "../sysdeps/unix/sysv/linux/x86_64/clone3.S", line 81, in __clone3 [0x7f215a5269ff]
[ros2_control_node-1] #13 Source "./nptl/pthread_create.c", line 442, in start_thread [0x7f215a494b42]
[ros2_control_node-1] #12 Object "/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30", at 0x7f215a8dc252, in
[ros2_control_node-1] #11 Source "./src/ros2_control_node.cpp", line 82, in operator() [0x561deb323856]
[ros2_control_node-1] #10 Object "/opt/ros/humble/lib/libhardware_interface.so", at 0x7f215aaf8f39, in hardware_interface::ResourceManager::write(rclcpp::Time const&, rclcpp::Duration const&)
[ros2_control_node-1] #9 Object "/opt/ros/humble/lib/libhardware_interface.so", at 0x7f215ab1fb74, in hardware_interface::System::write(rclcpp::Time const&, rclcpp::Duration const&)
[ros2_control_node-1] #8 Source "/home/max1/spot_ws/src/ros2_kortex/kortex_driver/src/hardware_interface.cpp", line 909, in write [0x7f214e3ff17e]
[ros2_control_node-1]     906: {
[ros2_control_node-1]     907: // this is needed when the robot was faulted
[ros2_control_node-1]     908: // so we can internally conclude it is not faulted anymore
[ros2_control_node-1]   > 909: feedback = basecyclic.RefreshFeedback();
[ros2_control_node-1]     910: }
[ros2_control_node-1]     911:
[ros2_control_node-1]     912: return return_type::OK;

> How is your system load? When running only the kortex_driver, what rate does ros2 topic hz /joint_states report?

A Velodyne, OctoMap, 3 RealSense cameras, the Spot driver, and Kinova vision are running simultaneously, but plenty of CPU is still available.

skpawar1305 commented 1 year ago

I was using the router's WAN port, which had been reconfigured to work as a LAN port. Moving the connection to another LAN port seems to have fixed the issue, so the problem was on my side. I'll ask the network guy on our team whether that port is slower, if you're interested. I'll close this issue for now. Thank you.
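One way to compare router ports before blaming the driver is to time a few TCP connections to the arm from the machine that runs the driver. The sketch below is a generic check using only the Python standard library, not a Kinova tool; the address 192.168.1.10 and port 10000 are assumptions and should be replaced with the values for your own setup.

```python
import socket
import time


def connect_times_ms(host, port, samples=5, timeout_s=1.0):
    """Time several TCP connection attempts and return the durations in ms.

    Raises OSError if a connection attempt fails or times out.
    """
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout_s):
            pass  # connection established; close it straight away
        results.append((time.perf_counter() - start) * 1000.0)
    return results


if __name__ == "__main__":
    # Assumed address and port of the arm; adjust for your network.
    times = connect_times_ms("192.168.1.10", 10000)
    print(f"min {min(times):.2f} ms, max {max(times):.2f} ms")
```

Running this once per router port and comparing the maximum values gives a quick hint of whether one port adds latency or drops connections.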

moriarty commented 1 year ago

Interesting. Thanks for debugging further.

There are a few other open issues which I haven't been able to reproduce, but I think they might be similar: not keeping up with the desired communication rate.

FWIW: I did most of my testing with the driver running on a dedicated machine running Debian (an Intel NUC Extreme, not sure which year) with the real-time rt-preempt patch, while the rest of the ROS 2 components ran on a laptop. I also tested with the arm connected directly to a laptop without the rt-preempt patch (USB-C dongle, 2023 System76 Serval WS). So neither system running the driver was under much load.

Thanks again for looking into it further. I will try to find some time to do further testing under load.