Factor-Robotics / odrive_ros2_control

ODrive driver for ros2_control
Apache License 2.0

Could not contact service /controller_manager/list_controllers #2

Closed Richard-Haes-Ellis closed 2 years ago

Richard-Haes-Ellis commented 2 years ago

I'm getting some weird errors while launching. The errors mention "Could not contact service /controller_manager/list_controllers".

➜  ros2_ws ros2 launch odrive_bringup odrive.launch.py              
[INFO] [launch]: All log files can be found below /home/richard/.ros/log/2021-09-15-15-08-53-286219-richard-GL65-9SEK-10069
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [ros2_control_node-1]: process started with pid [10073]
[INFO] [robot_state_publisher-2]: process started with pid [10075]
[INFO] [spawner.py-3]: process started with pid [10077]
[INFO] [spawner.py-4]: process started with pid [10079]
[robot_state_publisher-2] Parsing robot urdf xml string.
[robot_state_publisher-2] Link link0 had 0 children
[robot_state_publisher-2] [INFO] [1631711333.470852643] [robot_state_publisher]: got segment link0
[robot_state_publisher-2] [INFO] [1631711333.470911564] [robot_state_publisher]: got segment world
[spawner.py-3] Traceback (most recent call last):
[spawner.py-3]   File "/opt/ros/foxy/lib/controller_manager/spawner.py", line 186, in <module>
[spawner.py-3]     sys.exit(main())
[spawner.py-3]   File "/opt/ros/foxy/lib/controller_manager/spawner.py", line 109, in main
[spawner.py-3]     if is_controller_loaded(node, controller_manager_name, controller_name):
[spawner.py-3]   File "/opt/ros/foxy/lib/controller_manager/spawner.py", line 51, in is_controller_loaded
[spawner.py-3]     controllers = list_controllers(node, controller_manager).controller
[spawner.py-3]   File "/opt/ros/foxy/lib/python3.8/site-packages/controller_manager/controller_manager_services.py", line 49, in list_controllers
[spawner.py-3]     return service_caller(node, f'{controller_manager_name}/list_controllers',
[spawner.py-3]   File "/opt/ros/foxy/lib/python3.8/site-packages/controller_manager/controller_manager_services.py", line 29, in service_caller
[spawner.py-3]     raise RuntimeError(f'Could not contact service {service_name}')
[spawner.py-3] RuntimeError: Could not contact service /controller_manager/list_controllers
[spawner.py-4] Traceback (most recent call last):
[spawner.py-4]   File "/opt/ros/foxy/lib/controller_manager/spawner.py", line 186, in <module>
[spawner.py-4]     sys.exit(main())
[spawner.py-4]   File "/opt/ros/foxy/lib/controller_manager/spawner.py", line 109, in main
[spawner.py-4]     if is_controller_loaded(node, controller_manager_name, controller_name):
[spawner.py-4]   File "/opt/ros/foxy/lib/controller_manager/spawner.py", line 51, in is_controller_loaded
[spawner.py-4]     controllers = list_controllers(node, controller_manager).controller
[spawner.py-4]   File "/opt/ros/foxy/lib/python3.8/site-packages/controller_manager/controller_manager_services.py", line 49, in list_controllers
[spawner.py-4]     return service_caller(node, f'{controller_manager_name}/list_controllers',
[spawner.py-4]   File "/opt/ros/foxy/lib/python3.8/site-packages/controller_manager/controller_manager_services.py", line 29, in service_caller
[spawner.py-4]     raise RuntimeError(f'Could not contact service {service_name}')
[spawner.py-4] RuntimeError: Could not contact service /controller_manager/list_controllers
[ERROR] [spawner.py-4]: process has died [pid 10079, exit code 1, cmd '/opt/ros/foxy/lib/controller_manager/spawner.py joint0_velocity_controller -c /controller_manager --ros-args'].
[ERROR] [spawner.py-3]: process has died [pid 10077, exit code 1, cmd '/opt/ros/foxy/lib/controller_manager/spawner.py joint_state_broadcaster --controller-manager /controller_manager --ros-args'].

Looking into this error, I found a discussion about the stability of the controller_manager's spawner.py: ros-controls/ros2_control#475

They mention it could be caused by running on a slower PC, but I'm running the same code on a Raspberry Pi and on a laptop with powerful specs, where that shouldn't be a problem, and yet both platforms yield the same result.

I've also tried the fix they proposed, increasing the wait time, but it still doesn't work. Any ideas?

Richard-Haes-Ellis commented 2 years ago

I have found that the ODrive never finishes configuring and hangs on this line: https://github.com/Factor-Robotics/odrive_ros2_control/blob/6cd7acf84d4ba3680e79bd386781661ff8fdcc5d/odrive_hardware_interface/src/odrive_hardware_interface.cpp#L83. I'm not sure exactly where it goes wrong in the ODriveUSB::init() function; perhaps it has something to do with the libusb library installed on my system.

kallaspriit commented 2 years ago

What version of libusb are you using? For me this libusb-1.0-0-dev worked on Ubuntu 20.04.

sudo apt-get install libusb-1.0-0-dev

Makekihe commented 2 years ago

I'm experiencing the exact same issue that @Richard-Haes-Ellis describes in his original post. Any update on this? I used the following commands before running into the problem:

git clone https://github.com/Factor-Robotics/odrive_ros2_control.git
git clone -b foxy https://github.com/ros-controls/ros2_control.git
cd ..
colcon build
. install/setup.bash
ros2 launch odrive_bringup odrive.launch.py

I've also installed the recommended version of libusb, as proposed by @kallaspriit, though without any luck.

kallaspriit commented 2 years ago

Have you tried running any of the https://github.com/ros-controls/ros2_control_demos, and do they work fine? Is odrivetool working fine on the test device (I have to use sudo odrivetool for some reason)? You're on Ubuntu 20.04 as well, right?

Makekihe commented 2 years ago

I haven't tried any of the demos you're referring to. As I'm running this remotely (over ssh) on an RPi with no screen connected, at first glance I don't think it's possible to run the demos straight out of the box, but I might be wrong.

The odrivetool is working just fine, and yes I'm running Ubuntu 20.04 and ROS2 Foxy.

Richard-Haes-Ellis commented 2 years ago

The ros2_control_demos work fine for me. I found that the problem originates in the hardware_interface: the ODrive fails to initialize and hangs on this line: https://github.com/Factor-Robotics/odrive_ros2_control/blob/6cd7acf84d4ba3680e79bd386781661ff8fdcc5d/odrive_hardware_interface/src/odrive_usb.cpp#L84

This either blocks processing for the controller_manager or doesn't allow it to start up correctly.

This is my setup:

Ubuntu 20.04.3 LTS
ros2 foxy
libusb-1.0-0-dev
odrivetool version 0.5.3
ODrive firmware 0.5.3
hardware 3.6 56V variant

The USB library was already installed, I'm using the original cable, and I'm able to run odrivetool normally without sudo. I get the same results on a freshly installed Ubuntu Server image on a Raspberry Pi 4 (8 GB variant).

kallaspriit commented 2 years ago

I have all the same versions on Raspberry Pi 4B Ubuntu server 20.04 except for Odrive hardware (using Odrive 3.6 56V). Not sure if this changes anything.

Does the usb device have correct permissions?

ubuntu@rosbot:~$ lsusb | grep ODrive
Bus 001 Device 003: ID 1209:0d32 Generic ODrive Robotics ODrive v3
ubuntu@rosbot:~$ ls -lah /dev/bus/usb/001/003
crw-rw-rw- 1 root root 189, 2 Sep 20 12:20 /dev/bus/usb/001/003
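
The permission check above can also be scripted. The following is a minimal sketch and not part of odrive_ros2_control; the function names are hypothetical, and any path passed in has to be taken from your own lsusb output, since bus and device numbers differ between machines.

```python
import os
import stat


def others_can_read_write(mode):
    """Return True if the 'other' permission bits allow both read and write."""
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IWOTH)


def describe_node(path):
    """Return (ls-style mode string, others_can_read_write) for a device node."""
    mode = os.stat(path).st_mode
    return stat.filemode(mode), others_can_read_write(mode)
```

Calling describe_node("/dev/bus/usb/001/003") on the machine above should report the same mode string that ls prints for that node, plus whether a non-root user can open it for reading and writing.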

Richard-Haes-Ellis commented 2 years ago

I get the same result as you. I have hw version 3.6, not 3.5, so everything is the same.

➜  ~ lsusb | grep ODrive         
Bus 001 Device 007: ID 1209:0d32 Generic ODrive Robotics ODrive v3
➜  ~ ls -lah /dev/bus/usb/001/007
crw-rw-rw- 1 root root 189, 6 sep 20 15:31 /dev/bus/usb/001/007

Richard-Haes-Ellis commented 2 years ago

Digging a bit further I found that it never goes past this line: https://github.com/Factor-Robotics/odrive_ros2_control/blob/6cd7acf84d4ba3680e79bd386781661ff8fdcc5d/odrive_hardware_interface/src/odrive_usb.cpp#L166

I printed the arguments beforehand to see if it's passing anything strange, and this is what I got:

[ros2_control_node-1] ODrive handle 0x55c3decfd540
[ros2_control_node-1] ODrive enpoint 131
[ros2_control_node-1] ODrive response 
[ros2_control_node-1] ODrive max packet size 16
[ros2_control_node-1] ODrive transfered 8

Not sure if that's of any help. Could it be the driver?
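
One detail can be read off the printed values: a USB endpoint address encodes the transfer direction in bit 7 and the endpoint number in the low four bits. A minimal sketch of that decoding follows. The function name is hypothetical, and it interprets only the address, not the ODrive protocol.

```python
USB_DIR_IN = 0x80                # bit 7 set: IN endpoint (device to host)
USB_ENDPOINT_NUMBER_MASK = 0x0F  # low four bits: endpoint number


def decode_endpoint_address(address):
    """Split a USB endpoint address into (direction, endpoint number)."""
    direction = "IN" if address & USB_DIR_IN else "OUT"
    number = address & USB_ENDPOINT_NUMBER_MASK
    return direction, number
```

Applied to the value printed above, decode_endpoint_address(131) gives ("IN", 3), since 131 is 0x83. That would be consistent with the call blocking while it waits to read a response from the device, though it does not by itself explain why no response arrives.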

borongyuan commented 2 years ago

Hi, sorry for the late reply, I was on holiday last week. As mentioned by @kallaspriit, libusb-1.0-0-dev is needed. It seems you are using firmware 0.5.3, but we currently only support firmware 0.5.1. The protocol header file odrive_endpoints.hpp is generated using "odrivetool generate-code" for 0.5.1. This feature is broken in 0.5.2 and was removed in 0.5.3 (https://github.com/odriverobotics/ODrive/issues/593), so we will update the code generation method to support later versions.
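
Since a firmware mismatch shows up only as a hang, it can help to compare the firmware version against the supported ones before launching. Below is a minimal sketch of such a check. The function names and the idea of passing the supported versions as a list are assumptions for illustration; this is not an API of this package, and the firmware string would have to be read separately, for example from odrivetool.

```python
import re


def parse_version(text):
    """Parse a 'major.minor.revision' string into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a major.minor.revision version: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_supported(firmware, supported):
    """Return True only if firmware exactly matches one supported version."""
    return parse_version(firmware) in {parse_version(v) for v in supported}
```

With only 0.5.1 supported, is_supported("0.5.3", ["0.5.1"]) is False. The comparison is exact on purpose, because the endpoint header is generated for one specific firmware version.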

Richard-Haes-Ellis commented 2 years ago

Okay, it works. I downgraded to firmware 0.5.1 and it worked on the first try. I wasn't aware there was such a big difference in the communication protocol between minor versions. Thanks!!

borongyuan commented 2 years ago

I have added support for firmware 0.5.3

matdmiller commented 2 years ago

Do you know if there are additional breaking changes in 0.5.4 and, if so, are you able to update the library for that version? I am running into the same issues and have tried everything in this thread with no luck yet. The only thing I know is different is that I'm running 0.5.4 firmware on my ODrive. I tried to downgrade it to 0.5.3 last night, but my ODrive was purchased from MKS on AliExpress, odrivetool dfu blocks firmware changes on non-official boards, and I was not able to find a way around it.

Current Setup:
RPi 4 - 4GB
Ubuntu 20.04
ROS2 Foxy Debian Install
Odrive HW Version 3.6-56V - Firmware 0.5.4
odrivetool version 0.5.4-post0
libusb-1.0-0-dev

I am able to utilize odrivetool and the odrive python package with no issues. I have done a full calibration using python scripts I put together. I have downloaded and tested the ros_controller demos and they all work just fine. I have added a number of print statements in the python launch script and it appears to be fully getting through to the end as verified by seeing those print statements when I call the launch script. I have also added a number of print statements to the C++ libraries, but none of them are printed when I run the launch script. I have also verified the usb permissions as shown above.

When I use odrivetool, the device name is dev0 instead of odrv0 because it is not a recognized serial number. Odrivetool does recognize the correct board version though.

Any help or suggestions would be greatly appreciated.

borongyuan commented 2 years ago

Currently every revision of ODrive firmware has protocol changes. Although the change from 0.5.3 to 0.5.4 is tiny, it is still incompatible. https://github.com/odriverobotics/ODrive/releases/tag/fw-v0.5.4

We may find a better implementation to support different firmware versions later. Currently you can downgrade to 0.5.2 or 0.5.3 using ST-LINK. DFU may not work. https://docs.odriverobotics.com/v/latest/developer-guide.html#flashing-with-an-stlink

It's fine if your ODrive is recognized as dev0 instead of odrv0. There are indeed some modified versions of the ODrive in China. Some of them have only minor hardware upgrades (like capacitors, MOSFETs or gate drivers), while others have big changes (such as single-drive versions). Most of them are compatible as long as there are no protocol modifications.

maxpolzin commented 2 years ago

I get the same error as you, @matdmiller. After downgrading to firmware 0.5.3, it seems to work flawlessly.

FrankBu0616 commented 1 year ago

Thank you for the fantastic package. Just out of curiosity, what was the reason that made the package incompatible with 0.5.4? If I understand the previous discussion correctly, the issue was the protocol header file. Is it because some endpoints are changed?

borongyuan commented 1 year ago

Yes, there are only minor differences as far as I remember, but that would also break compatibility. I want to skip 0.5.4, and will add support for 0.5.5 and CAN communication later, after I finish the project at hand.