frankaemika / franka_ros

ROS integration for Franka research robots
https://frankaemika.github.io
Apache License 2.0
364 stars 314 forks source link

Launching franka_example_controllers examples (e.g. move_to_start.launch) #23

Closed aPonza closed 6 years ago

aPonza commented 6 years ago

Trying to move my Panda with franka_ros (installed version 0.5.0-0xenial-20180724-052826-0800 from aptitude) I wanted to start with some examples and go from there. After tutorializing myself with actionlib and moveit, I decided to try your examples. At first I got a missing dependency error from PyAssimp, which I upgraded with pip. Afterwards, I have been stuck with this console output:

ap@ap-HP-630-PC:~$ roslaunch franka_example_controllers move_to_start.launch robot_ip:=172.16.1.101 load_gripper:=true
... logging to /home/ap/.ros/log/f343ac20-958b-11e8-ad71-009c021669e3/roslaunch-ap-HP-630-PC-18801.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://ap-HP-630-PC:45795/

SUMMARY
========

PARAMETERS
 * /effort_joint_trajectory_controller/constraints/goal_time: 0.5

[...]

 * /robot_description_semantic: <?xml version="1....
 * /rosdistro: kinetic
 * /rosversion: 1.12.13

NODES
  /
    controller_spawner (controller_manager/spawner)
    franka_control (franka_control/franka_control_node)
    joint_state_publisher (joint_state_publisher/joint_state_publisher)
    move_group (moveit_ros_move_group/move_group)
    move_to_start (franka_example_controllers/move_to_start.py)
    robot_state_publisher (robot_state_publisher/robot_state_publisher)
    state_controller_spawner (controller_manager/spawner)

ROS_MASTER_URI=http://localhost:11311

process[franka_control-1]: started with pid [18824]
process[state_controller_spawner-2]: started with pid [18825]
process[robot_state_publisher-3]: started with pid [18826]
process[joint_state_publisher-4]: started with pid [18827]
process[controller_spawner-5]: started with pid [18830]
process[move_group-6]: started with pid [18831]
process[move_to_start-7]: started with pid [18832]
INFO: cannot create a symlink to latest log directory: [Errno 2] No such file or directory: '/home/ap/.ros/log/latest'
[INFO] [1533133292.965529]: Controller Spawner: Waiting for service controller_manager/load_controller
[INFO] [1533133293.008041]: Controller Spawner: Waiting for service controller_manager/load_controller
[ INFO] [1533133310.864421603]: Loading robot model 'panda'...
[ INFO] [1533133310.864582181]: No root/virtual joint specified in SRDF. Assuming fixed joint
[ INFO] [1533133311.434593432]: Loading robot model 'panda'...
[ INFO] [1533133311.434670318]: No root/virtual joint specified in SRDF. Assuming fixed joint
[ INFO] [1533133311.813252843]: Publishing maintained planning scene on 'monitored_planning_scene'
[ INFO] [1533133311.833453881]: MoveGroup debug mode is OFF
Starting context monitors...
[ INFO] [1533133311.964146375]: Starting scene monitor
[ INFO] [1533133311.969234875]: Listening to '/planning_scene'
[ INFO] [1533133311.969279904]: Starting world geometry monitor
[ INFO] [1533133311.974056849]: Listening to '/collision_object' using message notifier with target frame '/panda_link0 '
[ INFO] [1533133311.978706678]: Listening to '/planning_scene_world' for planning scene world geometry
[ INFO] [1533133311.992981585]: Listening to '/attached_collision_object' for attached collision objects
Context monitors started.
[ INFO] [1533133314.337135187]: Initializing OMPL interface using ROS parameters
[ INFO] [1533133314.375821047]: Using planning interface 'OMPL'
[ INFO] [1533133314.609941620]: Param 'default_workspace_bounds' was not set. Using default value: 10
[ INFO] [1533133314.610955241]: Param 'start_state_max_bounds_error' was set to 0.1
[ INFO] [1533133314.611712517]: Param 'start_state_max_dt' was not set. Using default value: 0.5
[ INFO] [1533133314.612581480]: Param 'start_state_max_dt' was not set. Using default value: 0.5
[ INFO] [1533133314.613295782]: Param 'jiggle_fraction' was set to 0.05
[ INFO] [1533133314.614061370]: Param 'max_sampling_attempts' was not set. Using default value: 100
[ INFO] [1533133314.614154158]: Using planning request adapter 'Add Time Parameterization'
[ INFO] [1533133314.614195411]: Using planning request adapter 'Fix Workspace Bounds'
[ INFO] [1533133314.614237036]: Using planning request adapter 'Fix Start State Bounds'
[ INFO] [1533133314.614278020]: Using planning request adapter 'Fix Start State In Collision'
[ INFO] [1533133314.614320265]: Using planning request adapter 'Fix Start State Path Constraints'
[ WARN] [1533133320.067844261]: Waiting for position_joint_trajectory_controller/follow_joint_trajectory to come up
[WARN] [1533133323.058336]: Controller Spawner couldn't find the expected controller_manager ROS interface.
[WARN] [1533133323.058337]: Controller Spawner couldn't find the expected controller_manager ROS interface.
[state_controller_spawner-2] process has finished cleanly
log file: /home/ap/.ros/log/f343ac20-958b-11e8-ad71-009c021669e3/state_controller_spawner-2*.log
[controller_spawner-5] process has finished cleanly
log file: /home/ap/.ros/log/f343ac20-958b-11e8-ad71-009c021669e3/controller_spawner-5*.log
[ WARN] [1533133326.068081795]: Waiting for position_joint_trajectory_controller/follow_joint_trajectory to come up
[ERROR] [1533133332.068512233]: Action client not connected: position_joint_trajectory_controller/follow_joint_trajectory
[ INFO] [1533133332.077124121]: Returned 0 controllers in list
[ INFO] [1533133332.097130356]: Trajectory execution is managing controllers
Loading 'move_group/ApplyPlanningSceneService'...
Loading 'move_group/ClearOctomapService'...
Loading 'move_group/MoveGroupCartesianPathService'...
Loading 'move_group/MoveGroupExecuteTrajectoryAction'...
Loading 'move_group/MoveGroupGetPlanningSceneService'...
Loading 'move_group/MoveGroupKinematicsService'...
Loading 'move_group/MoveGroupMoveAction'...
Loading 'move_group/MoveGroupPickPlaceAction'...
[ INFO] [1533133333.911268075]: Loading robot model 'panda'...
[ INFO] [1533133333.911336463]: No root/virtual joint specified in SRDF. Assuming fixed joint
Loading 'move_group/MoveGroupPlanService'...
Loading 'move_group/MoveGroupQueryPlannersService'...
Loading 'move_group/MoveGroupStateValidationService'...
[ INFO] [1533133334.262345647]: 

********************************************************
* MoveGroup using: 
*     - ApplyPlanningSceneService
*     - ClearOctomapService
*     - CartesianPathService
*     - ExecuteTrajectoryAction
*     - GetPlanningSceneService
*     - KinematicsService
*     - MoveAction
*     - PickPlaceAction
*     - MotionPlanService
*     - QueryPlannersService
*     - StateValidationService
********************************************************

[ INFO] [1533133334.262459784]: MoveGroup context using planning plugin ompl_interface/OMPLPlanner
[ INFO] [1533133334.262507076]: MoveGroup context initialization complete

You can start planning now!

[ INFO] [1533133334.464601255]: Loading robot model 'panda'...
[ INFO] [1533133334.464661115]: No root/virtual joint specified in SRDF. Assuming fixed joint
[ INFO] [1533133335.469617650]: Ready to take commands for planning group panda_arm.
[ INFO] [1533133335.482903448]: Combined planning and execution request received for MoveGroup action. Forwarding to planning and execution pipeline.
[ INFO] [1533133335.549885407]: Planning attempt 1 of at most 1
[ INFO] [1533133335.587650571]: Planner configuration 'panda_arm' will use planner 'geometric::RRTConnect'. Additional configuration parameters will be set when the planner is constructed.
[ INFO] [1533133335.720734117]: RRTConnect: Starting planning with 1 states already in datastructure
[ INFO] [1533133335.759458588]: RRTConnect: Created 4 states (2 start + 2 goal)
[ INFO] [1533133335.759551813]: Solution found in 0.039023 seconds
[ INFO] [1533133335.766524302]: SimpleSetup: Path simplification took 0.006710 seconds and changed from 3 to 2 states
[ INFO] [1533133335.788275683]: Returned 0 controllers in list
[ERROR] [1533133335.788402835]: Unable to identify any set of controllers that can actuate the specified joints: [ panda_joint1 panda_joint2 panda_joint3 panda_joint4 panda_joint5 panda_joint6 panda_joint7 ]
[ERROR] [1533133335.788452422]: Known controllers and their joints:

[ERROR] [1533133335.788538943]: Apparently trajectory initialization failed
[ INFO] [1533133335.789098467]: ABORTED: Solution found but controller failed during execution
================================================================================REQUIRED process [move_to_start-7] has died!
process has finished cleanly
log file: /home/ap/.ros/log/f343ac20-958b-11e8-ad71-009c021669e3/move_to_start-7*.log
Initiating shutdown!
================================================================================
[move_to_start-7] killing on exit
[move_group-6] killing on exit
[joint_state_publisher-4] killing on exit
[robot_state_publisher-3] killing on exit
[franka_control-1] killing on exit
[franka_control-1] escalating to SIGTERM
shutting down processing monitor...
... shutting down processing monitor complete
done

The almost exact same thing happens if I try to roslaunch franka_control franka_control.launch robot_ip:=172.16.1.101 load_gripper:=true, where the warning from Controller Spawner seems to kill the node.

I tried to rospack find controller_manager and I didn't find it, so I sudo apt-get install ros-kinetic-ros-control, but it didn't solve it. I checked both logs and they just repeat the same errors and warnings, I checked the launch files to try and see which things are not correctly loading and found where the exception from the warning is raised but the connected try block seems like it shouldn't give issues... Moreover, Panda has the external activation activated and the joints unlocked (worked for them). Environment variables seem on point, setup files are sourced.

Is it a namespace problem, where the controller is loaded but not where it should be? Or is it something else I don't yet understand? Should I build from source like suggested in this comment?

chriseichmann commented 6 years ago

Hi,

please try sudo apt-get install ros-kinetic-controller-manager

aPonza commented 6 years ago

"it's already the newest version (0.13.3-0xenial-20180426-054905-0800)"

chriseichmann commented 6 years ago

Earlier this week I had a similar problem. From the dpkg log I can read that I installed ros-kinetic-moveit-commander that day. Maybe this can help you. Sadly I don't have access to the robot right now to test it myself.

aPonza commented 6 years ago

Eh, unfortunately also this package is already up to date ("ros-kinetic-moveit-commander is already the newest version (0.9.12-1xenial-20180724-044717-0800)").

From looking into the code, I see controller_spawner is trying to load the controller with:

robot_namespace = ""   # weird?
load_controller_service = controller_manager/load_controller  # since robot namespace is empty...

and timing out on the line rospy.wait_for_service(...). So reading the launch file I see the problem lies in loading position_joint_trajectory_controller, which in fact I can't find yet. Trying a rosrun controller_manager load controller_manager position_joint_trajectory_controller gives the exact same result, so am I missing said controller?

aPonza commented 6 years ago

Google doesn't give me that specific controller but it does show "joint_trajectory_controller" which means I tried sudo apt-get install ros-kinetic-ros-controllers and still failed.

Googling kind of helped when I tried with only "position_joint_trajectory_controller" and it resulted in giving me this thread. The thread discusses a similar problem but with a simulated environment. I feel like there's a link between my issue and theirs, but still, the launch file of move_to_start finds on its own the panda_moveit_config files, which means it should be finding (if I go find /opt/ros/ -name panda_moveit_config -print the result is only that one) the one from the moveit tutorials themselves, which is the same "updated" code @christgau is talking about.

EDIT: I never used the ones from the git repository, I realized, I used the ones from aptitude and the franka_ros package. Thus I tried to get the git panda_moveit_config and build it, the result is now if I execute roslaunch franka_example_controllers move_to_start.launch robot_ip:=172.16.2.101 I get the error unused args [load_gripper] for include of [/home/ap/catkin_ws/src/panda_moveit_config/launch/move_group.launch], meaning the current launch file is incompatible without edits.

aPonza commented 6 years ago

I deleted the catkin_ws and made a brand new one to build from source both libfranka and franka_ros as per this. After also updating the firmware of the controller (I built libfranka version 3, controller was on 2), it seems like the problem with the controller seems solved, but Panda still doesn't want to move and gives me this error:

[ERROR] [1533279509.105783637]: libfranka: Move command rejected: command not possible in the current mode!

I tried already the normal procedure to no avail. After the error, MoveGroups are started, and then:

You can start planning now!

[ INFO] [1533280073.533212783]: Loading robot model 'panda'...
[ INFO] [1533280073.533296439]: No root/virtual joint specified in SRDF. Assuming fixed joint
[ INFO] [1533280073.664434618]: Loading robot model 'panda'...
[ INFO] [1533280073.664499532]: No root/virtual joint specified in SRDF. Assuming fixed joint
[ INFO] [1533280074.794115132]: Ready to take commands for planning group panda_arm.
[ INFO] [1533280074.811005253]: Combined planning and execution request received for MoveGroup action. Forwarding to planning and execution pipeline.
[ INFO] [1533280074.811424363]: Planning attempt 1 of at most 1
[ INFO] [1533280074.814041853]: Planner configuration 'panda_arm' will use planner 'geometric::RRTConnect'. Additional configuration parameters will be set when the planner is constructed.
[ INFO] [1533280074.815060771]: RRTConnect: Starting planning with 1 states already in datastructure
[ INFO] [1533280074.827086437]: RRTConnect: Created 5 states (2 start + 3 goal)
[ INFO] [1533280074.827190102]: Solution found in 0.012383 seconds
[ INFO] [1533280074.836727831]: SimpleSetup: Path simplification took 0.009322 seconds and changed from 4 to 2 states
> [ WARN] [1533280074.845116910]: Dropping first 1 trajectory point(s) out of 10, as they occur before the current time.
> First valid point will be reached in 0.428s.
> [ WARN] [1533280077.196332434]: Controller position_joint_trajectory_controller failed with error code GOAL_TOLERANCE_VIOLATED
> [ WARN] [1533280077.196555486]: Controller handle position_joint_trajectory_controller reports status ABORTED
[ INFO] [1533280077.277483551]: Completed trajectory execution with status ABORTED ...
[ INFO] [1533280077.439731474]: ABORTED: Solution found but controller failed during execution
================================================================================REQUIRED process [move_to_start-7] has died!
process has finished cleanly
log file: /home/ap/.ros/log/90eae6c8-96ea-11e8-b6c2-009c021669e3/move_to_start-7*.log
Initiating shutdown!
================================================================================
[move_to_start-7] killing on exit
[move_group-6] killing on exit
[controller_spawner-5] killing on exit
[joint_state_publisher-4] killing on exit
[robot_state_publisher-3] killing on exit
[state_controller_spawner-2] killing on exit
[franka_control-1] killing on exit
[INFO] [1533280078.606341]: Shutting down spawner. Stopping and unloading controllers...
[INFO] [1533280078.607032]: Stopping all controllers...
[INFO] [1533280078.608018]: Shutting down spawner. Stopping and unloading controllers...
[INFO] [1533280078.609073]: Stopping all controllers...
[WARN] [1533280078.966768]: Controller Spawner error while taking down controllers: unable to connect to service: connection from sender terminated before handshake header received. 0 bytes were received. Please check sender for additional details.
[WARN] [1533280078.985987]: Controller Spawner error while taking down controllers: unable to connect to service: [Errno 104] Connection reset by peer
shutting down processing monitor...
... shutting down processing monitor complete
done

So it looked like there's a slow connection or something. But the network load test returns:

--- 172.16.2.101 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 10068ms
rtt min/avg/max/mdev = 0.226/0.492/0.666/0.047 ms

...which means I'm under the milli and it should be fine? CPU is on performance mode as well.

EDIT: I ran echo_robot_state from libfranka examples since I now have it available and, after re-publishing

rostopic pub -1 /franka_control/error_recovery/goal franka_control/ErrorRecoveryActionGoal "{}"

I get :

{
"O_T_EE": [0.992893,0.0871198,-0.0809555,0,0.0871759,-0.996179,-0.00284745,0,-0.0808958,-0.00423023,-0.996714,0,0.524795,0.00722957,0.294793,1], 
"O_T_EE_d": [0.992913,0.0874344,-0.0803735,0,0.0874943,-0.996151,-0.00278373,0,-0.0803091,-0.00426831,-0.996761,0,0.524888,0.00738594,0.295042,1], 
"F_T_EE": [0.7071,-0.7071,0,0,0.7071,0.7071,0,0,0,0,1,0,0,0,0.1034,1], 
"EE_T_K": [1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1], 
"m_ee": 0.73, 
"F_x_Cee": [-0.01,0,0.03], 
"I_ee": [0.001,0,0,0,0.0025,0,0,0,0.0017], 
"m_load": 0, 
"F_x_Cload": [0,0,0], 
"I_load": [0,0,0,0,0,0,0,0,0], 
"m_total": 0.73, 
"F_x_Ctotal": [-0.01,0,0.03], 
"I_total": [0.001,0,0,0,0.0025,0,0,0,0.0017], 
"elbow": [0.008815,-1], 
"elbow_d": [0.0089711,-1], 
"tau_J": [0.397578,-29.7917,0.315179,22.0217,0.493995,2.21358,-0.0449972], 
"tau_J_d": [0,0,0,0,0,0,0], 
"dtau_J": [53.9998,-87.6941,27.9825,-21.1357,8.50055,8.72733,42.5273], 
"q": [0.00593191,0.0106994,0.008815,-2.1266,-0.0035399,2.05626,0.714524], 
"dq": [0.000317997,0.00111618,-0.000285925,0.000521088,-0.000269897,-0.000209827,-0.00152636], 
"q_d": [0.00607774,0.0103849,0.0089711,-2.12642,-0.00356418,2.05635,0.714521], 
"dq_d": [0,0,0,0,0,0,0], 
"joint_contact": [0,0,0,0,0,0,0], 
"cartesian_contact": [0,0,0,0,0,0], 
"joint_collision": [0,0,0,0,0,0,0], 
"cartesian_collision": [0,0,0,0,0,0], 
"tau_ext_hat_filtered": [0.336855,-0.339606,0.224505,0.304036,-0.158596,0.0804715,-0.0700333], 
"O_F_ext_hat_K": [0.047224,0.373813,0.587859,-0.303382,-0.324877,0.281773], 
"K_F_ext_hat_K": [0.0318651,-0.369949,-0.591328,-0.205633,0.0127268,-0.0695569], 
"O_dP_EE_d": [0,0,0,0,0,0], 
"theta": [0.00596015,0.00860947,0.00883637,-2.12505,-0.00348501,2.0565,0.714519], 
"dtheta": [0,0,0,0,0,0,0], 
"current_errors": [], 
"last_motion_errors": ["communication_constraints_violation"], 
"control_command_success_rate": 0, 
"robot_mode": "Reflex", 
"time": 2976499}
chriseichmann commented 6 years ago

Hi,

happy to see that building from source solved some problems.

Your robot seems to be in an error state and stopped due to a reflex (e.g. self collision). If you can't recover the error via /franka_control/error_recovery/ please try it via the Franka desk UI.

Most errors can be solved with calling /franka_control/error_recovery/goal. From my experience there are some errors where the web interface is needed for resolving them, like severe self collisions.

aPonza commented 6 years ago

Hi Chris,

I am in fact continually trying to recover from errors, with

rostopic pub -1 /franka_control/error_recovery/goal franka_control/ErrorRecoveryActionGoal "{}"

and I get in return (right after executing the command)

[ INFO] [1533290850.740965283]: Recovered from error
[ERROR] [1533290851.074941978]: libfranka: Move command aborted: motion aborted by reflex! ["communication_constraints_violation"]
control_command_success_rate: 0.43

I did change some cables (upgraded to a Cat. 6 ethernet cable between pc and control), and the result is that now it's slightly moving but immediately stopping due to a Cartesian Reflex.

Desk is not letting me recover from errors sometimes, and it is losing the gripper after some of the tests I'm doing, so I have to shutdown and restart at the moment. This happens frequently with a moveit example using rviz. It would help if there was a js command that forced Desk to perform error recovery even if it doesn't seem available.

By the way, no self collisions are happening, or collisions at all for that matter: the arm is barely moving at all.

For example, right now this happened:

Apart from this crazy event though, it seems like progress is being made, but I think I'm missing something crucial.

EDIT: As crazy as that sounds, this happens consistently even after rebooting. I had not noticed but the rviz tab is also warning me that: [ WARN] [1533295604.046500154]: Interactive marker 'EE:goal_panda_hand' contains unnormalized quaternions. This warning will only be output once but may be true for others; enable DEBUG messages for ros.rviz.quaternions to see more details.

I tried to call the service to set_logger_level "logger: 'ros.rviz.quaternions' level: 'debug'" and the get_loggers shows it's set correctly, but nothing shows up in the terminal tab.

chriseichmann commented 6 years ago

The unnormalized quaternions aren't the problem. The communication_constraints_violation is.

Can you move the robot arm and use the hand via Desk?

Also, lets try moveit without the arm

roscore
roslaunch franka_control franka_control.launch robot_ip:=172.16.2.101 load_gripper:=false
roslaunch panda_moveit_config panda_moveit.launch controller:=position load_gripper:=false
roslaunch panda_moveit_config moveit_rviz.launch
aPonza commented 6 years ago

Yes I can, I made a simple pick and place task that I run every time I change the hardware configuration, and so far no issues there.

MoveIt without the arm doesn't load without giving me the usual [ERROR] [1533301052.932503680]: libfranka: Move command aborted: motion aborted by reflex! ["communication_constraints_violation"] control_command_success_rate: 0.28 and rviz the warning on the unnormalized quaternions (on link8 this time...makes sense since no gripper was loaded).

While the warning happens while loading rviz, the error happens systematically but after a random amount of time: the milli threshold is crossed when all nodes are up but not necessarily right away.

In regards to the suggestions given by Franka Emika, I am 100% on N°1, 2, 3 (actually no, see next paragraph), and 5. N° 4 seems more like a statement than a thing I can change, I don't get it. For N°6 I tried robot.setFilters(50, 50, 50, 50, 50) right after robot is instantiated in franka_control_node, and it lead to execution errors. I don't really know where I would edit in the filters thresholds since I haven't yet written my own package and I'm just trying the examples.

I retried the bandwidth test with all the nodes active and the issue is right there:

--- 172.16.2.101 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 10251ms
rtt min/avg/max/mdev = 0.115/0.357/2.499/0.071 ms

This is after a cable change, so that's the reason why the min and avg are lowered. The max with all nodes up and running is significantly higher. I am however out of options, am I not? Is there something I'm missing?

chriseichmann commented 6 years ago

Some parts of libfranka are running with 1kHz and need to communicate with the FCI, therefore you need to be below 1ms rtt.

  1. Check if you are running a realtime linux kernel: uname -r should show rt as part of the output, e.g. 4.13.12-rt5
  2. disable CPU frequency scaling
  3. always connect your workstation PC directly to the FCI (without switches, routers, ... in between)
aPonza commented 6 years ago

If you're saying that with just the arm ready to move (e.g. brakes open ext.act. open, ready in Desk) and nothing else going on, I need a max rtt<1ms, I do have it:

$ sudo ping 172.16.2.101 -i 0.001 -D -c 10000 -s 1200

[...]

--- 172.16.2.101 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 10017ms
rtt min/avg/max/mdev = 0.150/0.412/0.543/0.047 ms

and then:

$ uname -r
4.14.12-rt10
$ cpufreq-info -p
933000 2533000 performance

Finally I only have a 2m cat. 6 RJ-45 cable separating this pc and 172.16.2.101.

Where else could the problem be? I will check if my issue has become a duplicate of this issue or this issue), otherwise I welcome any other suggestion.

sgabl commented 6 years ago

The 1ms constraints applies to more than just the RRT, you'll find more details here. The results of the Network bandwidth, delay and jitter test should not be significantly higher than rtt min/avg/max/mdev = 0.147/0.240/0.502/0.038 ms. What type of network card are you using? I've experienced a similar high avg ping time with an USB NIC.

Please try out the ping-test with an other PC and post the results.

aPonza commented 6 years ago

First, some data about my network setup:

$ lspci | awk '/[Nn]et/ {print $1}' | xargs -i% lspci -ks %
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
    DeviceName: Hanksville Gbe Lan Connection
    Subsystem: Hewlett-Packard Company RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
    Kernel driver in use: r8169
    Kernel modules: r8169
02:00.0 Network controller: Qualcomm Atheros AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
    DeviceName: Atheros AR3011 Bluetooth 2.1+EDR  Adapter 
    Subsystem: Hewlett-Packard Company AR9285 Wireless Network Adapter (PCI-Express)
    Kernel driver in use: ath9k
    Kernel modules: ath9k

Ok, so the 1ms constraint is relative to the sum of: 1) Round trip time (RTT) between the workstation PC and FCI; 2) Execution time of your motion generator or control loop; 3) Time needed by the robot to process your data and step the internal controller.

Regarding 1, my worst RTT seems to be 0.543 ms. The average trip is much higher than yours, but the rest of the data doesn't seem too different. Let's see the rest. For 2, I see a [ INFO] [1533539326.993379637]: Solution found in 0.009804 seconds and a [ INFO] [1533539326.999867420]: SimpleSetup: Path simplification took 0.006222 seconds and changed from 4 to 2 states, so that seems to be already a problem. Moreover, the combo with 3 is also killing it, as the warning says [ WARN] [1533539327.026003833]: Dropping first 1 trajectory point(s) out of 10, as they occur before the current time. First valid point will be reached in 0.427s.

This puts the sum at 443,569 ms, if I'm not mistaken! My network setup is fine, it's the CPU that is much too slow in calculating a solution. Could you confirm this is the bottleneck and I didn't make mistakes in my assumptions and calculations? Could you maybe show me your output of these 3 lines, to confirm I need to buy and setup a more powerful pc?

EDIT: My CPU stats:

$ cat /proc/cpuinfo  | grep 'name'| uniq
model name  : Intel(R) Core(TM) i3 CPU       M 380  @ 2.53GHz
$ cat /proc/cpuinfo  | grep process| wc -l
4
aPonza commented 6 years ago

Trying to understand better the problem, I find I'm not getting unreasonable calculation times (point 2 of your 1ms list):

With the caveat of these being at least 1 year old, I'm bound to think I'm looking at the wrong numbers or misunderstanding something. I also found a real time motion planning algorithm from 2 years ago: is a real-time motion planning the one I'm supposed to use? But then it would not be visualizable in RViz/deal with obstacles (source), which the launch file is launching/the other simulation is planning around.

christgau commented 6 years ago

From my understanding, the time that MoveIt takes to create a motion plan does not count into the 1ms constraint mentioned before. The calculated (and simplified) plan is executed by the actual robot controller (or fake controller in case of simulation) and this component along with the involved network must not violate the 1ms constraint. Looking at the article and the graphics in https://frankaemika.github.io/docs/libfranka.html note that they deal with libfranka and MoveIt/franka_ros is only shown in the graphics for sake of a better understanding.

aPonza commented 6 years ago

I see what you mean, I can definitely disregard the 9.804+6.222ms. Do we have any means to see the time the controller is taking to execute the plan? It seems the 427ms could be a combination of 2+3, by what the warning says. Any ways I can act on that that I'm not seeing through node/topic/service exploration?

christgau commented 6 years ago

I am not aware of any means for instrumenting controllers. If you built everything from source you could enable debug symbols and attach perf to the appropriate processes and check what's going on. In addition you could add your own profiling code. But be aware: Adding code, especially I/O stuff (like printfs), can easily break the hard realtime requirements.

Anyways, when having access to the robot I do not observe the problems with the execution of trajectories as you do - at least after the robot configuration has been fixed;-)

aPonza commented 6 years ago

Franka Emika updated yesterday the libfranka version. After rebuilding and running the libfranka/examples/communication_test.cpp it seems clear (to franka emika support) that I need to upgrade my notebook's ethernet NIC as it seems my Realtek one is the fault. I'll update the issue once I have a solution in this sense.

aPonza commented 6 years ago

The hardware upgrade worked for the communications test, which now starts, (which is already an improvement) and returns:

Finished test, shutting down example

#######################################################
The control loop did not get executed 35 times in the
last 10000 milliseconds! (lost 35 robot states)

Control command success rate of 9965 samples: 
Max: 1.00
Avg: 0.97
Min: 0.81
#######################################################

However the lost states are never 0, and I found they range between 35 and 194 (based on some runs in 2 days).

Moreover, trying roslaunch franka_example_controllers move_to_start.launch robot_ip:=172.16.2.101 load_gripper:=false doesn't work and, again, gives problems with the controllers.

EDIT: added the error

[INFO] [1536678512.176718]: Controller Spawner: Waiting for service controller_manager/load_controller
[INFO] [1536678512.269210]: Controller Spawner: Waiting for service controller_manager/load_controller
[ WARN] [1536678517.079633385]: Waiting for position_joint_trajectory_controller/follow_joint_trajectory to come up
[ WARN] [1536678523.079815712]: Waiting for position_joint_trajectory_controller/follow_joint_trajectory to come up
[ERROR] [1536678529.080127158]: Action client not connected: position_joint_trajectory_controller/follow_joint_trajectory

I went directly with a source install of libfranka and franka_ros, and I installed everything I had installed following my own comments (ros-kinetic-ros-controllers, ros-kinetic-moveit-commander, ros-kinetic-moveit, ros-kinetic-ros-control, ros-kinetic-joint-state-controller, ros-kinetic-controller-manager, and even ros-kinetic-gazebo-ros-pkgs, ros-kinetic-gazebo-ros-control and ros-kinetic-control-msgs).

The only different thing was PyAssimp again was giving troubles but eventually it seemed to work in the same way as last time. This time I had it installed with pip (so I have version 4.1.3) but sudo apt-get install ros-kinetic-moveit-commander was still complaining and installing it via aptitude, so I patched that one (as per this) (Actually while writing the post I went and looked in the old pc and the configuration is the same, with both versions installed, except the one in the old pc isn't patched)

I searched the old pc for potential differences and installed all the packages that were installed over there. The only remaining thing is I have a newer kernel: I had 4.14.12, I have 4.16.18, and this is due to the fact that I need compatibility with the Intel RealSense D435, which doesn't support 4.14.xx; therefore I looked in the patchable kernels in the list and installed a patchable one that is also supported by intel (4.[4,8,10,13,15,16]] from here)

Could it be a namespace problem when the controller spawn? Could it be a kernel issue? Do you know for a fact of other kernels that work with libfranka and are in the intel list?

aPonza commented 6 years ago

I can confirm it was the kernel: today I built 4.14.12 (the one you use as default in the installation instructions and I tried both the communication_test and the move_to_start.launch:

[ INFO] [1536744422.884976038]: Loading robot model 'panda'... [ INFO] [1536744422.885035751]: No root/virtual joint specified in SRDF. Assuming fixed joint [ INFO] [1536744422.950890060]: Loading robot model 'panda'... [ INFO] [1536744422.950962544]: No root/virtual joint specified in SRDF. Assuming fixed joint [ INFO] [1536744424.065693131]: Ready to take commands for planning group panda_arm. [ INFO] [1536744424.092800395]: Combined planning and execution request received for MoveGroup action. Forwarding to planning and execution pipeline. [ INFO] [1536744424.093080716]: Planning attempt 1 of at most 1 [ INFO] [1536744424.094276926]: Planner configuration 'panda_arm' will use planner 'geometric::RRTConnect'. Additional configuration parameters will be set when the planner is constructed. [ INFO] [1536744424.094979827]: RRTConnect: Starting planning with 1 states already in datastructure [ INFO] [1536744424.106195303]: RRTConnect: Created 4 states (2 start + 2 goal) [ INFO] [1536744424.106227555]: Solution found in 0.011566 seconds [ INFO] [1536744424.108601791]: SimpleSetup: Path simplification took 0.002272 seconds and changed from 3 to 2 states [ WARN] [1536744424.126490107]: Dropping first 1 trajectory point(s) out of 10, as they occur before the current time. First valid point will be reached in 0.345s. [ INFO] [1536744425.759154881]: Completed trajectory execution with status SUCCEEDED ... ================================================================================REQUIRED process [move_to_start-8] has died! process has finished cleanly log file: /home/ap/.ros/log/3be1fc3c-b66d-11e8-b60a-d45ddf1a2746/move_to_start-8*.log Initiating shutdown!

[move_to_start-8] killing on exit [move_group-7] killing on exit [controller_spawner-6] killing on exit [joint_state_desired_publisher-5] killing on exit [joint_state_publisher-4] killing on exit [robot_state_publisher-3] killing on exit [state_controller_spawner-2] killing on exit [franka_control-1] killing on exit [INFO] [1536744426.971122]: Shutting down spawner. Stopping and unloading controllers... [INFO] [1536744426.971438]: Stopping all controllers... [INFO] [1536744426.972181]: Shutting down spawner. Stopping and unloading controllers... [INFO] [1536744426.972547]: Stopping all controllers... [WARN] [1536744427.133722]: Controller Spawner error while taking down controllers: unable to connect to service: [Errno 104] Connection reset by peer [WARN] [1536744427.134245]: Controller Spawner error while taking down controllers: unable to connect to service: [Errno 104] Connection reset by peer shutting down processing monitor... ... shutting down processing monitor complete done

Seems to me that I can safely ignore them, but feel free to tell if this is not the case.

Thank you and support for your help in trying to figure out the issue was due to a necessary hardware upgrade.

I didn't know kernel choice could be an issue, and at this point I reiterate my interest in knowing your experience with linux kernels and if you know of older compatible kernels or if you previously had discovered which options can configure it for optimal use beside the "> 5. Fully Preemptible Kernel (RT) (PREEMPT_RT_FULL) (NEW)".

It could be beneficial to understand/test which kernels actually work since, apart from the default one from your instructions, users are presented with a rather wide choice of preemptible kernels on the website, and nothing is currently stopping them from choosing another one and not having a working build afterwards. Your "We recommend choosing the version closest to the one you currently use." might actually be misleading.

I'd edit installation.rst:

First, install the necessary dependencies:

apt-get install build-essential bc curl ca-certificates fakeroot gnupg2 libssl-dev lsb-release

to

apt-get install build-essential bc curl ca-certificates fakeroot gnupg2 libssl-dev lsb-release bison flex

since bison and flex were always needed when building the kernel in my experience.

I'd add a disclaimer about the choice of the kernel or list the kernels actually tested to be compatible, right under this dependencies section.

Moreover franka_example_controllers (or franka_control) should have an exec_depend for pyassimp, ros-kinetic-ros-controllers, ros-kinetic-ros-control and maybe ros-kinetic-moveit-commander so that both with apt-get install franka_ros and via rosdep after installing franka_ros from source, the package works out of the box. The thing with pyassimp is, theoretically ros-kinetic-panda-moveit-config already depends (both in apt and in package.xml) on ros-kinetic-moveit-commander so pyassimp should already be there...except it wasn't.

gavanderhoorn commented 6 years ago

@aPonza wrote:

Moreover franka_example_controllers (or franka_control) should have an exec_depend for pyassimp, ros-kinetic-ros-controllers, ros-kinetic-ros-control

Actually, ros-kinetic-ros-controllers and ros-kinetic-ros-control are metapackages, and those should not be used for dependency declaration (as you'll depend only transitivily on the pkgs that you actually need, and if the metapackage ever changes, your dependency declaration will be broken).

aPonza commented 6 years ago

To update the thread with the (pretty unscientific) testing I have done:

I think I will use 4.8, as it's a significant enough improvement over 4.16 with respect to the consistency of high averages and mins.

Apart from the minor edits I suggested in the very end of my last comment, this can be closed.