cambel / ur3

ROS-based UR3/UR3e controller with simulation in Gazebo. Adaptable to other UR robots
MIT License

UR5 F/T Sensor #1

Closed murtazabasu closed 4 years ago

murtazabasu commented 4 years ago

Hello, thank you for your suggestion about the step joint commands, I will try implementing that and see the results. Moreover, about the FT sensor you mentioned before: did you install it on the UR5 end effector or on the Robotiq gripper, and to which link specifically?

cambel commented 4 years ago

The FT sensor usually goes from the last link to the gripper (or another tool). So, in the simulation, I mounted an FT sensor between the link wrist_3_link and the tool0. The real UR5 does not have an FT sensor, though you can add one such as the Robotiq FT300.

murtazabasu commented 4 years ago

Okay, I have attached the FT sensor, but I am not sure how I should use it for collision detection, because the FT sensor keeps publishing force and torque values. The values change even when the arm is just moving (not colliding with anything).

cambel commented 4 years ago

Sure, you will always have readings from the FT sensor, since while moving you have gravity and acceleration from the motion. But you can set a threshold for how much force counts as a collision for you.
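
A minimal sketch of that threshold idea, assuming the Gazebo FT sensor plugin publishes geometry_msgs/WrenchStamped on a topic here called /ft_sensor (the topic name and threshold value are placeholders for your setup):

    import numpy as np
    import rospy
    from geometry_msgs.msg import WrenchStamped

    FORCE_THRESHOLD = 100.0  # N, tune empirically for your robot and motions

    class CollisionMonitor(object):
        def __init__(self):
            # assumes rospy.init_node() has already been called
            self.in_collision = False
            rospy.Subscriber("/ft_sensor", WrenchStamped, self._ft_cb)

        def _ft_cb(self, msg):
            force = np.array([msg.wrench.force.x,
                              msg.wrench.force.y,
                              msg.wrench.force.z])
            # flag a collision whenever the force magnitude exceeds the threshold
            self.in_collision = np.linalg.norm(force) > FORCE_THRESHOLD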

murtazabasu commented 4 years ago

Thank you for the clarification! Also, do you continuously check the force-torque values from the FT sensor while the action is being taken and reset when the FT values exceed the set threshold (i.e. checking the FT values and publishing the action within the same while loop), or do you take the action first, let it be simulated, and then check the collision values and reset if necessary?

cambel commented 4 years ago

Actually, so far I haven't done any kind of collision detection for my experiments. I have just set episodes of 100 or 200 steps, with actions constrained to a limited area. But in your case, depending on how much the robot is allowed to move with each action, you could check the FT value either while the action is taking place (for long actions) or after it (for short actions). Otherwise, if your only concern is a collision with the table, you can more easily check the position of the end effector before taking an action and see whether the z value is going to be too low; if it is, you can just override the action to stay in place.
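
A minimal sketch of that z-guard, assuming a hypothetical forward_kinematics() helper that returns the end-effector (x, y, z) for a set of joint angles:

    import numpy as np

    Z_MIN = 0.23  # table height plus a safety margin (placeholder value)

    def safe_action(current_joints, delta):
        candidate = np.asarray(current_joints) + np.asarray(delta)
        x, y, z = forward_kinematics(candidate)  # hypothetical FK helper
        if z < Z_MIN:
            # override: stay in place instead of hitting the table
            return np.asarray(current_joints)
        return candidate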

murtazabasu commented 4 years ago

Yeah, I check the z of the end effector before taking the action, but if the agent chooses a bad action, then initially the z will not be low, yet after the action is published the agent goes on to collide and disrupt the simulation. I tried your suggestion before but it didn't seem to work. What I am trying to do is have the agent publish the action while simultaneously checking the force values; if, while publishing the action, it finds that the force is going higher than the threshold, it should stop publishing the action and reset the robot. I can't find a way to do that.

murtazabasu commented 4 years ago

Hello Cambel, how many episodes did you train your agent for and how much total time did it take? Also, about the step actions you mentioned before, did you mean action = current_joint_values + (step)*(action chosen by agent)? And what "step" did you choose for your training? Moreover, in my training I have given two conditions for the robot to reset if it surpasses a threshold limit: 1) the value from the FT sensor, and 2) the z of the end effector going below the height of the table:

    if coll_force >= 100 or coll_force <= -100 or coll_z <= 0.23:
        self.reset_robot2.call()
        self.reward_coll2 = -0.6
        self.collided += 1
        sys.stdout.write("\r \t \t Collided {}".format(self.collided))
        sys.stdout.flush()
        #print("#############Collision2##############")
        rospy.sleep(0.5)

Based on this condition I give a negative reward of -0.6. Is this reward too high, and do these conditions constrain the training too much? Please give me your suggestions on this.

Also, to mention, my current training parameters are: max episodes = 1500, max steps = 200, update steps = 100.

cambel commented 4 years ago

Hi @Murtaza786-oss, I have been training small scenarios, so I usually center the training around a task that is almost done. For those cases, I was able to train and learn something useful within about 100 episodes, each with 100 or 150 steps, making updates every 10 or so episodes. But I think that may differ greatly from the kind of experiments that you are doing. Regarding the overall time, it takes about 30 minutes to learn.

Yeah, about the step action, it is kind of like that, but I thought of it as: action = current_joint_values + (action chosen by agent)/action_scale. That is just the same as saying that what you called step = 1/action_scale.

In any case, since you may or may not be bounding the value of the action taken by the agent, I use that step/action_scale as a hyperparameter to control how much I allow the robot to move at each step. Empirically I have noticed that if it is too big or too small, the agent may not learn anything or the learning may break (returning NaN).
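
A minimal sketch of that scaled step update, where action_scale is the hyperparameter discussed above and agent_action is the raw network output for the six joints:

    import numpy as np

    def step_joint_command(current_joint_values, agent_action, action_scale=100.0):
        # equivalent to current + step * action with step = 1 / action_scale
        target = np.asarray(current_joint_values) + np.asarray(agent_action) / action_scale
        # clip to the joint range so the command cannot drift outside (-pi, pi)
        return np.clip(target, -np.pi, np.pi)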

About the reward: if that collision penalty is the only reward you give while using a vanilla RL algorithm, then I think it will take a long time to train. You may want to constrain your learning to some set of tasks by defining a denser reward function; for example, try giving a reward for partial goals or for the distance to a certain goal.

An additional suggestion: maybe you don't really want to start learning from scratch, because it takes too long to learn with a sparse reward. You may try imitation learning, learning from demonstration, or something that I am starting to look into called residual policy learning, which basically combines some traditional motion planning (e.g. MoveIt) or any manually engineered trajectory with RL to accelerate learning.
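
A rough sketch of the residual-policy-learning idea: combine a hand-engineered (or planner-generated) base action with a learned correction. base_controller and policy are hypothetical callables, not part of this repo:

    def residual_action(observation):
        base = base_controller(observation)  # e.g. a MoveIt/scripted trajectory step
        residual = policy(observation)       # small correction learned with RL
        return base + residual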

murtazabasu commented 4 years ago

Thank you Cambel for all the suggestions. However, can you tell me the range of action_scale values you used for training?

cambel commented 4 years ago

Sure. I am moving the end effector using Cartesian coordinates, so I usually set the action_scale in the range of 10-100, depending on whether I want to move just a few centimeters or more. If you are moving each joint by setting a joint angle, I think a higher value would be better, but usually something less than 1000 should work.

murtazabasu commented 4 years ago

So if I am not wrong, you meant action = current joint values + ((joint angles from agent)/(x<1000)) ?

cambel commented 4 years ago

yeah exactly

murtazabasu commented 4 years ago

Did you also clip the action to the range (-pi, pi), so that the final action becomes action = np.clip(current joint values + ((joint angles from agent)/(x<1000)), -np.pi, np.pi)?

Otherwise, the action array keeps summing up values and can exceed the range (-pi, pi), which might cause abnormal behaviour in the robot during simulation.

cambel commented 4 years ago

Yes, I also clip the action to the limits of the joint angles, or to the workspace reach when I work in Cartesian space.

murtazabasu commented 4 years ago

After clipping the actions, the training is very weird: the robot takes the same kind of action repeatedly and I don't observe any learning. I have shared a video here: https://www.dropbox.com/s/8g1ygrz4qbbdkwx/2019-12-02-09-16-44.mp4?dl=0

Also, the way I am performing the action is:

        if collision_check is True:
            self.reset_robot2.call()
            self.reward_coll2 = -2
            self.collided += 1.0
            sys.stdout.write("\r \t \t Collided {}".format(self.collided))
            sys.stdout.flush()
            #print("#############Collision2##############")
            #rospy.sleep(1)
        else:
            #velocity = 0.3
            #print(joint_values)
            self.reward_coll2 = 0
            action = JointTrajectoryPoint()
            step_action = np.clip(curr_ja_r1 + 0.05*(joint_values), -np.pi, np.pi)
            #actionFloat = [float(i) for i in step_action]
            action.positions = step_action
            #action.velocities = [velocity]*joint_values.size
            action.time_from_start = rospy.Duration(0.35)
            self.traj.points = [action]
            #self.traj.points.append(action)
            self.action_pub.publish(self.traj)    

        rospy.sleep(0.6)
        #rospy.Time().now().nsecs
        # Take Observation after action
        obs = self.take_observation()

cambel commented 4 years ago

The video is taken after how many steps of interaction? It may take many steps to learn something, depending on many things related either to the learning algorithm parameters or to the environment setup. I guess you could make the robot move faster, maybe at 20 or 40 Hz instead of the 0.35 secs. Other than that, in my opinion it seems to move just fine. You can also speed up the simulation by changing the Gazebo parameter physics > real time update rate from 1000 to 10000 to make it run as fast as possible.

murtazabasu commented 4 years ago

Hi, the video was taken after approximately 90 episodes. Also, is it really possible to move the robot at 20-40 Hz? Whenever I try to change the rate to, say, 10 Hz, the robot goes crazy.

About the real-time update rate you mentioned in Gazebo: do I need to change the value every time in Gazebo, or is there a particular file (say a urdf/xacro) where I can modify this parameter permanently?

murtazabasu commented 4 years ago

Also, please have a look at this. I was running the training today and, after approximately 10 episodes, I did a rostopic echo on the topic where the joint angles are continuously published from the simulation (the environment subscribes to this topic for calculating the states). When I did rostopic echo /joint_angles_r2 I got:

positions: [-59.69028030681207, -2.123415148631654, -0.07799022427493973, 3.141588667154828, -0.8611677963949056, -0.8141655121227638]
velocities: []
accelerations: []
effort: []
time_from_start: 
  secs: 0
  nsecs:  

After 10 episodes, you can see in the message that the position value of the first joint is -59.69. There are also instances where some of the joint-angle values go beyond 750 or below -750. I don't understand why any of the joint-angle values should go beyond (-pi, pi), because the action published by the agent is always within the range (-pi, pi); there is no problem with the action values being published. Moreover, when I do a rostopic echo on the topic where the actions are published and on the topic where MoveIt publishes the joint values from the simulation (i.e. joint_angles_r2), they both have the same joint values. I am attaching a photo for better understanding: image

If you look, the left side is the topic published by the action (the robot then moves in the simulation) and the right side is the topic (joint_angles_r2) that measures the joint angles from the simulation for calculating the states etc. Initially, at the start of training, both of them have the same joint values (and it should remain like that). But after some episodes, say 10, the right topic, i.e. joint_angles_r2, has different values! Some of the joint values in this topic seem to somehow sum up and reach values greater than pi or less than -pi, which I also mentioned above.

cambel commented 4 years ago

Yes, the parameter can be fixed in the *.world file, using the real_time_update_rate tag inside the physics element.
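
As a hedged alternative sketch (not from this thread), the same parameter can also be changed at runtime through the standard gazebo_ros services instead of editing the .world file:

    import rospy
    from gazebo_msgs.srv import GetPhysicsProperties, SetPhysicsProperties

    rospy.init_node("speed_up_gazebo")
    rospy.wait_for_service("/gazebo/get_physics_properties")
    get_props = rospy.ServiceProxy("/gazebo/get_physics_properties", GetPhysicsProperties)
    set_props = rospy.ServiceProxy("/gazebo/set_physics_properties", SetPhysicsProperties)

    props = get_props()
    # raise the max update rate from the default 1000 to 10000 to run faster than real time
    set_props(time_step=props.time_step,
              max_update_rate=10000.0,
              gravity=props.gravity,
              ode_config=props.ode_config)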

Sure, I was able to move the robot at 40 Hz to do this simple 2D reach task.

Okay, I am not sure what you are doing in your code, because you say that you are moving robot 2 (r2?), right? But in the action you are querying the joint angles from r1:

step_action = np.clip(curr_ja_r1 + 0.05*(joint_values), -np.pi, np.pi)

Also, where are the action and the joint_angles coming from? When you have, for example, a joint angle of 750, what is the state of the robot in the simulation?

murtazabasu commented 4 years ago

No, that's just the naming of the variable; curr_ja_r1 is the joint angle of robot 2. I get curr_ja_r1 from another Python file which uses MoveIt to get the current joint values from the simulation. This file starts with the launch file, and once it is active it starts publishing the joint-angle values from the simulation on a rostopic. The file looks somewhat like this:

import sys
import rospy
import moveit_commander
from trajectory_msgs.msg import JointTrajectoryPoint

# note: the node/MoveIt setup below is reconstructed; "manipulator" is a placeholder group name
moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node('joint_angles_publisher_r2')
group = moveit_commander.MoveGroupCommander("manipulator")

pub = rospy.Publisher('/joint_angles_r2', JointTrajectoryPoint, queue_size=3)
#rospy.Subscriber("action_move_robot2", JointTrajectoryPoint, move_action_r2)
r = rospy.Rate(100)
joint_angles = JointTrajectoryPoint()
while not rospy.is_shutdown():
    # #*************** Uncomment this block if the node should send random points to the action function
    # #Gives a random number between 1 and 6, to move the robot randomly with the 6 actions.
    # action = random.randint(1,6)

    # #Calls move function, which sends the wanted movement to the robot
    joint_values = group.get_current_joint_values()  # current joint angles from MoveIt
    joint_angles.positions = joint_values
    #print(joint_angles)
    pub.publish(joint_angles)
    r.sleep()

When a joint angle is 750, that particular joint spins, then collides, then resets, and it keeps doing that repeatedly...

Initially, the action comes from the agent, gets published, and the robot moves. Meanwhile, MoveIt measures the new joint angles from the simulation and publishes them. The environment subscribes to these joint values from MoveIt and calculates the new states, etc.

Therefore, the action chosen by the agent (the joint angles) should equal the joint angles measured by MoveIt from the simulation, and that is what the figure I sent you before shows. But after a certain number of episodes, the joint angles measured by MoveIt change (as I wrote before), which causes some joints to spin, collide and reset, and then it continues...

You can see the difference here, image

cambel commented 4 years ago

First, why are you republishing the joint states? You can probably subscribe directly to the controller's node that provides that information; it is usually a topic called joint_states.
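
A minimal sketch of subscribing directly to the controller's joint states (the topic may be namespaced, e.g. /robot2/joint_states in your setup):

    import rospy
    from sensor_msgs.msg import JointState

    current_joints = None

    def joint_state_cb(msg):
        global current_joints
        # note: the order of msg.name / msg.position may differ from the controller's joint order
        current_joints = list(msg.position)

    rospy.Subscriber("/joint_states", JointState, joint_state_cb)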

As for why the action is getting bigger... I am not sure just from watching snippets of your code. I was checking my controller class; I guess you want to make sure that the JointTrajectory() is initialized each time you send a new action, so that points do not accumulate. If that is not the problem, then I don't really know. I would suggest you try my implementation; that way, if you have a problem, it would be easier to help.

murtazabasu commented 4 years ago

Okay, got you. I removed the topic and now I am subscribing directly to the controller topic to get the joint states, i.e. robot2/joint_states. The training then lasted for 211 episodes, after which the robot started spinning, and the measured robot joint angles were:
image

Do you have any idea what could possibly be going wrong? Why do the joint angles sum up to such large values?

cambel commented 4 years ago

Umm, I don't know. It seems to be a problem with the controller of the robot, because you mentioned that the actions sent to the robot are not that high, right? But I have no clue what exactly could be causing it.

cambel commented 4 years ago

Yeah, I have both options available but I mainly use position_controllers/JointTrajectoryController

murtazabasu commented 4 years ago

Hi, so as you suggested, I have removed MoveIt completely from the system. I use joint_states to get the joint angles and I use forward kinematics to get the end-effector position and quaternion values. I observe that the speed is exceptionally better, but I still face the problem I mentioned earlier: after a couple of episodes and resets the robot starts spinning and wobbling, and the joint angles measured from robot2/joint_states show values outside (-pi, pi), different from the agent's joint values. I presume that the problem is with the joint trajectory controller; maybe my trajectory controllers are not efficient enough to work continuously at high frequency. Can you please suggest an efficient trajectory controller for my case?
PS: I also tried using your controller file ur_control/config/ur3_controllers.yaml but it didn't work. Also, which ROS version did you use for your project?

cambel commented 4 years ago

After a couple of episodes and resets the robot starts spinning and wobbling, and the joint angles measured from robot2/joint_states show values outside (-pi, pi), different from the agent's joint values.

That is really strange. By the way, are you using the URDF model that includes the joint limits? That may help somehow.

Maybe my trajectory controllers are not efficient enough to work continuously at high frequency

I don't think that would be the problem. I am using ROS Kinetic. Maybe you can try using the package itself and not just the controller file.

cambel commented 4 years ago

Hi @Murtaza786-oss, great! It is working much better.

For the local minimum, you may try some sort of learning from demonstration or residual policy learning. Starting from scratch is quite difficult for the robot and maybe unnecessary, especially if you already want some specific configuration as the goal. However, if you really wish to keep learning from scratch, I would suggest adding a term to your cost function that penalizes actions, with a weight on each joint's action, where the first joint has a greater weight. Also, you may try learning from different initial positions so that the robot explores more.
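
A rough sketch of that weighted action penalty (the weights and scale are placeholders; larger weights on the first joints discourage big motions of the base):

    import numpy as np

    JOINT_WEIGHTS = np.array([1.0, 0.8, 0.6, 0.4, 0.3, 0.2])

    def action_penalty(action, scale=0.1):
        # negative reward proportional to the weighted squared joint action
        return -scale * np.sum(JOINT_WEIGHTS * np.square(action))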

murtazabasu commented 4 years ago

Did you use the normal advantage function or Generalized Advantage Estimation (GAE), which comes with an additional parameter (lambda)?

cambel commented 4 years ago

Not really; for now I am using SAC, the straight implementation from the tf2rl repo.

cambel commented 4 years ago

Great, it's looking better again.

For generalization, I would try to understand how the robot is learning. When you trained on one target pose it seemed able to learn within 500 episodes, but what if you train the agent on each of the other targets alone? Does it always learn at about the same speed? I would guess it may not learn equally, because the dynamics of the robot change for different positions. So you may try either training with more target poses, randomizing which one is used each episode, or training for a longer time.

Additionally, I think you are trying to make the agent learn the kinematics of the robot by moving its joints while the target is a pose. You could take advantage of the inverse kinematics of your robot to make the problem considerably simpler to solve. I.e., instead of commanding the position of each joint angle, you would command the pose of the gripper in task-space (x, y, z, rx, ry, rz), where rx, ry, rz are the rotations around each axis.

murtazabasu commented 4 years ago

So you are suggesting that I change my actions to a pose instead of joint angles?

cambel commented 4 years ago

Yes, you can use an inverse kinematics package to actually move the robot in joint space. You can use ikfast or trac_ik to convert from a pose to joint angles. I have a repo with ikfast for the UR robots, but trac_ik is also very straightforward. By using poses, the agent won't have to solve the robot's kinematics and instead only needs to solve some linear paths in each pose direction.
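
A minimal sketch using the trac_ik_python bindings (the link names base_link and tool0 are assumptions for a UR arm; ikfast would play the same role):

    from trac_ik_python.trac_ik import IK

    ik_solver = IK("base_link", "tool0")  # kinematic chain from base to end-effector

    def pose_to_joints(seed_joints, x, y, z, qx, qy, qz, qw):
        # returns a joint solution close to the seed, or None if no solution is found
        return ik_solver.get_ik(seed_joints, x, y, z, qx, qy, qz, qw)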

murtazabasu commented 4 years ago

I just need to clarify this: what you are suggesting is

Actor NN --> action (POSE) --> IKfast --> Joint Angles --> Publisher (rostopic)

is this what you mean?

cambel commented 4 years ago

Yes

murtazabasu commented 4 years ago

Hi, the rx, ry, rz are the Euler angles, i.e. (R, P, Y), right? Which can be calculated by a transformation from quaternions? Also, why can't I directly use the quaternions instead of rx, ry and rz in the state representation and in the actor NN?

cambel commented 4 years ago

Yeah, they are the Euler angles, which can be obtained with a transformation from quaternions. You can certainly use quaternions, though I'm not sure how. For me, it is easier to intuitively understand the actions if they are in Euler-angle form. But if you can implement it using quaternions, that is also okay, or even better, some say. If you do that, let me know how you did it.

murtazabasu commented 4 years ago

Hi, and if I use a pose as the output of the actor NN, i.e. (translation x, y, z and Euler rx, ry, rz), what is the desired action space for this? I mean the low and high values; I am not sure about them. Moreover, did you also include the information of this pose in your state representation?

Also, did you get better results with this methodology of learning a pose instead of learning the joint angles? I tried implementing it yesterday, but the problem with IK is that it sometimes gives me a None value when it is not able to find a solution. Is it possible to have a step pose or something here, the same as you suggested for the joint angles, i.e. curr_JA + step*JA (from agent)?

The quaternion-to-Euler transformation can be done as follows:

import math

# note: this assumes the quaternion is stored as [w, x, y, z]; ROS messages and
# tf.transformations use [x, y, z, w], so reorder accordingly
array = [0, 0, 0, 1]
w = array[0]
x = array[1]
y = array[2]
z = array[3]

# roll (rotation about x)
t0 = +2.0 * (w * x + y * z)
t1 = +1.0 - 2.0 * (x * x + y * y)
roll = math.atan2(t0, t1)
# pitch (rotation about y), clamped to keep asin in its domain
t2 = +2.0 * (w * y - z * x)
t2 = +1.0 if t2 > +1.0 else t2
t2 = -1.0 if t2 < -1.0 else t2
pitch = math.asin(t2)
# yaw (rotation about z)
t3 = +2.0 * (w * z + x * y)
t4 = +1.0 - 2.0 * (y * y + z * z)
yaw = math.atan2(t3, t4)

a = [roll, pitch, yaw]
print(a)

cambel commented 4 years ago

Hi, it would be the same idea of taking steps at each time step, but instead of taking steps in joint angles it would be in Cartesian space. In translation (x, y, z) it would be like taking a step of, say, -1 to 1 cm, or larger if you want; in rotation it would be a step of, say, -1 to 1 degree. That's how it is defined in my environment: the actions I get from the agent are between [-1, 1], so I map that value to a range of motion for each direction. Then I apply curr_pose + step*poseA (from agent).

Yeah, it is way better, because you don't need to solve the kinematics of the robot with your agent, so it is a lot easier to learn. Not finding a solution for the IK is the main issue for me too. I know there are ways to mitigate those problems by avoiding singularities, but I am not really an expert in that. What I do to mitigate the problem is that, if a solution is not found, the robot just stays in the same place (same current joint angles). Since the next action of the agent is not 100% the same, the robot will probably move on the next step. Additionally, in my case, the robot is still moving when the next action is commanded, so that also guarantees that the next action will be different and maybe the IK will be solved next time. In any case, if you are penalizing the agent for not achieving the task quickly, it will learn to avoid those singularities where the IK is not found. So far it works fine for me.
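
A minimal sketch of that task-space stepping: the agent's action is in [-1, 1] per axis, mapped to a small translational/rotational step, and the robot stays in place when IK fails. solve_ik is a placeholder for ikfast/trac_ik:

    import numpy as np

    MAX_TRANS_STEP = 0.01           # metres per step (about 1 cm)
    MAX_ROT_STEP = np.deg2rad(1.0)  # radians per step (about 1 degree)

    def task_space_step(curr_pose, curr_joints, agent_action):
        # curr_pose = [x, y, z, rx, ry, rz]; agent_action in [-1, 1]^6
        scale = np.array([MAX_TRANS_STEP] * 3 + [MAX_ROT_STEP] * 3)
        target_pose = np.asarray(curr_pose) + scale * np.asarray(agent_action)
        joints = solve_ik(target_pose, seed=curr_joints)  # placeholder IK call
        if joints is None:
            return curr_joints  # no IK solution: stay in place, as described above
        return joints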

murtazabasu commented 4 years ago

Hi, I thought I would share these two videos from my latest work: https://www.youtube.com/watch?v=bRCusJprGo8

https://www.youtube.com/watch?v=hdSmxVf3RV4

Would like to have your valuable comment :)

cambel commented 4 years ago

@Murtaza786-oss, it looks great. What are you using to control the robot? I found a paper that supported my view of using task-space actions for RL.

Why does the robot vibrate so much? Maybe the actions are too big? You could also try using an l1/l2 norm of the distance to encourage more precise positioning of the end-effector.
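
One simple form of such a combined l1/l2 distance cost (the weights are placeholders, and this is only a sketch of the idea, not the exact function used here):

    import numpy as np

    def distance_cost(eef_pos, target_pos, w_l1=1.0, w_l2=1.0):
        d = np.asarray(eef_pos) - np.asarray(target_pos)
        # the l1 term keeps a gradient near the target, the l2 term penalizes large errors
        return -(w_l1 * np.sum(np.abs(d)) + w_l2 * np.linalg.norm(d))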

Look, this is what I have been working on; hopefully I will be able to share it soon. https://youtu.be/RgWLyllE9Aw

murtazabasu commented 4 years ago

Hi Cambel, I hope you are healthy. I read your paper and saw your video, and what you are working on is quite intuitive. By now I have achieved remarkable results with a single agent trained on five randomized targets. In testing, in addition to these 5 targets, I added 3 new ones on which the agent was never trained, and the agent is able to generalize and reach those new targets with really good accuracy. Hopefully I will be able to share the video soon. I am still following the normal forward-kinematics architecture, i.e. learning the joint angles.

Now I am working on coordinating two robots, and here is where I need your suggestions. Both robots are exactly facing each other, with R1 at +0.72 and R2 at -0.72 on the y-axis. This is a master-slave scenario where R2 is the master and R1 is the slave. That means there is a common target (pose): this pose is the target for R2, while for R1 the position of that target remains the same but the orientation differs. To find the right orientation for R1 from the orientation of R2, I do a quaternion transformation on R2's quaternion and give it to R1, and it works, but the transformation is valid only for a few cases. Is there a better way to perform coordination between two robots in RL?

cambel commented 4 years ago

Hi @Murtaza786-oss, thank you, I hope you are also healthy. What do you mean when you say that there are only a few cases of valid orientations after performing the transformation on R2's quaternion? If the problem is that R1's desired orientation can somehow not be reached by R1, then maybe you can try a random search over orientations close to the desired orientation for R1. Furthermore, you can do a sanity test using IK: just check that the desired orientation for R1 has at least one valid IK solution. The more IK solutions that orientation has, the more likely it is that your algorithm will find a path to it successfully. Stay safe.
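
A rough sketch of that random search plus IK sanity check (solve_ik is a placeholder for your IK solver; quaternions are in [x, y, z, w] order as in ROS):

    import numpy as np
    from tf.transformations import quaternion_from_euler, quaternion_multiply

    def find_reachable_orientation(position, desired_quat, seed_joints, tries=50, max_angle=0.1):
        for _ in range(tries):
            # small random rotation applied to the desired orientation
            noise = quaternion_from_euler(*np.random.uniform(-max_angle, max_angle, 3))
            quat = quaternion_multiply(noise, desired_quat)
            joints = solve_ik(position, quat, seed_joints)  # placeholder IK call
            if joints is not None:
                return quat, joints  # at least one valid IK solution exists
        return None, None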

murtazabasu commented 4 years ago

Hi @cambel, so I will explain in detail. Have a look at this initial setup: cam. The transformation I was talking about is simple: I take the quaternion from the master, convert it to RPY, apply -90 deg on the roll and -90 deg on the yaw, then convert it back to a quaternion and give it to the slave. When I do that I get this result: 2020-04-20 (1)

But the problem is that this transformation is valid only for certain cases, i.e. when the master end-effector is at 0 deg about the x and y axes and has only a slight rotation about the z-axis. I want to find a general solution such that the slave comes to the target (with a 90 deg rotation in the pitch, or within any desired tolerance) so that it can grab a workpiece from the master. If a general solution like this exists, I can make it learnable for RL. Moreover, I have a complex environment where the master and slave have their individual targets to reach, and when it is time to coordinate on the common target, the slave has to learn to reach it orthogonal to the master so that it can grab the workpiece.

Stay Safe:)

cambel commented 4 years ago

I think I get what you are trying to do, but I am not so sure what your problem with the transformation is. I don't see why there isn't a valid transformation for all cases if you just want to perform that rotation around the two axes, unless it is a problem with the Euler angles, in which case you can try a direct transformation using quaternions, without converting back and forth to RPY angles. The other possible problem I can think of is that, as I mentioned previously, the slave is not able to reach the orientation orthogonal to the master. In that case, as mentioned, you can validate the orientation quickly using IK and do a random search close to your desired orientation until you get a valid one.
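
A minimal sketch of that direct quaternion composition using tf.transformations, without the round trip through RPY (the -90 deg roll/yaw offset is the one mentioned above; quaternions are in [x, y, z, w] order):

    import numpy as np
    from tf.transformations import quaternion_from_euler, quaternion_multiply

    # fixed rotation applied to the master's orientation
    offset = quaternion_from_euler(-np.pi / 2, 0.0, -np.pi / 2)

    def slave_orientation(master_quat):
        # post-multiplying applies the offset in the master's tool frame;
        # use quaternion_multiply(offset, master_quat) for a world-frame offset instead
        return quaternion_multiply(master_quat, offset)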

I also think that your problem boils down to a grasping problem. You may want to check a grasp planner to "plan" how to grasp your target object with both arms; that should give you the pose and orientation of each arm's end-effector, and it could also deal with more general cases than just grasping in an orthogonal configuration. Then you would use RL to generate the motion of each arm. The most widely used open-source planner is GraspIt!. In my lab, some people are working on this topic of dual-arm manipulation. They basically follow these steps: plan the grasp, then generate the motion using RRT with IK validations. In any case, it is by no means a trivial task to solve, but that makes it a very interesting research topic.

murtazabasu commented 4 years ago

Thanks Cambel! I would like to know how the people in your lab connect GraspIt! and ROS together; I mean, both of them have separate interfaces. I am not sure how I should use the URDF files in GraspIt! and connect it to my RL algorithm, because initially I need to use ROS/Gazebo to actually move the joint angles of the robot, and then, say, after the robot gets close enough to the target, I need to do the grasp planning using GraspIt!. I am not sure how to connect the pieces here. It would be great if you could provide some insight in this regard.

cambel commented 4 years ago

Hi @Murtaza786-oss, I do not have much experience with the grasp planner myself. As for my lab teammates, they mainly use an in-house planner. But I think you can check how to interface GraspIt! and ROS; I found this repo: https://github.com/graspit-simulator/graspit_interface. You may also try generating the grasping postures offline and then manually defining the postures in your ROS environment to avoid interfacing the two.