PierreExeter / rl_reach

RL Reach is a platform for running reproducible reinforcement learning experiments.
https://rl-reach.readthedocs.io/en/latest/index.html

forcing joint position and setting joint position #6

Open stefanwanckel opened 3 years ago

stefanwanckel commented 3 years ago

I have a question regarding the simulation in PyBullet. In the widowx_env, PyBullet never calls stepSimulation(). Instead, the robot's joint positions are forced to the new position. Why did you choose this approach? This way, no real physics simulation happens through PyBullet...

Thanks in advance

PierreExeter commented 3 years ago

Yes, that's correct: the joints are forced to a new position rather than having a position or torque command applied. This is a design choice inherited from Replab, on which the rl_reach environments are based. It may be a good idea to ask them directly.

I think this approach was adopted to speed up the simulation, since it is computationally faster to reset joint positions than to apply torques and compute the real physics and inertial forces. Of course, this makes the simulation less realistic, but I found that it wasn't a big issue as long as the motion amplitude between time steps is small enough. For example, when I transferred the trained model from PyBullet to the real robot, the robot was able to reproduce the policy learnt exclusively in PyBullet. One downside was that we have to wait for the physical robot to complete its motion at each time step, resulting in a jerky (not smooth) motion.

I'm planning to look into using the setJointMotorControl2 function to control the joints by torque in the future.
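For illustration, here is a minimal PyBullet sketch (not the rl_reach code; the URDF and joint indices are placeholders) contrasting forcing a joint position with driving the joint through the motor interface:

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
robot = p.loadURDF("kuka_iiwa/model.urdf")  # placeholder arm; rl_reach loads the WidowX URDF
joint, target = 1, 0.5

# Option 1: force the joint to the target (what the original env does).
# Instantaneous, no dynamics, collisions are not resolved by the physics engine.
p.resetJointState(robot, joint, targetValue=target)

# Option 2: let the built-in motor drive the joint, then step the physics.
p.setJointMotorControl2(robot, joint, p.POSITION_CONTROL, targetPosition=target, force=50)
for _ in range(100):  # the joint only converges over several simulation steps
    p.stepSimulation()

# Option 3 (torque control, as mentioned above): the default velocity motor
# must be disabled first, otherwise it fights the applied torque.
p.setJointMotorControl2(robot, joint, p.VELOCITY_CONTROL, force=0)
p.setJointMotorControl2(robot, joint, p.TORQUE_CONTROL, force=2.0)
p.stepSimulation()
```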

PierreExeter commented 3 years ago

For information, I have just pushed an update to the WidowX environments where the joints are position-controlled (as opposed to being forced to the new position), see here. You can activate this by creating a new environment and setting action_type: 2 in __init__.py.

I thought it would be useful to compute the real physics simulation, for example to detect collisions. Thanks for your suggestion.

The only downside is that the training takes much longer to complete.
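A hypothetical sketch of how such an action_type switch could dispatch the two control paths inside the environment's step(); the names below are illustrative only, not rl_reach's actual implementation:

```python
import pybullet as p

def apply_action(self, joint_targets):
    if self.action_type == 1:            # forced joints (original behaviour)
        for idx, pos in zip(self.joint_ids, joint_targets):
            p.resetJointState(self.robot_id, idx, targetValue=pos)
    elif self.action_type == 2:          # position-controlled joints
        p.setJointMotorControlArray(
            self.robot_id, self.joint_ids, p.POSITION_CONTROL,
            targetPositions=joint_targets)
        for _ in range(self.frame_skip):  # let the motors act before observing
            p.stepSimulation()
```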

stefanwanckel commented 3 years ago

I also did that myself. Glad that I could be of some help :)

Yes, the training takes very long with position control. Unfortunately, PyBullet doesn't support multithreading at the moment...

PierreExeter commented 3 years ago

Did you manage to train a model successfully with position-controlled joints? If so, please could you share how you implemented it? It seems that I cannot get the model to learn when I change from forced joints to position-controlled joints.

stefanwanckel commented 3 years ago

In general, I have not been able to train any models for the past two months. I have tried several, but I almost always get similar results: either the arm "folds" itself (see the picture in issue 6) or it creeps over the floor. I have noticed that, while in contact with the floor or in this "folded" configuration, the action at every step is still something like [1 -1 1 -1..], meaning that the robot wants to continue its trajectory; but since it is either at max_joint or in collision with the floor (a simple plane), it can't continue. I haven't figured out why it attempts to move towards the floor, considering that the goal is in a completely different location...
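One small guard that may help diagnose this (a hedged sketch, not part of rl_reach; the function and variable names are made up): clamp the commanded joint positions to the URDF limits read from getJointInfo, so saturated actions like [1 -1 1 -1..] cannot keep pushing past max_joint.

```python
import numpy as np
import pybullet as p

def clip_to_joint_limits(robot_id, joint_ids, targets):
    """Clamp commanded joint positions to the limits stored in the URDF."""
    lows, highs = [], []
    for j in joint_ids:
        info = p.getJointInfo(robot_id, j)
        lows.append(info[8])    # jointLowerLimit
        highs.append(info[9])   # jointUpperLimit
    return np.clip(targets, lows, highs)
```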

stefanwanckel commented 3 years ago

When moving with position control, the joint values returned by getJointState can differ slightly from the values passed to the position control function, and this error can accumulate. One idea would be to feed the desired position into the observation rather than the one read back after position control. I have also observed that calculateInverseKinematics, for example, is quite inaccurate (even though I don't see it used in your code). This could be an indication that the computeForwardKinematics=True flag might also lead to inaccuracies. It's a very vague idea though.

stefanwanckel commented 3 years ago

One more thought: I believe that your threshold for sparse rewards is too small. Especially if you train without HER, the model might not see enough reward signal. I chose it to be 0.1 or 0.2, but I haven't been successful either. I also moved it so that it is defined in the init, which seemed a bit easier to handle.
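For illustration, a minimal sketch (assumed names, not the rl_reach code) of exposing the success threshold as an init argument, as suggested above:

```python
import numpy as np

class ReachRewardMixin:
    def __init__(self, reward_type="dense", distance_threshold=0.1):
        self.reward_type = reward_type
        self.distance_threshold = distance_threshold

    def _reward(self, gripper_pos, goal_pos):
        dist = np.linalg.norm(gripper_pos - goal_pos)
        if self.reward_type == "sparse":
            # 0 on success, -1 otherwise: with a very small threshold and no HER,
            # the agent almost never sees a non-negative reward.
            return -float(dist > self.distance_threshold)
        return -dist  # dense: negative distance to the goal
```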

stefanwanckel commented 3 years ago

Another thought: when training with HER, the reward for sampled goals is calculated using the compute_reward function defined in the environment. Because I did not specify the axis along which to compute the norm, I got an array with the same value in every entry. I changed it so that the norm is computed along axis 1 (i.e. over the x, y, z coordinates of the gripper).

I am wondering if it makes more sense to just use the reward specified by reward_type...
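A sketch of a batch-safe compute_reward along these lines (names follow the gym GoalEnv API; the sign convention is an assumption and should match the rest of the environment):

```python
import numpy as np

def compute_reward(self, achieved_goal, desired_goal, info):
    # axis=-1 reduces over the (x, y, z) coordinates, so a (N, 3) batch of
    # relabelled HER goals yields N rewards instead of one broadcast value.
    dist = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
    return -dist  # negative distance: larger reward means closer to the goal
```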

PierreExeter commented 3 years ago

> When moving with position control, the joint values returned by getJointState can differ slightly from the values passed to the position control function, and this error can accumulate. One idea would be to feed the desired position into the observation rather than the one read back after position control. I have also observed that calculateInverseKinematics, for example, is quite inaccurate (even though I don't see it used in your code). This could be an indication that the computeForwardKinematics=True flag might also lead to inaccuracies. It's a very vague idea though.

In order for getJointState to return the correct joint position, you should allow the robot to complete its action by skipping a few frames (i.e. stepping the simulation several times per action). For example, see here.
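A minimal sketch of this frame-skipping idea (the URDF, joint index, target and skip count are placeholders, not rl_reach's values):

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
robot = p.loadURDF("kuka_iiwa/model.urdf")   # placeholder arm
joint, target, FRAME_SKIP = 1, 0.5, 25

# Command the position, step the physics a few frames, then read the joint back.
p.setJointMotorControl2(robot, joint, p.POSITION_CONTROL, targetPosition=target, force=50)
for _ in range(FRAME_SKIP):
    p.stepSimulation()

measured = p.getJointState(robot, joint)[0]  # jointPosition
# `measured` should now be close to `target`; reading it immediately after the
# control call (without stepping) would still return the old position.
```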

PierreExeter commented 3 years ago

> One more thought: I believe that your threshold for sparse rewards is too small. Especially if you train without HER, the model might not see enough reward signal. I chose it to be 0.1 or 0.2, but I haven't been successful either. I also moved it so that it is defined in the init, which seemed a bit easier to handle.

This wasn't an issue for me: I was able to train a model with a threshold of 0.001 (I didn't use a completely sparse reward though). Feel free to modify it as you wish.

PierreExeter commented 3 years ago

> Another thought: when training with HER, the reward for sampled goals is calculated using the compute_reward function defined in the environment. Because I did not specify the axis along which to compute the norm, I got an array with the same value in every entry. I changed it so that the norm is computed along axis 1 (i.e. over the x, y, z coordinates of the gripper).

> I am wondering if it makes more sense to just use the reward specified by reward_type...

I'm not an expert on HER, but the idea is to use the achieved_goal in order to leverage unsuccessful episodes during training when using a sparse reward. The compute_reward function must use the achieved_goal array in some way, but you can change the function. Currently, this function returns the distance between the achieved_goal and the goal, which you want to minimise.
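For context, a hedged sketch of how HER is typically wired up with stable-baselines3 (assuming SB3 >= 1.1 and a gym GoalEnv; the env id and hyperparameters below are placeholders, not the rl_reach training script). During training, the HerReplayBuffer relabels goals with achieved_goal values from past transitions and calls env.compute_reward on whole batches, which is why the norm axis discussed above matters.

```python
import gym
from stable_baselines3 import SAC, HerReplayBuffer

# Placeholder goal-conditioned env; substitute the registered WidowX env id.
env = gym.make("FetchReach-v1")

model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=10_000)
```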