TianhongDai / hindsight-experience-replay

This is the PyTorch implementation of Hindsight Experience Replay (HER) - experiments on all Fetch robotic environments.
MIT License

FetchPickAndPlace-v1 #16

Closed quyouyuan closed 3 years ago

quyouyuan commented 3 years ago

Hello, TianhongDai! Thank you very much, your code has been a great help to me! But I can't reproduce the results reported in the paper in the 'FetchPickAndPlace-v1' environment. Can you give me some help?

TianhongDai commented 3 years ago

@quyouyuan Hi, could you let me know what problems you are running into, please? I'm very happy to help you solve them.

quyouyuan commented 3 years ago

Thank you! I want to ask how I can reproduce the paper's results. Why is the curve I get by running 'mpirun -np 16 python -u train.py --env-name='FetchPickAndPlace-v1' 2>&1 | tee pick.log' from the readme higher than the one in the paper? And the difference is quite large!

Thanks for your reply.

TianhongDai commented 3 years ago

@quyouyuan Hi, this is because the network inputs in the HER paper are different from those in my code / the code from OpenAI Baselines.

In the HER paper, page 7, first paragraph:

"Observations: In this paragraph relative means relative to the current gripper position. The policy is given as input the absolute position of the gripper, the relative position of the object and the target4, as well as the distance between the fingers. The Q-function is additionally given the linear velocity of the gripper and fingers as well as relative linear and angular velocity of the object. We decided to restrict the input to the policy in order to make deployment on the physical robot easier."

We can see that the relative linear and angular velocities of the object are only given to the Q-function, which forms an asymmetric actor-critic structure. The reason is the one they state: "We decided to restrict the input to the policy in order to make deployment on the physical robot easier." Access to, or estimation of, the object's relative linear and angular velocities is difficult in the real world.

However, in most other HER extensions, and even in the OpenAI Baselines code (https://github.com/openai/baselines/blob/master/baselines/her/actor_critic.py#L31-L39), the relative linear and angular velocities of the object are provided to both the policy network and the Q-function; see the sketch below.
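To make the difference concrete, here is a minimal PyTorch sketch (not the actual networks from this repository or the HER paper) of the asymmetric setup: the actor only sees the restricted observation, while the critic additionally receives the velocity features. All dimensions and names are illustrative.

```python
# Minimal sketch of an asymmetric actor-critic, assuming:
#   obs = restricted observation (gripper position, relative object/target
#         positions, finger distance)
#   vel = extra velocity features (gripper/finger velocities, object's
#         relative linear and angular velocities) given only to the critic
import torch
import torch.nn as nn


class Actor(nn.Module):
    def __init__(self, obs_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim), nn.Tanh(),
        )

    def forward(self, obs):
        # the policy only gets the restricted observation
        return self.net(obs)


class Critic(nn.Module):
    def __init__(self, obs_dim, vel_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + vel_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, obs, vel, action):
        # the Q-function additionally receives the velocity features
        return self.net(torch.cat([obs, vel, action], dim=-1))
```

In the symmetric variant used by this repository and OpenAI Baselines, the actor would simply take the full observation (restricted observation plus velocities) as well, which tends to make the task easier to learn and explains the higher curve.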

If you don't need to deploy HER on a physical robot, I think you'd better refer to this paper: https://arxiv.org/pdf/1802.09464.pdf.

quyouyuan commented 3 years ago

Thank you for your answer! It is very helpful to me.

julio-design commented 2 years ago

May I ask how you obtained the plot? I have a custom environment, but I'm using this PyTorch implementation of HER to train my model.

TianhongDai commented 2 years ago

> May I ask how you obtained the plot? I have a custom environment, but I'm using this PyTorch implementation of HER to train my model.

@ollintzinlab Hi - you can refer to this script: https://github.com/TianhongDai/esil-hindsight/blob/main/plot_curves.py.
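For reference, here is a minimal sketch of how such a success-rate curve can be plotted with matplotlib. This is not the linked plot_curves.py script; it assumes you have already collected per-epoch evaluation success rates into an array (for example by parsing your own log file or saving them during training), and the variable names are purely illustrative.

```python
# Minimal plotting sketch, assuming `success_rates` holds one evaluation
# success rate per training epoch (replace the placeholder values with your
# own data, e.g. parsed from pick.log or saved as a .npy file).
import numpy as np
import matplotlib.pyplot as plt

success_rates = np.array([0.1, 0.3, 0.5, 0.7, 0.85, 0.9])  # placeholder data
epochs = np.arange(len(success_rates))

plt.plot(epochs, success_rates, label='FetchPickAndPlace-v1')
plt.xlabel('Epoch')
plt.ylabel('Eval success rate')
plt.legend()
plt.savefig('success_rate.png')
```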

julio-design commented 2 years ago

Thank you 😀