It's all slightly confusing but I think they're using a different version of the environment when they use demonstrations. If you compare to the HER results of Plappert et al. (https://arxiv.org/pdf/1802.09464.pdf) then the results here are consistent. I have a suspicion the environment they use in what you linked to above is FetchPickAndPlace where all of the goals are above the table (in FetchPickAndPlace-v1 half of the goals are on the table surface, which I expect makes it easier to start learning).
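A minimal sketch (not from this thread) of how one could check that claim empirically: reset FetchPickAndPlace-v1 many times and count how many sampled goals lie above the table surface. The table-top height used below is an assumed approximate value, not taken from the thread or verified against the environment source; adjust it if needed.

```python
import gym

# Assumptions: FetchPickAndPlace-v1 is available (gym with mujoco-py / robotics
# installed), and the table surface sits at roughly z = 0.43 m.
TABLE_TOP_Z = 0.43
N_RESETS = 1000

env = gym.make('FetchPickAndPlace-v1')

in_air = 0
for _ in range(N_RESETS):
    obs = env.reset()
    goal_z = obs['desired_goal'][2]      # goals are (x, y, z) positions
    if goal_z > TABLE_TOP_Z + 0.01:      # small margin above the surface
        in_air += 1

print(f'{in_air}/{N_RESETS} goals sampled in the air '
      f'({100 * in_air / N_RESETS:.1f}%)')
env.close()
```

If roughly half of the goals land on the table surface, that would be consistent with the explanation above for why learning can start more easily in the -v1 environment.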
Yes, you are right. Thanks for the reminder. The paper also mentions it at the end of page 5.
Hi, I find that it is pretty fast to train the agent in the FetchPickAndPlace-v1 environment without using demonstrations. As we all know, this is a two-stage task with a very sparse reward, which makes it the most difficult task among the Fetch robot environments. But with your code, training the agent in FetchPickAndPlace-v1 does not take much longer than training in the other environments (Reach, Slide, and Push).
I see you are mainly using code from OpenAI Baselines, but their results show that training the agent in the FetchPickAndPlace-v1 environment takes a very long time without demonstrations. How do you explain this? Thank you!