Closed karanchahal closed 3 years ago
@karanchahal Hi - Thanks for your interest. The purpose of our work is to use hindsight with self-imitation learning to solve the exploration problem of on-policy algorithms (e.g. PPO) in continuous control environments with sparse rewards. Thus, our method doesn't have better sample efficiency than off-policy algorithms (e.g. DDPG + HER). However, it achieves better performance on the FetchPickAndPlace task (around a 98% success rate). Hope our work is helpful to you.
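The hindsight idea that both DDPG+HER and hindsight self-imitation methods build on is goal relabeling: replacing a transition's goal with one actually achieved later in the episode, so sparse-reward trajectories still yield learning signal. A minimal sketch of the "future" relabeling strategy (all names here are illustrative, not from the paper's codebase):

```python
import random

def sparse_reward(achieved_goal, goal):
    # Sparse reward as in the Fetch tasks: 0 on success, -1 otherwise.
    return 0.0 if achieved_goal == goal else -1.0

def relabel_episode(episode, k=4):
    """episode: list of dicts with 'achieved_goal' and 'goal' keys.
    Returns up to k extra transitions per step, each with the goal
    replaced by a goal achieved later in the same episode."""
    relabeled = []
    for t, step in enumerate(episode):
        future_steps = episode[t:]
        for _ in range(min(k, len(future_steps))):
            new_goal = random.choice(future_steps)['achieved_goal']
            relabeled.append(dict(
                step,
                goal=new_goal,
                reward=sparse_reward(step['achieved_goal'], new_goal),
            ))
    return relabeled

# Toy episode that never reaches its original goal (5, 5):
episode = [{'achieved_goal': (0, 0), 'goal': (5, 5)},
           {'achieved_goal': (1, 0), 'goal': (5, 5)},
           {'achieved_goal': (2, 1), 'goal': (5, 5)}]
extra = relabel_episode(episode, k=2)
```

Every relabeled transition's goal is something the agent actually achieved, so some of them carry the success reward (0.0) that the original sparse-reward episode never produced.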
Hello, big fan of your work! Just had a query: is this better than HER+DDPG, and if so, what makes it better? :)