Closed karanchahal closed 3 years ago
@karanchahal Hi - Thanks for your interest. The purpose of our work is to use hindsight with self-imitation learning to solve the exploration problem of on-policy algorithms (e.g. PPO) in continuous control environments with sparse rewards. Thus, our method doesn't have better sample efficiency than off-policy algorithms (e.g. DDPG + HER). However, it achieves better performance on the FetchPickAndPlace task (around a 98% success rate). Hope our work is helpful to you.
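The hindsight idea that both DDPG+HER and hindsight self-imitation methods build on is goal relabeling: replacing a transition's goal with one actually achieved later in the episode, so sparse-reward trajectories still yield learning signal. A minimal sketch of the "future" relabeling strategy (all names here are illustrative, not from the paper's codebase):

```python
import random

def sparse_reward(achieved_goal, goal):
    # Sparse reward as in the Fetch tasks: 0 on success, -1 otherwise.
    return 0.0 if achieved_goal == goal else -1.0

def relabel_episode(episode, k=4):
    """episode: list of dicts with 'achieved_goal' and 'goal' keys.
    Returns up to k extra transitions per step, each with the goal
    replaced by a goal achieved later in the same episode."""
    relabeled = []
    for t, step in enumerate(episode):
        future_steps = episode[t:]
        for _ in range(min(k, len(future_steps))):
            new_goal = random.choice(future_steps)['achieved_goal']
            relabeled.append(dict(
                step,
                goal=new_goal,
                reward=sparse_reward(step['achieved_goal'], new_goal),
            ))
    return relabeled

# Toy episode that never reaches its original goal (5, 5):
episode = [{'achieved_goal': (0, 0), 'goal': (5, 5)},
           {'achieved_goal': (1, 0), 'goal': (5, 5)},
           {'achieved_goal': (2, 1), 'goal': (5, 5)}]
extra = relabel_episode(episode, k=2)
```

Every relabeled transition's goal is something the agent actually achieved, so some of them carry the success reward (0.0) that the original sparse-reward episode never produced.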
Hello, big fan of your work! Just had a query: is this better than HER+DDPG, and if so, what makes it better? :)