Closed Hotwaterman closed 3 years ago
I have a similar problem. Were you able to get HIRO working on AntPush?
I succeeded, but the variance between experiments is large: only 3 or 4 out of ten random seeds may be successful. I think the problem is the reward setting. The current distance reward can mislead the agent's exploration, and AntPush has very little tolerance for mistakes. For HAC, I think the cause is that high-level action hindsight affects exploration.
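To make the point about the distance reward concrete, here is a minimal sketch. It assumes the reward is the negative Euclidean distance to the goal, and that AntPush requires the ant to move sideways before it can reach the goal. The coordinates and the `distance_reward` helper are hypothetical and chosen only for illustration; they are not taken from h-baselines.

```python
import math

def distance_reward(pos, goal):
    """Dense reward: negative Euclidean distance from pos to goal."""
    return -math.hypot(pos[0] - goal[0], pos[1] - goal[1])

# Hypothetical coordinates, for illustration only.
goal = (0.0, 19.0)
start = (0.0, 0.0)
greedy_step = (0.0, 4.0)   # heads straight toward the goal (assumed blocked path)
detour_step = (-8.0, 0.0)  # moves sideways first (assumed necessary detour)

r_start = distance_reward(start, goal)
r_greedy = distance_reward(greedy_step, goal)
r_detour = distance_reward(detour_step, goal)

# The greedy step is rewarded more than staying put, while the detour is
# rewarded less, so the dense reward pushes exploration the wrong way.
print(r_greedy > r_start > r_detour)
```

Under these assumptions the step that a successful route needs is the one the reward penalizes, which could explain why only some seeds find it.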
Thank you for your quick response. Did you use the same parameters mentioned in the experiments README?
I used h-baselines to reproduce HIRO and HAC. But there are two problems: