AboudyKreidieh / h-baselines

A repository of high-performing hierarchical reinforcement learning models and algorithms.

two problems (HAC and AntPush) #226

Closed · Hotwaterman closed this issue 3 years ago

Hotwaterman commented 3 years ago

I used h-baselines to reproduce HIRO and HAC, but I ran into two problems:

  1. HAC performance is poor, which differs from the results reported in the HAC paper. Is this caused by the code or by something else?
  2. For the "AntPush" experiment, I used the command `python experiments/run_hrl.py "AntPush" --use_huber --evaluate --eval_interval 50000 --nb_eval_episodes 50 --total_steps 3000000 --relative_goals --off_policy_corrections`. Are these settings correct? When I run it like this, HIRO's success rate stays at 0.
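For context on why the success rate can sit at 0 for a long time: evaluation on the ant navigation tasks is binary and fairly strict. Below is a minimal sketch of the success criterion, assuming the HIRO paper's convention for AntPush (evaluation goal near (0, 19), success threshold 5); the function name and exact values are illustrative, not read from the h-baselines code.

```python
import numpy as np

def antpush_success(final_xy, goal_xy=(0.0, 19.0), threshold=5.0):
    """Return True if the episode counts as a success.

    Sketch of the HIRO paper's evaluation rule for the ant tasks: the
    agent's final (x, y) position must land within `threshold` of the
    target. The goal and threshold here are assumptions taken from the
    paper; h-baselines' exact values may differ.
    """
    return np.linalg.norm(np.asarray(final_xy) - np.asarray(goal_xy)) <= threshold
```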
hassanrasheedk commented 2 years ago

I have a similar problem, were you able to get any success with HIRO running on AntPush?

Hotwaterman commented 2 years ago

I did succeed, but the variance across experiments is large: only 3 or 4 out of ten random-seed runs may succeed. I think it is a problem with the reward design. The dense distance reward can mislead the agent's exploration, and AntPush leaves very little room for error. As for HAC, the issue is that hindsight on high-level actions hurts exploration.
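To make those two points concrete, here is a minimal sketch of the mechanisms being blamed, assuming HIRO-style relative goals and HAC-style hindsight action transitions; the function names are illustrative and not taken from the h-baselines code:

```python
import numpy as np

def intrinsic_distance_reward(state, goal, next_state):
    # HIRO-style low-level reward with relative goals: negative Euclidean
    # distance between the desired state change and the achieved one. It is
    # dense everywhere, so it always rewards moving straight toward the
    # target -- on AntPush that pulls the agent into the blocked direct
    # path instead of the detour that actually solves the task.
    return -np.linalg.norm(state + goal - next_state)

def hindsight_action(achieved_state):
    # HAC-style hindsight action transition: the high-level action (the
    # subgoal) is relabeled as the state the low level actually reached.
    # This trains the high-level critic as if the low level were already
    # optimal, which can narrow the subgoals the high level explores.
    return achieved_state
```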

hassanrasheedk commented 2 years ago

> I did succeed, but the variance across experiments is large: only 3 or 4 out of ten random-seed runs may succeed. I think it is a problem with the reward design. The dense distance reward can mislead the agent's exploration, and AntPush leaves very little room for error. As for HAC, the issue is that hindsight on high-level actions hurts exploration.

Thank you for your quick response. Did you use the same parameters mentioned in the experiments README?