Hi, thanks for the interest in this work. We reliably reach ~4700 on Ant-v2, as reported in the paper, with 10 expert trajectories over three different seeds. See the chart below:
[Interactive W&B chart link]
These are the parameters used:

```
agent=sac agent.actor_lr=3e-05 agent.critic_lr=0.0003 agent.init_temperature=0.001 agent.learnable_temperature=False env=ant env.learn_steps=1e6 env.demos=10 method.loss=value method.regularize=True num_actor_updates=1 num_seed_steps=0 train.batch=256 train.soft_update=True train.use_target=True seed=2
```
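For reference, here is a minimal sketch of how these Hydra-style overrides would be passed on the command line. The entry-point name `train_iq.py` is an assumption and may differ in your checkout; check `iq_learn/scripts/run_mujoco.sh` for the actual invocation.

```bash
# Hypothetical invocation -- the script name train_iq.py is assumed, not confirmed.
# The overrides are exactly the ones listed above.
python train_iq.py agent=sac agent.actor_lr=3e-05 agent.critic_lr=0.0003 \
    agent.init_temperature=0.001 agent.learnable_temperature=False \
    env=ant env.learn_steps=1e6 env.demos=10 \
    method.loss=value method.regularize=True \
    num_actor_updates=1 num_seed_steps=0 \
    train.batch=256 train.soft_update=True train.use_target=True seed=2
```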
Let me know if there are any issues with reproducing the other MuJoCo envs. Happy to help!
I reproduced the results with these parameters. Thank you!
Did you use these hyperparameters in the other MuJoCo environments as well? If not, it would be great if you could share those too. Thanks!
Hello! Could you provide the hyperparameters and the number of training steps for each MuJoCo environment needed to reproduce the Table 5 results (Appendix D.2 in the original paper)? I've tried the `iq_learn/scripts/run_mujoco.sh` script to train on `Ant-v2` for ~300k steps with the default hyperparameters and 10 expert trajectories, but only got eval returns around 3000–4000: `eval/episode_reward` shows 3301.59521 and `best_returns` is 4275.31665. Thank you!