Closed by hzyjerry 3 years ago
Yes! There may be a parameter that is affecting the results.
Try adding --num-rollouts 36 when you train. For example:
nohup python3 -m ppo.train_coop --env-name ItchScratchJacoHuman-v0 --num-env-steps 10000000 --num-rollouts 36 --save-dir ./trained_models_new/ > nohup_ItchScratchJacoHuman-v0_10m.out &
Instead of updating a PPO policy after each parallel rollout (batch size of 12 with 12 parallel cores), this tells PPO to wait and update the policy after it has collected a batch of 36 rollouts.
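To make the batching arithmetic concrete, here is a rough sketch of the idea, not the actual code in ppo.train_coop. The function names (rounds_per_update, count_updates) are hypothetical and only illustrate how waiting for 36 rollouts changes the update schedule when 12 rollouts arrive per parallel round.

```python
import math


def rounds_per_update(num_rollouts: int, num_processes: int) -> int:
    """Parallel collection rounds needed before one policy update.

    Hypothetical helper for illustration; not part of ppo.train_coop.
    """
    return math.ceil(num_rollouts / num_processes)


def count_updates(total_rounds: int, num_processes: int, num_rollouts: int) -> int:
    """Count policy updates when each round yields `num_processes` rollouts
    and an update only happens once `num_rollouts` have been collected."""
    buffered = 0
    updates = 0
    for _ in range(total_rounds):
        buffered += num_processes      # one rollout per parallel worker
        if buffered >= num_rollouts:   # batch is large enough: update the policy
            updates += 1
            buffered = 0               # start collecting the next batch
    return updates


if __name__ == "__main__":
    # 12 cores, update every 36 rollouts: 3 collection rounds per update.
    print(rounds_per_update(36, 12))
    # Over 9 rounds: 9 updates with batches of 12, 3 updates with batches of 36.
    print(count_updates(9, 12, 12), count_updates(9, 12, 36))
```

With the same number of environment steps you get fewer updates, but each one is computed from three times as many rollouts.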
Cool! Using 36 rollouts indeed helps a lot for the ItchScratch environment. Closing this issue.
Hi,
I'm working on reproducing the cooperative ItchScratch results from the paper. I trained ItchScratchJacoHuman-v0 with the original hyperparameters for 10M steps on my local 12-core machine. Training took ~15 hours, yet the trained model isn't quite as good as the pretrained model or the model in the paper (reward mean 443.2). I'm wondering if there are any hyperparameter settings or key steps that I missed? Thanks for your insight!