Reproducing Cooperative ItchScratch results

hzyjerry commented 3 years ago

Hi,

I'm working on reproducing the cooperative ItchScratch results from the paper. I tried ItchScratchJacoHuman-v0 with the original hyperparameters and trained for 10M steps on my local 12-core machine. Training took ~15 hours, yet the trained model isn't as good as the pretrained model or the model in the paper (reward mean 443.2):

Reward Mean: -62.023032418203236 (from 100 rollouts)
Reward Std: 37.62848439453216
Task Success Mean: 0.0
Task Success Std: 0.0

I'm wondering if there are any hyperparameter settings or key steps that I missed. Thanks for your insight!

Zackory commented 3 years ago

Yes! There may be a parameter that is affecting results. Try adding --num-rollouts 36 when you train. For example,

nohup python3 -m ppo.train_coop --env-name ItchScratchJacoHuman-v0 --num-env-steps 10000000 --num-rollouts 36 --save-dir ./trained_models_new/ > nohup_ItchScratchJacoHuman-v0_10m.out &

Instead of updating a PPO policy after each parallel rollout (batch size of 12 with 12 parallel cores), this tells PPO to wait and update the policy after it has collected a batch of 36 rollouts.
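To make the batch-size arithmetic above concrete, here is a rough sketch. It is not code from the repository: the helper names are made up for illustration, it assumes rollouts are split evenly across the 12 parallel processes, and the episode length of 200 steps is an assumption rather than something stated in this thread.

```python
import math

# Assumed episode length in environment steps; not stated in this thread.
STEPS_PER_ROLLOUT = 200


def rounds_per_update(num_rollouts: int, num_processes: int) -> int:
    """Parallel collection rounds needed before one policy update (hypothetical helper)."""
    return math.ceil(num_rollouts / num_processes)


def total_updates(num_env_steps: int, num_rollouts: int,
                  steps_per_rollout: int = STEPS_PER_ROLLOUT) -> int:
    """Number of full policy updates that fit in the training budget (hypothetical helper)."""
    return num_env_steps // (num_rollouts * steps_per_rollout)


if __name__ == "__main__":
    for rollouts in (12, 36):
        print(
            f"num_rollouts={rollouts}: "
            f"{rounds_per_update(rollouts, 12)} round(s) per update, "
            f"{rollouts * STEPS_PER_ROLLOUT} steps per batch, "
            f"{total_updates(10_000_000, rollouts)} updates in 10M steps"
        )
```

Under these assumptions, going from 12 to 36 rollouts triples the data behind each update and cuts the number of updates to roughly a third for the same 10M-step budget.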

hzyjerry commented 3 years ago

Cool! Using 36 rollouts indeed helps a lot for the ItchScratch environment. Closing this issue.

Healthcare-Robotics / assistive-gym, issue #14