Closed gunshi closed 2 years ago
Thanks for documenting this! @langosco any thoughts? Possibly just a difference in seed?
Yeah, thanks for documenting! @gunshi could you post the command (including CLI args) that you're using to run the training script?
One thought that comes to mind is that you might not be training for long enough (we trained for ~100 million timesteps). But it's hard to say without knowing the details.
Hey! Thanks for responding :) I launched the training run with the command ./experiments/scripts/train-maze-I.sh (didn't change anything). It seems like the script is set up to run for 80M steps, but the paper's hyperparameter table says 200M, so I suppose I could try training for longer, since I did eventually see a very slowly growing reward curve. Will let you know how it goes.
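For what it's worth, here's roughly how I'd bump the budget to the paper's 200M without touching the original script. This is only a sketch: the variable name `NUM_TIMESTEPS` and its current value are assumptions on my part; the actual name inside the script may differ, so check it before running.

```shell
# Copy the script so the original stays intact, then (assuming the script
# sets a NUM_TIMESTEPS=80000000 variable -- hypothetical name, verify in the
# real file) rewrite it to the paper's 200M and launch the new copy.
cp experiments/scripts/train-maze-I.sh experiments/scripts/train-maze-I-200M.sh
sed -i 's/NUM_TIMESTEPS=80000000/NUM_TIMESTEPS=200000000/' experiments/scripts/train-maze-I-200M.sh
./experiments/scripts/train-maze-I-200M.sh
```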
Hey! Thanks so much for the code release, I wanted to play with the envs and just train an expert agent on any one of them. I tried to run
./experiments/scripts/train_maze-l.sh
as-is, but it seems like it's never able to consistently reach reward 10.0 (as shown in fig. 4 of the paper, https://arxiv.org/abs/2105.14111), even when I hardcode the `rand_region`
to size 14. I imagine there's some hyperparameter that needs tuning that hasn't been updated in the config file; being new to procgen, could you verify whether that's the case, or possibly provide the updated script you used to get that result? It would be really helpful, since I'm just exploring right now and don't have an intuition for where to start tuning. (I could also send you my wandb logs if you think the current parameters should work out of the box.)

Best,
Gunshi

edit: sorry, just realised I've posted this issue to the procgenAISC repo instead of the training-specific repo, but that one doesn't seem to have the Issues feature enabled, so I'm leaving it here for now. Wandb links here: