Closed gunshi closed 2 years ago
Thanks for documenting this! @langosco any thoughts? Possibly just a difference in seed?
Yeah, thanks for documenting! @gunshi could you post the command (including CLI args) that you're using to run the training script?
One thought that comes to mind is that you might not be training for long enough (we trained for ~100 million timesteps). But it's hard to say without knowing the details.
Hey! Thanks for responding :) I launched the training run with the command ./experiments/scripts/train-maze-I.sh (didn't change anything). It seems like the script is set up to run for 80M steps, but the paper's hyperparameter table says 200M, so I suppose I could try training for longer, since I did eventually see a very slowly growing reward curve. Will let you know how it goes.
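For what it's worth, here's roughly how I'd bump the budget to the paper's 200M without touching the original script. This is only a sketch: the variable name `NUM_TIMESTEPS` and its current value are assumptions on my part; the actual name inside the script may differ, so check it before running.

```shell
# Copy the script so the original stays intact, then (assuming the script
# sets a NUM_TIMESTEPS=80000000 variable -- hypothetical name, verify in the
# real file) rewrite it to the paper's 200M and launch the new copy.
cp experiments/scripts/train-maze-I.sh experiments/scripts/train-maze-I-200M.sh
sed -i 's/NUM_TIMESTEPS=80000000/NUM_TIMESTEPS=200000000/' experiments/scripts/train-maze-I-200M.sh
./experiments/scripts/train-maze-I-200M.sh
```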
Hey! Thanks so much for the code release, I wanted to play with the envs and just train an expert agent on any one of them. I tried to run
./experiments/scripts/train_maze-l.sh
as-is, but it seems like it's never able to consistently reach reward 10.0 (as shown in fig. 4 of the paper, https://arxiv.org/abs/2105.14111), even when I hardcode the `rand_region`
to size 14. I imagine there's some hyperparameter that needs tuning that hasn't been updated in the config file; being new to procgen, could you verify whether that's the case, or possibly provide the updated script you used to get that result? It would be really helpful, since I'm just exploring right now and don't have an intuition for where to start tuning. (I could also send you my wandb logs if you think the current parameters should work out of the box.)

Best,
Gunshi

edit: sorry, just realised I've posted this issue to the procgenAISC repo instead of the training-specific repo, but that one doesn't seem to have the Issues feature enabled, so I'm leaving it here for now. Wandb links here: