DeNA / HandyRL

HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
MIT License

num_parallel affecting learning results #229

Open spicytomatoes opened 2 years ago

spicytomatoes commented 2 years ago

Hi, I tried training on a 32-core machine, so naturally I set num_parallel to 32. However, the model does not seem to learn at all. Oddly, when I set num_parallel to 6, the model learns. The rest of the config is exactly the same as the PubHRL config for Hungry Geese.

YuriCat commented 2 years ago

Thanks for your report! We ran several experiments with 64 workers, and all of the training runs were successful. However, learning to avoid illegal moves is not easy in this task, and I am sure that training is not stable.

If there is one thing I can say, it is that the PubHRL experiment setup was decided on the first try, so I cannot recommend it with confidence. As I mentioned in the discussion, I think forward_steps=1 is generally better in this kind of task. Also, a larger entropy regularization coefficient would be better.
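For illustration, the suggestion above could be applied as a set of overrides along these lines. This is only a sketch: it assumes the settings live under train_args with the key names forward_steps, entropy_regularization, and worker.num_parallel, and the baseline values and the 0.2 coefficient are placeholders, not the PubHRL settings or a tuned recommendation. The helper with_overrides is hypothetical and not part of HandyRL.

```python
import copy

# Hypothetical baseline train_args; the values are placeholders,
# not the actual PubHRL settings for Hungry Geese.
base_train_args = {
    "forward_steps": 16,
    "entropy_regularization": 0.1,
    "worker": {"num_parallel": 6},
}


def with_overrides(train_args, forward_steps, entropy_regularization, num_parallel):
    """Return a copy of train_args with the three settings replaced.

    The input dict is left untouched so the baseline can be reused
    for comparison runs.
    """
    updated = copy.deepcopy(train_args)
    updated["forward_steps"] = forward_steps
    updated["entropy_regularization"] = entropy_regularization
    updated["worker"]["num_parallel"] = num_parallel
    return updated


# forward_steps=1 follows the suggestion in this thread; 0.2 is only an
# example of "larger than the baseline", not a tested value.
suggested = with_overrides(
    base_train_args,
    forward_steps=1,
    entropy_regularization=0.2,
    num_parallel=32,
)
print(suggested)
```

Keeping the baseline dict unchanged makes it easy to run the num_parallel=6 and num_parallel=32 cases side by side with otherwise identical settings.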