Open · spicytomatoes opened this issue 2 years ago
Hi, I've tried training on a 32-core machine, so naturally I set num_parallel to 32. However, the model does not seem to learn at all. Weirdly, when I set num_parallel to 6, the model learns. The rest of the config is exactly the same as the PubHRL config for Hungry Geese.
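For reference, here is a minimal sketch of overriding the worker count programmatically before launching training, assuming a HandyRL-style config.yaml; the filename and the key path (train_args -> worker -> num_parallel) are assumptions and may differ in your copy of the config:

```python
import yaml  # PyYAML

# Load the experiment config (filename is an assumption; adjust to your layout).
with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Scale the number of self-play workers to the machine.
# The nested key path mirrors a HandyRL-style config and may need adjusting.
config["train_args"]["worker"]["num_parallel"] = 32

# Write the modified config back before launching training.
with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f, default_flow_style=False)
```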
Thanks for your report! We ran several experiments with 64 workers, and all of them trained successfully. That said, learning to avoid illegal moves in this task is not easy, and I cannot say that training is always stable.
If there is one thing I should add, it is that the PubHRL experiment setup was settled on the first try, so I cannot recommend it with full confidence.
As I mentioned in the discussion, I think forward_steps=1 is generally better in this kind of task. A larger entropy regularization coefficient should also help.
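To make the entropy suggestion concrete, here is a minimal, self-contained sketch of an entropy-regularized policy-gradient loss (PyTorch, illustrative names only, not code from this repo); `ent_coef` plays the role of the entropy regularization coefficient discussed above, and raising it pushes the policy toward more exploration:

```python
import torch
from torch.distributions import Categorical

def policy_loss(logits, actions, advantages, ent_coef=2e-3):
    """Policy-gradient loss with an entropy bonus.

    A larger ent_coef rewards higher policy entropy (more exploration),
    which is the knob the comment above suggests increasing.
    """
    dist = Categorical(logits=logits)
    pg_loss = -(dist.log_prob(actions) * advantages).mean()
    entropy_bonus = dist.entropy().mean()
    return pg_loss - ent_coef * entropy_bonus

# Toy usage with random data: batch of 8 states, 4 discrete moves.
logits = torch.randn(8, 4)
actions = torch.randint(0, 4, (8,))
advantages = torch.randn(8)
print(policy_loss(logits, actions, advantages, ent_coef=1e-2))
```

The forward_steps=1 suggestion, by contrast, is a config-level setting (in the HandyRL-style config.yaml) rather than something that appears in the loss itself.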