Open · spicytomatoes opened this issue 2 years ago
Hi, I've tried training on a 32-core machine, so naturally I set num_parallel to 32. However, the model does not seem to learn at all. Weirdly, when I set num_parallel to 6, the model learns. The rest of the config is exactly the same as the PubHRL config for Hungry Geese.
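For reference, here is a minimal sketch of overriding the worker count programmatically before launching training, assuming a HandyRL-style config.yaml; the filename and the key path (train_args -> worker -> num_parallel) are assumptions and may differ in your copy of the config:

```python
import yaml  # PyYAML

# Load the experiment config (filename is an assumption; adjust to your layout).
with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Scale the number of self-play workers to the machine.
# The nested key path mirrors a HandyRL-style config and may need adjusting.
config["train_args"]["worker"]["num_parallel"] = 32

# Write the modified config back before launching training.
with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f, default_flow_style=False)
```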
Thanks for your report! We ran several experiments with 64 workers, and all of them trained successfully. That said, learning to avoid illegal moves in this task is not easy, and I cannot say that training is always stable.
If there is one thing I should add, it is that the PubHRL experiment setup was settled on the first try, so I cannot recommend it with full confidence.
As I mentioned in the discussion, I think forward_steps=1 is generally better in this kind of task. A larger entropy regularization coefficient should also help.
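To make the entropy suggestion concrete, here is a minimal, self-contained sketch of an entropy-regularized policy-gradient loss (PyTorch, illustrative names only, not code from this repo); `ent_coef` plays the role of the entropy regularization coefficient discussed above, and raising it pushes the policy toward more exploration:

```python
import torch
from torch.distributions import Categorical

def policy_loss(logits, actions, advantages, ent_coef=2e-3):
    """Policy-gradient loss with an entropy bonus.

    A larger ent_coef rewards higher policy entropy (more exploration),
    which is the knob the comment above suggests increasing.
    """
    dist = Categorical(logits=logits)
    pg_loss = -(dist.log_prob(actions) * advantages).mean()
    entropy_bonus = dist.entropy().mean()
    return pg_loss - ent_coef * entropy_bonus

# Toy usage with random data: batch of 8 states, 4 discrete moves.
logits = torch.randn(8, 4)
actions = torch.randint(0, 4, (8,))
advantages = torch.randn(8)
print(policy_loss(logits, actions, advantages, ent_coef=1e-2))
```

The forward_steps=1 suggestion, by contrast, is a config-level setting (in the HandyRL-style config.yaml) rather than something that appears in the loss itself.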