Final adjustments to Player Zero


The goal

[x] Adjust the normalization method for the policy (see https://github.com/suragnair/alpha-zero-general/blob/master/MCTS.py#L28)
[x] Add a dropout layer to the policy head
[x] Train without the Dense layer
[x] Apply normalization only once and check its effect on memory usage
[x] Speed up data loading
[x] Ensure that the serialization threshold is correct
~~[ ] After expanding a node, select the child with the highest policy value instead of child 0.~~
[x] Increase the number of games played
[ ] Reduce the number of short games in the training data
[x] Lower the learning rate (LR) in later generations
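For reference, one common way to normalize a policy in AlphaZero-style MCTS is to mask out illegal moves and renormalize over the remaining ones, falling back to a uniform distribution when the network puts no mass on any legal move. The sketch below assumes a NumPy policy vector and a 0/1 valid-move mask; `normalize_policy` is a hypothetical name, and this is not necessarily the exact method in the linked MCTS.py or in Player Zero. Doing this once per node at expansion time (and caching the result) is one way to apply normalization only once.

```python
import numpy as np


def normalize_policy(policy: np.ndarray, valid_moves: np.ndarray) -> np.ndarray:
    """Hypothetical helper: mask illegal moves and renormalize to sum to 1.

    Assumes `valid_moves` is a 0/1 array with at least one valid move
    (true for any non-terminal state).
    """
    masked = policy * valid_moves  # zero out probability on illegal actions
    total = masked.sum()
    if total > 0:
        return masked / total
    # The network gave no mass to any valid move: spread it uniformly.
    return valid_moves / valid_moves.sum()
```

For example, `normalize_policy(np.array([0.5, 0.3, 0.2]), np.array([1, 0, 1]))` drops the illegal middle action and rescales the other two by 1/0.7.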

Time Estimate: 0 hours 0 minutes
Time spent: 4 hours 30 minutes

...