Analyzing issues with PZ (4/4)


The goal

[x] Training with more epochs and more data samples.
[x] GPS model.
[x] Adjustments to the GPS block.
[ ] Investigate why policy training accuracy does not go above 50% (is it a model architecture limitation?)
[x] Verify that the MCTS score is calculated correctly.
[x] Minor MCTS algorithm adjustments.
[x] Increased the base model specs.
[x] #97
- During self-play, instead of always choosing the max-score action, actions are sampled according to a temperature parameter: a higher temperature makes choices more random, while a lower temperature more often picks high-score actions.
- [x] Analyze the impact of using a weighted scoring function instead of mean in pure MCTS
- [x] Ignore training samples with only one action choice
- [x] Check handling of empty samples as well
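A minimal sketch of the temperature-based action selection described above, assuming each action has a scalar score and that selection uses a softmax over `score / temperature` (an AlphaZero-style variant instead raises visit counts to the power `1 / temperature`). The function names `action_probabilities` and `select_action` are hypothetical and not taken from the PZ code base.

```python
import math
import random
from typing import List, Optional, Sequence


def action_probabilities(scores: Sequence[float], temperature: float) -> List[float]:
    """Softmax over scores / temperature (hypothetical helper).

    Larger temperatures flatten the distribution (more random play);
    smaller temperatures concentrate mass on the highest-score action.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; use 0 in select_action for greedy")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(scores: Sequence[float], temperature: float,
                  rng: Optional[random.Random] = None) -> int:
    """Return an action index; temperature == 0 reproduces the old max-action behaviour."""
    if not scores:
        raise ValueError("no actions to choose from")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy: index of the highest score.
        return max(range(len(scores)), key=lambda i: scores[i])
    rng = rng or random.Random()
    probs = action_probabilities(scores, temperature)
    # Draw one index with probability proportional to probs.
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]
```

A common schedule (again an assumption here, not necessarily what this issue uses) keeps the temperature high for the first moves of a self-play game to diversify openings, then drops it towards 0 so later moves follow the strongest line.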
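The two filtering items above can be sketched as a single pass over the self-play data. This assumes a hypothetical `TrainingSample` record holding a state, the legal actions, and the MCTS policy target; the real sample layout in PZ may differ. A sample with one legal action has the trivial policy target `[1.0]`, so it gives the policy head no signal, and an empty sample has nothing to train on.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class TrainingSample:
    """Hypothetical self-play record; field names are illustrative only."""
    state: Sequence[float]
    legal_actions: List[int] = field(default_factory=list)
    policy: List[float] = field(default_factory=list)  # MCTS target over legal_actions
    value: float = 0.0


def is_useful(sample: TrainingSample) -> bool:
    # Empty samples: no state, no legal actions, or no policy target.
    if len(sample.state) == 0 or len(sample.legal_actions) == 0 or len(sample.policy) == 0:
        return False
    # Only one legal action: the policy target is trivially one-hot.
    if len(sample.legal_actions) == 1:
        return False
    return True


def filter_samples(samples: List[TrainingSample]) -> List[TrainingSample]:
    """Keep only samples that carry a non-trivial policy target."""
    return [s for s in samples if is_useful(s)]
```

One trade-off worth noting: single-action positions still carry a value target, so dropping them entirely also removes value-head training data; an alternative is to keep them but mask out their policy loss.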

Time Estimate: 0 hours 0 minutes
Time spent: 4 hours 20 minutes

...