Implementing PlayerZero Self-play

Klazkin / player-zero

Implementing PlayerZero Self-play #81

Closed Klazkin closed 3 months ago

Klazkin commented 8 months ago

[Figure from "Mastering the Game of Go without Human Knowledge"]

The goal

Network architecture modifications:

[x] Add batch norm to network architecture.
[x] Remove bias weights from all CONV layers except the last.
[x] Add Dropout.
[x] Flatten last layers.
[x] Implement action combinations. Two approaches were considered:
- Add combination nodes to the tree automatically (without policies) and explore them based on the value of the actions they provide.
- Re-index nodes to start from 0, so that all child nodes fall in the 0-11 range. (Chosen approach.)
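
A common reason for dropping the bias from a convolution that feeds into batch norm is that the normalization subtracts the per-channel batch mean, so a constant offset cancels out. The pure-Python sketch below illustrates this on a single channel. `batch_norm` is a hypothetical helper (training-mode normalization without the learned scale and shift) and the numbers are made up; none of it is taken from the repository.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize one channel's activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Made-up outputs of one conv channel across a batch (assumption).
conv_out = [0.5, -1.2, 3.0, 0.7]
bias = 2.5  # hypothetical bias value

without_bias = batch_norm(conv_out)
with_bias = batch_norm([v + bias for v in conv_out])

# The bias shifts the mean by the same amount, so both results agree
# up to floating-point error.
max_diff = max(abs(a - b) for a, b in zip(without_bias, with_bias))
```

The last layer presumably keeps its bias because nothing normalizes its output afterwards.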

Self-Play implementations

[x] Implement a hybrid MCTS that uses PlayerZero evaluations instead of game rollouts. ~~Actions are chosen probabilistically?~~
[x] Random node generation in MCTS.
[ ] Serialize results:
- State at every root state and down to END_TURN.
- The final winner.
- MCTS policy at root state and down to END_TURN.
- (Can be inferred) Model policy at root state.
- (Can be inferred) Model winner (value) at every root state.
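
The serialization item above could be sketched roughly as follows. This is only one possible layout: `StepRecord`, `GameRecord`, `visits_to_policy`, the flat state encoding, and the choice of JSON are all assumptions made for illustration, not the project's actual format. The MCTS policy is taken here to be the normalized visit counts of the child nodes, as in AlphaGo Zero.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class StepRecord:
    state: List[float]        # encoded state (hypothetical flat encoding)
    mcts_policy: List[float]  # normalized visit counts over child actions


@dataclass
class GameRecord:
    steps: List[StepRecord] = field(default_factory=list)
    winner: Optional[int] = None  # filled in once the game ends


def visits_to_policy(visit_counts):
    """Turn raw MCTS visit counts into a probability distribution."""
    total = sum(visit_counts)
    if total == 0:
        raise ValueError("no visits recorded")
    return [count / total for count in visit_counts]


# Record one step (made-up values), then the final winner.
record = GameRecord()
record.steps.append(
    StepRecord(state=[0.0, 1.0, 0.0], mcts_policy=visits_to_policy([2, 1, 1]))
)
record.winner = 1

# Round-trip through JSON to check that nothing is lost.
serialized = json.dumps(asdict(record))
restored = json.loads(serialized)
```

The model policy and model value are left out of the record because, as noted in the list, they can be recomputed by running the network on the stored states.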

NN Training

[ ] Verify that the optimizer uses the right learning steps at every generation.
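
If "learning steps" here refers to a stepwise learning-rate schedule across generations (an assumption; the issue does not spell it out), the check could look something like the sketch below. `SCHEDULE`, `scheduled_lr`, and `check_lr` are hypothetical names, and the milestone values are placeholders.

```python
# Hypothetical schedule: generation at which each learning rate starts.
SCHEDULE = {0: 1e-2, 10: 1e-3, 20: 1e-4}


def scheduled_lr(generation, schedule=SCHEDULE):
    """Return the rate of the latest milestone at or before `generation`."""
    lr = None
    for start, value in sorted(schedule.items()):
        if generation >= start:
            lr = value
    if lr is None:
        raise ValueError(f"no milestone covers generation {generation}")
    return lr


def check_lr(generation, actual_lr, schedule=SCHEDULE, tol=1e-12):
    """True if the optimizer's current rate matches the schedule."""
    return abs(actual_lr - scheduled_lr(generation, schedule)) <= tol
```

In a training loop, `actual_lr` would be read from the optimizer at the start of each generation. With a PyTorch optimizer, for instance, that would be `optimizer.param_groups[0]["lr"]`, assuming a single parameter group.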

Notes on further optimizations

Rotate the board
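
One reading of this note is data augmentation: if the game is symmetric under rotation (an assumption that would need checking against the rules), each recorded position yields up to four training samples. A minimal sketch, assuming the board is a square grid stored as a list of rows; `rotate90` and `rotations` are hypothetical helpers:

```python
def rotate90(board):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def rotations(board):
    """Return the board in all four orientations, starting with the original."""
    result = [board]
    for _ in range(3):
        result.append(rotate90(result[-1]))
    return result


example = [[1, 2],
           [3, 4]]
rotated = rotate90(example)
```

Any stored policy whose actions refer to board positions would have to be permuted in the same way; that mapping depends on how actions are indexed and is not shown here.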

Time tracking

Time Estimate: 7 hours 0 minutes
Time spent: 7 hours 0 minutes

Resources

- A Simple Alpha(Go) Zero Tutorial: https://suragnair.github.io/posts/alphazero.html
- https://github.com/suragnair/alpha-zero-general/blob/master/MCTS.py