google-deepmind / mctx

Monte Carlo tree search in JAX
Apache License 2.0

Question about muzero network in go game #85

Closed Nightbringers closed 8 months ago

Nightbringers commented 8 months ago

In the paper, a bigger 32-layer network was used for the large-scale 19x19 Go experiments. That network used 256 hidden planes, 128 bottleneck planes, and a broadcasting block in every 8th layer.

MuZero has representation, dynamics, and prediction networks. Do representation, dynamics, and prediction all use this 32-layer network? Should they use the same size network or different sizes?

The paper says MuZero learns faster than AlphaZero in Go. But if the networks are the same size, MuZero is about two times slower than AlphaZero, because on every move MuZero needs two network evaluations while AlphaZero needs only one (assuming representation, dynamics, and prediction are each the same size as AlphaZero's network). So shouldn't AlphaZero learn faster than MuZero?

If AlphaZero used a bigger network, so that each of its inferences took as long as MuZero's two network inferences, could MuZero still outperform AlphaZero in that case?
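The cost comparison in the two paragraphs above can be put into a rough forward-pass count. This is a hypothetical back-of-the-envelope model, not numbers from the paper: the representation/dynamics/prediction split follows MuZero, but the exact counting is an assumption for illustration.

```python
# Hypothetical count of network forward passes per move, given
# num_simulations MCTS simulations. Assumption: MuZero runs
# representation + prediction once at the root, then dynamics +
# prediction for each simulated step; AlphaZero runs its single
# combined policy/value network once per simulation.

def muzero_passes(num_simulations: int) -> int:
    # 2 root passes (representation, prediction) plus
    # 2 passes (dynamics, prediction) per simulation.
    return 2 + 2 * num_simulations

def alphazero_passes(num_simulations: int) -> int:
    # One combined policy/value evaluation per simulation.
    return num_simulations

# With equal per-pass cost, MuZero does roughly twice the work:
# muzero_passes(800) = 1602 vs alphazero_passes(800) = 800.
```

Under this toy model, equal-size networks do make MuZero's search about twice as expensive per move, which is exactly the trade-off the question raises; whether the learned latent model compensates for that is the empirical part.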

MuZero's Go training uses K = 5 unroll steps. I found that the larger the value of K, the more GPU memory training consumes. Is there a big difference between training with K = 3 and training with K = 5?
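The memory growth with K can be illustrated with a toy unroll. This is a minimal NumPy sketch with invented shapes, not MuZero's actual training code: the point is that backpropagating through a K-step unroll requires keeping the latent state from every step alive until the backward pass, so activation memory grows roughly linearly in K.

```python
import numpy as np

# Toy model of a K-step training unroll. The "dynamics network" is a
# stand-in linear map; `saved` plays the role of the activations an
# autodiff framework would keep for the backward pass.

def unroll(state, dyn_w, k):
    saved = []                        # one stored latent state per step
    for _ in range(k):
        state = np.tanh(state @ dyn_w)
        saved.append(state)
    return saved

state0 = np.zeros((1, 256))           # illustrative latent size
dyn_w = np.zeros((256, 256))

# K = 5 stores 5 latent states per position; K = 3 stores 3, so the
# activation memory attributable to the unroll differs by about 5/3.
```

In this model the jump from K = 3 to K = 5 costs about 5/3 the unroll activations; whether the extra steps help the learned model enough to justify that is the empirical question the maintainer defers below.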

fidlej commented 8 months ago

The prediction network can usually be smaller; it serves as the output layer.
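A hedged sketch of that asymmetry (all names and sizes here are invented for illustration, not mctx or MuZero code): the representation/dynamics trunk does the heavy lifting on the latent state, while the prediction network is a thin head producing policy logits and a value.

```python
import numpy as np

# Illustrative sizes: a 256-plane latent state, a 19x19 Go action space
# (361 moves + pass = 362 actions). The prediction "network" is just
# one linear layer per output, i.e. an output layer on the latent.

rng = np.random.default_rng(0)
latent = rng.normal(size=(1, 256))        # latent state from the trunk

policy_w = rng.normal(size=(256, 362))    # small policy head
value_w = rng.normal(size=(256, 1))       # small value head

policy_logits = latent @ policy_w          # shape (1, 362)
value = np.tanh(latent @ value_w)          # value in (-1, 1)
```

So while the representation and dynamics networks may need the full 32-layer trunk, the prediction component can be far cheaper, which changes the per-move cost accounting in the question above.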

I do not have answers to these empirical questions. Maybe look at some existing MuZero implementations.