google-deepmind / mctx

Monte Carlo tree search in JAX
Apache License 2.0

Question about muzero network in go game #85

Closed Nightbringers closed 8 months ago

Nightbringers commented 8 months ago

In the paper, a bigger 32-layer network was used for the large-scale 19x19 Go experiments. That network used 256 hidden planes, 128 bottleneck planes, and a broadcasting block in every 8th layer.

MuZero has representation, dynamics, and prediction networks. Do representation, dynamics, and prediction all use this 32-layer network? Should they use the same size network or different sizes?

The paper says MuZero learns faster than AlphaZero in Go. But if the networks are the same size, MuZero is about two times slower than AlphaZero, because on every move MuZero needs two network evaluations while AlphaZero needs only one (assuming representation, dynamics, and prediction are each the same size as AlphaZero's network). So shouldn't AlphaZero learn faster than MuZero?

If AlphaZero used a bigger network, so that each of its inferences took as long as MuZero's two network inferences, could MuZero still outperform AlphaZero in that case?
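The cost comparison in the two paragraphs above can be put into a rough forward-pass count. This is a hypothetical back-of-the-envelope model, not numbers from the paper: the representation/dynamics/prediction split follows MuZero, but the exact counting is an assumption for illustration.

```python
# Hypothetical count of network forward passes per move, given
# num_simulations MCTS simulations. Assumption: MuZero runs
# representation + prediction once at the root, then dynamics +
# prediction for each simulated step; AlphaZero runs its single
# combined policy/value network once per simulation.

def muzero_passes(num_simulations: int) -> int:
    # 2 root passes (representation, prediction) plus
    # 2 passes (dynamics, prediction) per simulation.
    return 2 + 2 * num_simulations

def alphazero_passes(num_simulations: int) -> int:
    # One combined policy/value evaluation per simulation.
    return num_simulations

# With equal per-pass cost, MuZero does roughly twice the work:
# muzero_passes(800) = 1602 vs alphazero_passes(800) = 800.
```

Under this toy model, equal-size networks do make MuZero's search about twice as expensive per move, which is exactly the trade-off the question raises; whether the learned latent model compensates for that is the empirical part.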

MuZero's Go training uses K = 5 unroll steps. I found that the larger the value of K, the more GPU memory training consumes. Is there a big difference between training with K = 3 and training with K = 5?
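The memory growth with K can be illustrated with a toy unroll. This is a minimal NumPy sketch with invented shapes, not MuZero's actual training code: the point is that backpropagating through a K-step unroll requires keeping the latent state from every step alive until the backward pass, so activation memory grows roughly linearly in K.

```python
import numpy as np

# Toy model of a K-step training unroll. The "dynamics network" is a
# stand-in linear map; `saved` plays the role of the activations an
# autodiff framework would keep for the backward pass.

def unroll(state, dyn_w, k):
    saved = []                        # one stored latent state per step
    for _ in range(k):
        state = np.tanh(state @ dyn_w)
        saved.append(state)
    return saved

state0 = np.zeros((1, 256))           # illustrative latent size
dyn_w = np.zeros((256, 256))

# K = 5 stores 5 latent states per position; K = 3 stores 3, so the
# activation memory attributable to the unroll differs by about 5/3.
```

In this model the jump from K = 3 to K = 5 costs about 5/3 the unroll activations; whether the extra steps help the learned model enough to justify that is the empirical question the maintainer defers below.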

fidlej commented 8 months ago

The prediction network can usually be smaller; it serves as the output layer.
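A hedged sketch of that asymmetry (all names and sizes here are invented for illustration, not mctx or MuZero code): the representation/dynamics trunk does the heavy lifting on the latent state, while the prediction network is a thin head producing policy logits and a value.

```python
import numpy as np

# Illustrative sizes: a 256-plane latent state, a 19x19 Go action space
# (361 moves + pass = 362 actions). The prediction "network" is just
# one linear layer per output, i.e. an output layer on the latent.

rng = np.random.default_rng(0)
latent = rng.normal(size=(1, 256))        # latent state from the trunk

policy_w = rng.normal(size=(256, 362))    # small policy head
value_w = rng.normal(size=(256, 1))       # small value head

policy_logits = latent @ policy_w          # shape (1, 362)
value = np.tanh(latent @ value_w)          # value in (-1, 1)
```

So while the representation and dynamics networks may need the full 32-layer trunk, the prediction component can be far cheaper, which changes the per-move cost accounting in the question above.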

I do not have answers to these empirical questions. Maybe look at some existing MuZero implementations.