-
Training the model for 10h (RTX6000) on Connect4.
Is it OK that only the policy loss goes down over time, while the others go up? If I understand correctly, lowering the learning rate might help? What…
-
I was doing training on a custom game, with ``replay_buffer`` of 1000 and a ``ratio`` of 1.5. In my training session, about 1100 games were played, so some of the games had to be removed from the repl…
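
For illustration, the eviction described here can be sketched with a fixed-capacity buffer. This is a minimal sketch, assuming `replay_buffer` means a capacity of 1000 games and that the oldest games are dropped first; the `ReplayBuffer` class and its field names are hypothetical and are not taken from the repository.

```python
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity buffer (not the repository's class)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item when full.
        self.games = deque(maxlen=capacity)
        self.total_played = 0

    def save_game(self, game):
        self.games.append(game)
        self.total_played += 1


# Assumed numbers from the post: capacity 1000, about 1100 games played.
buffer = ReplayBuffer(capacity=1000)
for game_id in range(1100):
    buffer.save_game(game_id)

evicted = buffer.total_played - len(buffer.games)
```

Under these assumptions, the first 100 games are no longer available for sampling once the session ends.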
-
Mr. Duvaud,
I have been working on implementations of mobile games -- mainly Clash Royale and Crossy Road -- using this MuZero repository in order to test its potential in modern, easy to learn, ha…
-
Hi,
I cloned your repo and tried to use it with a fresh environment. Unfortunately, it seems that it does not work properly, as I get the following error when trying to train or play against muzer…
-
When I load the pretrained model [here](https://github.com/werner-duvaud/muzero-general/blob/master/results/lunarlander/experiment1/model.weights), [the error](https://github.com/werner-duvaud/muzero-…
-
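Are there any plans to reproduce the BicNet algorithm? While trying to write BicNet myself, I found that PaddlePaddle's bidirectional LSTM interface has a problem.
Someone has already reported this issue: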
https://github.com/PaddlePaddle/Paddle/issues/22979#issue-579681421
-
Hi, after running for a few days, the training suddenly failed.
```
2020-08-05 07:56:18,288 ERROR worker.py:1049 -- listen_error_messages_raylet: Connection closed by server.
E0805 07:56:18.288650…
```
-
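I read through the author's code and think one part of it is wrong. As I understand it, the author simply assigns the same z value to every position in each game's replay: I play one line of moves through to the end, and if White wins that game, every state in that game gets the "White wins" label.
However, no AlphaGo paper does it this way, including alpha lee's article. From the start, they aggregate statistics over a single search (possibly thousands or tens of thousands of end_game playouts) to obtain a value for the current position or a…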
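
The labeling scheme this post describes (every state of a finished game receives the same outcome) can be sketched as below. This is only an illustration of the scheme under discussion, not the author's code: the function name `outcome_targets` is hypothetical, and the per-player sign flip (+1 for the eventual winner's turns, -1 otherwise) is an assumed convention.

```python
def outcome_targets(players_to_move, winner):
    """Label every recorded state with the final game outcome.

    players_to_move: the player to move at each recorded state.
    winner: the winning player, or None for a draw.
    Returns one target per state, from the mover's point of view.
    """
    targets = []
    for player in players_to_move:
        if winner is None:
            targets.append(0.0)   # draw: neutral label for every state
        elif player == winner:
            targets.append(1.0)   # state where the eventual winner moves
        else:
            targets.append(-1.0)  # state where the eventual loser moves
    return targets


# Three recorded states from a game that White eventually wins.
labels = outcome_targets(["black", "white", "black"], winner="white")
```

Note that no search statistics enter these labels; they depend only on who won the single game that was played.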
-
I often get this error after training for a few hours. It has happened in all the games I've tried (but I've only tried two-player games). The error message below is from tictactoe. If this only happe…
-
In
https://github.com/werner-duvaud/muzero-general/blob/98cb784a06a8c25fe4a99a3c71d5358b357c5ef0/self_play.py#L384
the value to be backpropagated along the search path incorporates rewards from…
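
For context, a discounted backup along a search path can be sketched as follows. This is a simplified single-player sketch, not the code at the linked line: the `Node` class, its fields, and the `backpropagate` signature are assumptions made for illustration.

```python
class Node:
    """Hypothetical search-tree node holding a reward and visit statistics."""

    def __init__(self, reward=0.0):
        self.reward = reward
        self.value_sum = 0.0
        self.visit_count = 0

    def value(self):
        # Mean of all values backed up through this node so far.
        if self.visit_count == 0:
            return 0.0
        return self.value_sum / self.visit_count


def backpropagate(search_path, leaf_value, discount):
    """Walk from the leaf back to the root, accumulating discounted rewards."""
    value = leaf_value
    for node in reversed(search_path):
        node.value_sum += value
        node.visit_count += 1
        # The parent sees this node's reward plus the discounted value below it.
        value = node.reward + discount * value


# Root -> middle -> leaf, with rewards 0, 1, 2 and a leaf value estimate of 0.5.
path = [Node(0.0), Node(1.0), Node(2.0)]
backpropagate(path, leaf_value=0.5, discount=0.5)
```

In this sketch, each node's own reward is added only after that node's statistics are updated, so it contributes to its ancestors' values but not to its own.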