-
Hi Werner, I've really enjoyed tinkering with the codebase as I learn all aspects of MuZero. I see in the MuZero paper they describe how they mask the policy logits to allowable moves in the root nod…
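The masking described in the paper can be sketched in a few lines. This is a minimal NumPy illustration (not the repo's actual code): illegal actions get a logit of `-inf` before the softmax, so the root prior assigns them exactly zero probability. The function name `masked_policy` and the `legal_actions` boolean mask are assumptions for the example.

```python
import numpy as np

def masked_policy(logits, legal_actions):
    """Zero out illegal moves: set their logits to -inf before the softmax,
    so the resulting root prior puts no mass on them."""
    masked = np.where(legal_actions, logits, -np.inf)
    # Subtract the max over legal actions for numerical stability.
    exp = np.exp(masked - masked[legal_actions].max())
    return exp / exp.sum()

logits = np.array([1.0, 2.0, 0.5, 3.0])
legal = np.array([True, False, True, False])
prior = masked_policy(logits, legal)  # prior[1] and prior[3] are exactly 0
```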
-
![image](https://user-images.githubusercontent.com/78921582/124135808-294cbb00-da52-11eb-97b4-1bc20f088567.png)
### What is the problem?
After a CUDA + NVIDIA driver change, I started to see this problem:
…
-
## Description
During training on the GPU I observe:
- a 4 MiB GPU memory leak per epoch (appears constant)
- a duration increase of about 1 min per epoch (appears linear)
### Expected Behavior
no memory…
-
Recently, I finished reading the code in this repo, and I found that the entropy bonus on a state value in SAC is only added at the last output step.
This implementation makes me wonder:
If t…
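To make the distinction concrete, here is a toy n-step target computed both ways: entropy added only at the final bootstrap step (as the issue describes) versus at every step (as in the standard soft value recursion). All numbers, and the choice `gamma = 1`, are made up purely for illustration; this is not the repo's code.

```python
# alpha: entropy temperature, H[t]: per-step policy entropy,
# r[t]: rewards, v: bootstrapped value estimate at the horizon.
alpha, gamma = 0.2, 1.0
r = [1.0, 1.0, 1.0]
H = [0.5, 0.5, 0.5]
v = 2.0

# Entropy bonus only at the last output step:
target_last_only = sum(gamma**t * r[t] for t in range(3)) \
    + gamma**3 * (v + alpha * H[-1])

# Entropy bonus at every step of the rollout:
target_every_step = sum(gamma**t * (r[t] + alpha * H[t]) for t in range(3)) \
    + gamma**3 * v
```

With these numbers the two targets differ (5.1 vs 5.3), and the gap grows with the rollout length, which is why the placement of the bonus matters.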
-
I have been using OpenSpiel for my project, and I have found the string representations of the states helpful, but they do have some limitations. I have been building visualization tools to use with OpenSpiel …
-
### Proposal
Include the Hutter Prize corpus ([enwik9](http://mattmahoney.net/dc/enwik9.zip)) as a "game" for the purpose of sample-efficient reinforcement language modeling.
### Motivation
…
-
There seems to be a bug on Windows 10 with CUDA devices: `torch.nn.DataParallel(model)` will move the model's parameters and buffers to the GPU even if `selfplay_device = 'cpu'`. If you move the model to cp…
-
-
I tried to train a game with a resnet as my network. It was extremely slow on a computer with a 5950X and an RTX 3090 (about 1 step per 2-3 seconds). I then tried decreasing the number of resblocks to 1. It he…
-
As mentioned in the MuZero paper, revisiting past time steps to re-execute the MCTS and update policy 'targets' (child_visits) can help improve sample efficiency. Is there a reason (other than computa…
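The reanalyze idea from the paper can be sketched as a small loop over the replay buffer. This is only an illustration of the mechanism, not the repo's implementation: `rerun_mcts` is a hypothetical stand-in for a fresh search with the latest network, and the buffer layout (`observations`, `child_visits`) is assumed for the example.

```python
import random

def rerun_mcts(observation, network_params):
    # Hypothetical stand-in for re-executing MCTS with the latest network;
    # here it just returns dummy normalized visit counts over 4 actions.
    visits = [random.random() for _ in range(4)]
    total = sum(visits)
    return [v / total for v in visits]

def reanalyze(replay_buffer, network_params, fraction=0.5):
    """Revisit a fraction of stored games and overwrite their stale
    policy targets (child_visits) with targets from a fresh search."""
    n = int(len(replay_buffer) * fraction)
    for game in random.sample(replay_buffer, n):
        for t, obs in enumerate(game["observations"]):
            game["child_visits"][t] = rerun_mcts(obs, network_params)
    return replay_buffer
```

In the paper this trades extra search compute for fresher targets, which is presumably the cost being weighed in the question above.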