-
https://github.com/werner-duvaud/muzero-general/blob/4d541626a2d1ace2e3bdf30d25d9a843e4cb613c/replay_buffer.py#L79
Shouldn't this be:
`position_probs = numpy.array(game_history.priorities[:-1]) / …
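For context, a minimal sketch of what the proposed fix would look like: dropping the last priority so the sampling probabilities line up with the positions that have a target, then normalizing for `numpy.random.choice`. The `priorities` array here is an illustrative stand-in for `game_history.priorities`, not the repo's actual data:

```python
import numpy

# Hypothetical stand-in for game_history.priorities: one priority per stored step.
priorities = numpy.array([0.5, 1.0, 2.0, 0.5])

# Proposed fix: drop the final entry, then normalize so the
# probabilities sum to 1 as numpy.random.choice requires.
position_probs = priorities[:-1] / numpy.sum(priorities[:-1])
position = numpy.random.choice(len(position_probs), p=position_probs)
```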
-
https://github.com/werner-duvaud/muzero-general/blob/ecca75c8d5893048b0acc6c5897a504c6334b871/models.py#L137
Since you are predicting distributions, the first reward should be scalar_to_support(0…
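For reference, a sketch of the kind of `scalar_to_support` transform MuZero uses: two-hot encoding a scalar onto a categorical support over the integers in [-support_size, support_size]. This is an illustrative reimplementation (it omits the paper's h(x) scaling step), not the repo's exact function:

```python
import torch

def scalar_to_support(x, support_size):
    # Two-hot encode a batch of scalars onto a categorical support
    # over the integers [-support_size, support_size].
    x = torch.clamp(x, -support_size, support_size)
    floor = x.floor()
    prob = x - floor  # weight placed on the upper neighbor bin
    support = torch.zeros(x.shape[0], 2 * support_size + 1)
    support.scatter_(1, (floor + support_size).long().unsqueeze(1), (1 - prob).unsqueeze(1))
    # Clamp the upper index so x == support_size does not fall off the support;
    # scatter_add_ keeps the lower-bin weight intact in that edge case.
    upper = torch.clamp(floor + support_size + 1, max=2 * support_size).long()
    support.scatter_add_(1, upper.unsqueeze(1), prob.unsqueeze(1))
    return support
```

With this encoding, `scalar_to_support(torch.tensor([0.0]), size)` puts all mass on the center bin, which is what a zero first reward would look like.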
-
In Appendix G of the muzero paper, they define the priority of a sample as p_i = | nu_i - z_i |, and write "nu is the search value and z the observed n-step return." (I'll use "nu" in place of ν for …
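The Appendix G definition can be sketched directly; `root_values` and `target_values` below are hypothetical names for the per-step search values and n-step returns, not the repo's actual fields:

```python
import numpy

def compute_priorities(root_values, target_values):
    # Appendix G: p_i = |nu_i - z_i|, where nu_i is the MCTS root (search)
    # value at step i and z_i is the observed n-step return for that step.
    return numpy.abs(numpy.array(root_values) - numpy.array(target_values))
```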
-
Hi,
I would like to ask whether TensorFlow will also be supported, as per the initial README (https://github.com/werner-duvaud/muzero-general/commit/d8388353cd37242efcdb7ff36680fc7059ecff6c#diff-04c6e90f…
-
At this line (https://github.com/werner-duvaud/muzero-general/blob/master/models.py#L125), it should be
`next_encoded_state - min_next_encoded_state`
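For context, a sketch of the hidden-state scaling the MuZero appendix describes, with the proposed fix applied (subtracting the per-sample minimum rather than the maximum). Names are illustrative, not the repo's exact code:

```python
import torch

def normalize_encoded_state(encoded_state):
    # Scale each sample's hidden state to [0, 1], as in the MuZero
    # appendix: (s - min(s)) / (max(s) - min(s)).
    flat = encoded_state.view(encoded_state.shape[0], -1)
    s_min = flat.min(dim=1, keepdim=True)[0]
    s_max = flat.max(dim=1, keepdim=True)[0]
    scale = s_max - s_min
    scale[scale < 1e-5] += 1e-5  # avoid division by zero on constant states
    normalized = (flat - s_min) / scale  # the fix: subtract the min, not the max
    return normalized.view_as(encoded_state)
```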
-
Please support Windows, thanks.
-
Thank you very much for a comprehensive implementation.
I ran Breakout with the current configuration, except for changing the number of actors from 350 to 4, since I ran
into memory problems with Ray. I am usin…
-
In https://github.com/werner-duvaud/muzero-general/blob/283e3538485be0e36ef77f402249666f735f5278/self_play.py#L262 you essentially assume actions are taken by players in alternating order for two-play…
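The assumption being questioned can be made concrete: inferring whose turn it is from the step index, which only holds when players strictly alternate. Names here are illustrative, not the repo's actual API:

```python
def to_play_assumed(step_index, num_players=2):
    # Infers the player to move from the step index alone.
    # This only works for strictly alternating turn orders; games where a
    # player can move twice in a row (e.g. after a capture rule) break it,
    # so the environment itself should report the player to move instead.
    return step_index % num_players
```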
-
I think the way you transform the value/reward slightly mismatches the original paper at this line (https://github.com/werner-duvaud/muzero-general/blob/fe791e8651645ea05f5b582157b4892588ee56ca/trai…
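For reference, a sketch of the invertible transform from the paper's appendix, which the line in question should presumably match; this is an illustrative reimplementation, not the repo's code:

```python
import math

def scalar_transform(x, eps=0.001):
    # Paper's value/reward scaling: h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x.
    # The eps * x term keeps the transform invertible and roughly linear near 0.
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1) - 1) + eps * x
```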