Closed henrycharlesworth closed 2 years ago
Hi @henrycharlesworth,
Good catch! To add to your comment, I think it is a typo in the MuZero preprint. They actually fixed it in the Nature paper (page 8, right column).
Best
Hi, @henrycharlesworth @Hwhitetooth
Thank you very much for correcting the formula!
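This is a mistake: the code should use h1 rather than h2, where h1 is the transform from the original paper and h2 is the MuZero preprint version that is currently implemented:

$$h_1(x) = \operatorname{sign}(x)\left(\sqrt{|x| + 1} - 1\right) + \epsilon x, \qquad h_2(x) = \operatorname{sign}(x)\left(\sqrt{|x| + 1} - 1 + \epsilon x\right)$$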
The current implementation leads to a larger error, and we will correct this formula later.
More importantly, when x >= 0, h1(x) is equal to h2(x). When x < 0, the two differ only in the sign of the eps term, so we have

$$|h_1(x) - h_2(x)| = 2\epsilon |x|$$

Since eps=0.0001, the difference between h1(x) and h2(x) is small. Therefore, it probably doesn't make much difference.
And thank you again for your detailed discussion!
Hey, firstly just wanted to say thank you because this is an amazing repo for understanding how MuZero/EfficientZero work in detail!
I've been trying to dig into exactly how the value prediction is done, as it seems like a pretty significant detail that is hidden away in an appendix, and I think there is a slight discrepancy (it probably doesn't make much difference, but is maybe still worth highlighting).
In the original paper (https://arxiv.org/pdf/1805.11593.pdf) they define the scaling function as:

$$h(x) = \operatorname{sign}(x)\left(\sqrt{|x| + 1} - 1\right) + \epsilon x$$

with the inverse function given by proposition A.2 (iii).
but in the MuZero appendix they have:

$$h(x) = \operatorname{sign}(x)\left(\sqrt{|x| + 1} - 1 + \epsilon x\right)$$

(with the final term inside the bracket).
Unless I'm mistaken, in the code you've used the MuZero version of h(x), but for the inverse formula you've used the formula given in proposition A.2 (iii) of the first paper - which won't quite be correct anymore, right?
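To spell out why the mismatch matters, here is a short derivation. It assumes the two definitions are $h(x) = \operatorname{sign}(x)(\sqrt{|x|+1} - 1) + \epsilon x$ (first paper) and $h(x) = \operatorname{sign}(x)(\sqrt{|x|+1} - 1 + \epsilon x)$ (MuZero preprint), and writes $u = \sqrt{|x|+1}$ and $y = h(x)$. The MuZero line additionally assumes $|x|$ is moderate enough that $y$ stays negative.

$$
\begin{aligned}
\text{first paper:} \quad & |y| = u - 1 + \epsilon(u^2 - 1) \;\Rightarrow\; \epsilon u^2 + u - (1 + \epsilon + |y|) = 0 \\
& u = \frac{\sqrt{1 + 4\epsilon(|y| + 1 + \epsilon)} - 1}{2\epsilon}, \qquad x = \operatorname{sign}(y)\,(u^2 - 1) \\
\text{MuZero preprint, } x < 0: \quad & |y| = u - 1 - \epsilon(u^2 - 1) \;\Rightarrow\; \epsilon u^2 - u + (1 - \epsilon + |y|) = 0
\end{aligned}
$$

The second line is the closed form from proposition A.2 (iii). For negative inputs the MuZero preprint version satisfies a different quadratic, so applying that closed form to it should not recover $x$ exactly; for $x \ge 0$ the two definitions coincide and the inverse is exact for both.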
Just to show the discrepancy - if I look at the following code:
which is how the functions are implemented in this code base I get a value of ~2.4 printed, whilst if I change the scalar transform to be the same as in the first paper I get a value of ~0.04.
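A round-trip check of this kind can be sketched as follows. This is not the snippet from the post above: the function names are hypothetical, plain Python floats stand in for tensors, eps = 0.001 is an assumed default (pass whatever value your code uses), and the test inputs are arbitrary, so the printed numbers are not expected to match the ~2.4 and ~0.04 quoted above.

```python
import math


def _sign(x):
    """Return -1.0, 0.0 or 1.0, mirroring the usual sign() convention."""
    return (x > 0) - (x < 0) + 0.0


def h_original(x, eps=0.001):
    # Transform as defined in the first paper: eps * x outside the bracket.
    return _sign(x) * (math.sqrt(abs(x) + 1) - 1) + eps * x


def h_muzero_preprint(x, eps=0.001):
    # MuZero preprint variant: eps * x inside the bracket.
    return _sign(x) * (math.sqrt(abs(x) + 1) - 1 + eps * x)


def h_inverse(y, eps=0.001):
    # Closed-form inverse from proposition A.2 (iii) of the first paper.
    root = (math.sqrt(1 + 4 * eps * (abs(y) + 1 + eps)) - 1) / (2 * eps)
    return _sign(y) * (root ** 2 - 1)


def round_trip_error(h, x, eps=0.001):
    # Absolute error after transforming x and inverting the result.
    return abs(h_inverse(h(x, eps), eps) - x)


if __name__ == "__main__":
    for x in (-100.0, -10.0, 10.0, 100.0):
        print(
            x,
            round_trip_error(h_original, x),
            round_trip_error(h_muzero_preprint, x),
        )
```

Under these assumptions the original transform should round-trip to within floating-point error for both signs, while the preprint variant should only do so for non-negative inputs.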