As written in https://github.com/ShangtongZhang/reinforcement-learning-an-introduction/issues/83, this results in a more deterministic action choice in situations where several actions yield values that are identical in theory but differ slightly due to floating-point errors. At least when I added the suggested rounding, the produced figure(s) resembled Fig. 4.3 more closely:
This removes the artefacts seen in the original figure produced by the code (Ex4.9_plotB.jpg in the repo or see below):
P.S. The number of digits might have to be increased so that it also works for the $p_h=0.55$ example in the code (i.e. figure Ex4.9_plotD.jpg), e.g. `digits=12`.
https://github.com/LyWangPX/Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions/blob/68b023ba5cdd46db5fa9daf7a161e65b154ad529/Chapter%204/Ex4.9.py#L25
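To illustrate the idea, here is a minimal sketch (my own reconstruction, not the repo's exact code; the state space, reward setup, and the `digits` parameter follow the standard gambler's-problem formulation from Sutton & Barto, and the function names are hypothetical). Rounding the action values before `argmax` turns floating-point near-ties into exact ties, which `argmax` then breaks deterministically by always picking the smallest tied stake:

```python
import numpy as np

def value_iteration(p_h=0.4, goal=100, theta=1e-10):
    """Value iteration for the gambler's problem (sketch).

    States are the gambler's capital 1..goal-1; reaching `goal`
    gives reward +1, going broke gives 0.
    """
    V = np.zeros(goal + 1)
    V[goal] = 1.0  # terminal reward for reaching the goal
    while True:
        delta = 0.0
        for s in range(1, goal):
            stakes = np.arange(1, min(s, goal - s) + 1)
            q = [p_h * V[s + a] + (1 - p_h) * V[s - a] for a in stakes]
            v_new = max(q)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

def greedy_policy(V, p_h=0.4, goal=100, digits=5):
    """Extract a greedy policy, rounding q-values before argmax."""
    policy = np.zeros(goal + 1, dtype=int)
    for s in range(1, goal):
        stakes = np.arange(1, min(s, goal - s) + 1)
        q = np.array([p_h * V[s + a] + (1 - p_h) * V[s - a]
                      for a in stakes])
        # Without np.round, floating-point noise makes argmax pick
        # inconsistent "best" stakes among theoretically tied actions,
        # producing the jagged artefacts in the original plot.
        policy[s] = stakes[np.argmax(np.round(q, digits))]
    return policy
```

With this, ties at e.g. `digits=5` precision are resolved the same way at every state, so the policy plot becomes smooth; for $p_h=0.55$ the value differences are smaller, which is why a finer rounding such as `digits=12` may be needed there.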