ShangtongZhang / reinforcement-learning-an-introduction

Python Implementation of Reinforcement Learning: An Introduction
MIT License
13.54k stars 4.82k forks source link

ten_armed_testbed.py中的figure2_3为何不用“sample_averages” #149

Open A-Pai opened 3 years ago

A-Pai commented 3 years ago

按照书上的介绍,用固定的步长是因为非平稳,当时代码中摇臂设置是平稳的,为何不用“sample_averages”来估计各个摇臂的value?二者方式差异较大,上边的图是固定步长,可见“sample_averages”法收敛的更快,但为何二者收敛的还不同 figure2_3 figure2_3_1