value_iteration algorithm does not converge?

datawhalechina / easy-rl

A Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online at: https://datawhalechina.github.io/easy-rl/


Open chensisi0730 opened 1 year ago

chensisi0730 commented 1 year ago

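The success rate when testing value_iteration is 0.638. Value iteration needs to iterate repeatedly, performing policy evaluation each time, but the code only does a single iteration.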

sherlcok314159 commented 1 year ago

"All of these algorithms converge to an optimal policy for discounted finite MDPs." FYI, this is quoted from Reinforcement Learning: An Introduction. You could try adding a discount factor.
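
For reference, here is a minimal sketch that combines both suggestions in this thread: sweep repeatedly until the values stop changing, and apply a discount factor. It is not the easy-rl repository's code. The function name `value_iteration`, the parameters `gamma`, `theta`, and `max_sweeps`, and the three-state chain MDP are all assumptions made for illustration. The transition table layout `P[s][a] = [(prob, next_state, reward, done), ...]` is meant to mirror the one exposed by Gym's toy-text environments.

```python
import numpy as np


def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8, max_sweeps=10_000):
    """Sweep the Bellman optimality backup until the largest update is below theta.

    P[s][a] is a list of (prob, next_state, reward, done) tuples (assumed layout).
    Returns the value table, a greedy policy, and the number of sweeps performed.
    """

    def action_values(s, V):
        # One-step lookahead: expected discounted return of each action in state s.
        q = np.zeros(n_actions)
        for a in range(n_actions):
            for prob, s_next, reward, done in P[s][a]:
                future = 0.0 if done else V[s_next]  # no bootstrapping past terminal
                q[a] += prob * (reward + gamma * future)
        return q

    V = np.zeros(n_states)
    sweeps = 0
    for sweeps in range(1, max_sweeps + 1):
        delta = 0.0
        for s in range(n_states):
            best = action_values(s, V).max()
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < theta:  # values have stopped changing
            break

    # Extract the greedy policy from the final values.
    policy = np.array([int(np.argmax(action_values(s, V))) for s in range(n_states)])
    return V, policy, sweeps


# Hypothetical 3-state chain: action 1 moves right, action 0 moves back to (or stays at) state 0.
# Entering state 2 gives reward 1 and ends the episode.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

V, policy, sweeps = value_iteration(P, n_states=3, n_actions=2, gamma=0.9)
print("V =", V)
print("policy =", policy)
print("sweeps =", sweeps)
```

Even on this tiny chain a single sweep is not enough: after the first sweep state 0 still has value 0, and it only picks up the discounted reward on the next pass. A policy extracted after one sweep can therefore be suboptimal, which may be related to the low success rate reported above.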