datawhalechina / easy-rl

A Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online: https://datawhalechina.github.io/easy-rl/
Other
9.04k stars 1.81k forks

value_iteration algorithm does not converge? #138

Open chensisi0730 opened 1 year ago

chensisi0730 commented 1 year ago

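The success rate in the value_iteration test is 0.638. Value iteration needs to iterate repeatedly to perform policy evaluation, but the code only runs a single iteration.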

sherlcok314159 commented 1 year ago

"All of these algorithms converge to an optimal policy for discounted finite MDPs." FYI, this is quoted from Reinforcement Learning: An Introduction. You could try adding a discount factor.
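
For reference, here is a minimal sketch of the two points raised in this thread: sweeping repeatedly until the values stop changing, and applying a discount factor. It is not the repository's code. The transition table layout `P[s][a] = [(prob, next_state, reward, done), ...]`, the parameter names `gamma` and `theta`, and the three-state chain MDP are all assumptions made for illustration.

```python
def q_values(P, V, s, n_actions, gamma):
    """One-step lookahead: expected return of each action taken in state s."""
    return [
        sum(p * (r + (0.0 if done else gamma * V[s2]))
            for p, s2, r, done in P[s][a])
        for a in range(n_actions)
    ]


def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8,
                    max_sweeps=10_000):
    """Repeat Bellman optimality backups until the largest change is < theta."""
    V = [0.0] * n_states
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n_states):
            best = max(q_values(P, V, s, n_actions, gamma))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < theta:  # values have stopped changing: converged
            break
    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(n_states):
        q = q_values(P, V, s, n_actions, gamma)
        policy.append(q.index(max(q)))
    return V, policy


# Hypothetical 3-state chain: actions 0 = left, 1 = right; state 2 is terminal.
# Reaching state 2 gives reward 1; every other transition gives reward 0.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

V, policy = value_iteration(P, n_states=3, n_actions=2, gamma=0.9)
print(V, policy)
```

On this toy chain a single sweep leaves `V[0]` at 0, so the greedy policy in state 0 is still arbitrary. The reward only propagates back to state 0 on the second sweep, which is why the outer loop matters. In a stochastic environment a low test success rate can also come from the environment itself rather than from a lack of convergence, so it is worth checking both.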