Q-learning only updates the Q-function during its update step

PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning

https://parl.readthedocs.io/

Apache License 2.0


Closed Subarashi2 closed 1 year ago

Subarashi2 commented 1 year ago

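Our goal is to find an optimal Q-function, and from that Q-function we can derive an optimal policy. Does this process ever make use of the discounted return?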

Subarashi2 commented 1 year ago

After thinking about it some more: the Q-function is, by definition, the expectation of the discounted return.
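For reference, this is the standard textbook definition written out. The notation is assumed here rather than taken from the thread: $\gamma \in [0, 1)$ is the discount factor, $r_{t+k+1}$ is the reward received $k+1$ steps after time $t$, and $\pi$ is the policy being followed.

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1} \;\middle|\; s_t = s,\ a_t = a \right]
```

So the discounted return never has to be computed explicitly along a full trajectory; it is what the Q-value estimates.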

TomorrowIsAnOtherDay commented 1 year ago

Yes. The discounted return enters through the bootstrapping-style update.
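A minimal sketch of what bootstrapping means here, assuming a tabular setting and a hypothetical three-state chain (0 -> 1 -> 2, where reaching state 2 ends the episode with reward 1). None of this is PARL code; `step`, `q_learning_update`, `train`, and the constants are illustrative names. The discount factor appears only in the TD target, `reward + GAMMA * max(q[next_state])`, which stands in for the rest of the discounted return.

```python
GAMMA = 0.9   # discount factor (assumed value)
ALPHA = 0.5   # learning rate (assumed value)
STAY, ADVANCE = 0, 1
TERMINAL = 2


def step(state, action):
    """Hypothetical deterministic chain: ADVANCE moves right, STAY does nothing."""
    if action == STAY:
        return state, 0.0, False
    next_state = state + 1
    done = next_state == TERMINAL
    return next_state, (1.0 if done else 0.0), done


def q_learning_update(q, state, action, reward, next_state, done):
    """One tabular Q-learning step; only the Q-table is modified."""
    # Bootstrapping: the discounted best estimate at the next state
    # replaces the remaining (unobserved) discounted return.
    bootstrap = 0.0 if done else GAMMA * max(q[next_state])
    td_target = reward + bootstrap
    q[state][action] += ALPHA * (td_target - q[state][action])


def train(sweeps=500):
    """Repeatedly update every (state, action) pair of the chain."""
    q = {s: [0.0, 0.0] for s in range(TERMINAL + 1)}
    for _ in range(sweeps):
        for state in (0, 1):
            for action in (STAY, ADVANCE):
                next_state, reward, done = step(state, action)
                q_learning_update(q, state, action, reward, next_state, done)
    return q


q = train()
print(q[0][ADVANCE], q[1][ADVANCE])
```

In this toy chain the value of advancing from state 1 approaches 1 and the value of advancing from state 0 approaches `GAMMA * 1 = 0.9`, i.e. the discounted return of the best path, even though the update itself only ever touches the Q-table.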