Lightblues / lightblues.github.io

https://lightblues.github.io/
1 stars 0 forks source link

AI-MDP | Easonshi's Space #31

Open Lightblues opened 3 years ago

Lightblues commented 3 years ago

https://lightblues.github.io/posts/f541f0d7/

一个马尔可夫决策过程可由以下五元素定义: State (s\in S) Actions (a\in A) Transition func (T(s,a,s')=P(s'|s,a)) Reward func (R(s,a,s')) Decay factor 其中,转移函数和奖励函数被称为 model,另外奖励也可能简化为 (R(s,a)) 或 (R(