TMats opened 7 years ago
RL algorithms fall into two main classes: (1) model-free algorithms that learn cached value functions directly from sample trajectories, and (2) model-based algorithms that estimate transition and reward functions, from which values can be computed using tree-search or dynamic programming. However, there is a third class, based on the successor representation (SR), that factors the value function into a predictive representation and a reward function.
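To make the SR factorization concrete, here is a minimal tabular sketch (not from the paper; the chain MDP, rewards, and gamma are made up): the SR matrix M holds discounted expected future state occupancies under a fixed policy, and the value function is just M times the reward vector, so changing the rewards re-evaluates values without relearning M.

```python
import numpy as np

# Tabular sketch of the successor representation (SR) under a fixed policy.
# States 0..2 form a small chain; P is the policy-induced transition matrix.
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])   # state 2 is absorbing
r = np.array([0.0, 0.0, 1.0])     # reward received in each state

# SR: discounted expected future occupancy, M = (I - gamma * P)^(-1)
M = np.linalg.inv(np.eye(3) - gamma * P)

# Value factorizes as V = M @ r; swapping in a new r reuses the same M.
V_from_sr = M @ r

# Check against direct policy evaluation: (I - gamma * P) V = r
V_direct = np.linalg.solve(np.eye(3) - gamma * P, r)
print(V_from_sr, V_direct)
```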
The left-to-upper-right path is a state encoder-decoder, which improves the encoder's quality; the middle path (reward prediction) has the effect of making the extracted features reward-relevant.
In the lower right, an action-conditional representation is multiplied by the reward-prediction weights w used for predicting R, which gives the prediction of the discounted sum of rewards (the Q-value).
The effect of gamma is pushed into this lower-right branch.
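A rough sketch of how these pieces might fit together (the MLP encoder, layer sizes, and the name DSRSketch are placeholders, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class DSRSketch(nn.Module):
    """Sketch of a deep successor-representation network:
    encoder/decoder, reward weights w, and an SR branch."""

    def __init__(self, state_dim=16, feat_dim=64, n_actions=4):
        super().__init__()
        # Encoder phi(s): shared state features (left path).
        self.encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        # Decoder: reconstructs s from phi(s), training the encoder (upper-right path).
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, state_dim))
        # Reward weights w: R_hat(s) = w . phi(s) (middle path, keeps features reward-relevant).
        self.w = nn.Linear(feat_dim, 1, bias=False)
        # Successor-feature branch: action-conditional m(s, a) in feature space (lower-right path).
        self.sr = nn.Linear(feat_dim, feat_dim * n_actions)
        self.n_actions = n_actions
        self.feat_dim = feat_dim

    def forward(self, s):
        phi = self.encoder(s)                    # phi(s)
        s_recon = self.decoder(phi)              # reconstruction target
        r_hat = self.w(phi).squeeze(-1)          # immediate reward estimate
        m = self.sr(phi).view(-1, self.n_actions, self.feat_dim)  # m(s, a) for each action
        # Q(s, a) = m(s, a) . w : the discounted-return prediction described above.
        q = self.w(m).squeeze(-1)
        return s_recon, r_hat, m, q
```

Gamma itself does not appear in the forward pass: the SR branch m(s, a) is regressed toward a Bellman-style target of roughly phi(s) + gamma * m(s', a'), which is how the discounting ends up in the lower-right branch.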
https://arxiv.org/abs/1606.02396