TMats opened 7 years ago
RL algorithms fall into two main classes: (1) model-free algorithms that learn cached value functions directly from sample trajectories, and (2) model-based algorithms that estimate transition and reward functions, from which values can be computed using tree-search or dynamic programming. However, there is a third class, based on the successor representation (SR), that factors the value function into a predictive representation and a reward function.
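To make the SR factorization concrete, here is a minimal tabular sketch (not from the paper; the chain MDP, rewards, and gamma are made up): the SR matrix M holds discounted expected future state occupancies under a fixed policy, and the value function is just M times the reward vector, so changing the rewards re-evaluates values without relearning M.

```python
import numpy as np

# Tabular sketch of the successor representation (SR) under a fixed policy.
# States 0..2 form a small chain; P is the policy-induced transition matrix.
gamma = 0.9
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])   # state 2 is absorbing
r = np.array([0.0, 0.0, 1.0])     # reward received in each state

# SR: discounted expected future occupancy, M = (I - gamma * P)^(-1)
M = np.linalg.inv(np.eye(3) - gamma * P)

# Value factorizes as V = M @ r; swapping in a new r reuses the same M.
V_from_sr = M @ r

# Check against direct policy evaluation: (I - gamma * P) V = r
V_direct = np.linalg.solve(np.eye(3) - gamma * P, r)
print(V_from_sr, V_direct)
```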
The left-to-upper-right path is a state encoder-decoder, which improves the encoder's quality; the middle path (reward prediction) has the effect of making the extracted features reward-relevant.
In the lower right, an action-conditional representation is multiplied by the reward-prediction weights w used for predicting R, which gives the prediction of the discounted sum of rewards (the Q-value).
The effect of gamma is pushed into this lower-right branch.
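A rough sketch of how these pieces might fit together (the MLP encoder, layer sizes, and the name DSRSketch are placeholders, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class DSRSketch(nn.Module):
    """Sketch of a deep successor-representation network:
    encoder/decoder, reward weights w, and an SR branch."""

    def __init__(self, state_dim=16, feat_dim=64, n_actions=4):
        super().__init__()
        # Encoder phi(s): shared state features (left path).
        self.encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        # Decoder: reconstructs s from phi(s), training the encoder (upper-right path).
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, state_dim))
        # Reward weights w: R_hat(s) = w . phi(s) (middle path, keeps features reward-relevant).
        self.w = nn.Linear(feat_dim, 1, bias=False)
        # Successor-feature branch: action-conditional m(s, a) in feature space (lower-right path).
        self.sr = nn.Linear(feat_dim, feat_dim * n_actions)
        self.n_actions = n_actions
        self.feat_dim = feat_dim

    def forward(self, s):
        phi = self.encoder(s)                    # phi(s)
        s_recon = self.decoder(phi)              # reconstruction target
        r_hat = self.w(phi).squeeze(-1)          # immediate reward estimate
        m = self.sr(phi).view(-1, self.n_actions, self.feat_dim)  # m(s, a) for each action
        # Q(s, a) = m(s, a) . w : the discounted-return prediction described above.
        q = self.w(m).squeeze(-1)
        return s_recon, r_hat, m, q
```

Gamma itself does not appear in the forward pass: the SR branch m(s, a) is regressed toward a Bellman-style target of roughly phi(s) + gamma * m(s', a'), which is how the discounting ends up in the lower-right branch.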
https://arxiv.org/abs/1606.02396