allentran / rl-l2t

Reinforcement learning paper + code: L2T
Apache License 2.0

Generate policy gradients wrt u, given SARSA tuples, and Deep Q gradients wrt a, and combine #5

Open allentran opened 9 years ago

allentran commented 9 years ago

If there are N actions and u has length K, then dpi/du is K x N and dQ/da | sarsa is N x 1.
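One possible reading of "combine" is the chain rule used in deterministic policy gradients: multiply dpi/du (K x N) by dQ/da (N x 1) to get a K x 1 gradient wrt u for each SARSA tuple, then average over the batch. The NumPy sketch below only illustrates the shapes under that assumption. The function names `combine_gradients` and `batch_policy_gradient` and the toy values are hypothetical and are not taken from the repo.

```python
import numpy as np


def combine_gradients(dpi_du, dq_da):
    """Chain rule for one SARSA tuple: (K x N) @ (N x 1) -> (K x 1).

    Hypothetical helper: assumes dpi_du is the K x N matrix of policy
    derivatives wrt u and dq_da is the N x 1 derivative of Q wrt the action.
    """
    dpi_du = np.asarray(dpi_du, dtype=float)
    dq_da = np.asarray(dq_da, dtype=float).reshape(-1, 1)
    if dpi_du.ndim != 2 or dpi_du.shape[1] != dq_da.shape[0]:
        raise ValueError(
            f"shape mismatch: dpi/du is {dpi_du.shape}, dQ/da is {dq_da.shape}"
        )
    return dpi_du @ dq_da


def batch_policy_gradient(dpi_du_batch, dq_da_batch):
    """Average the per-tuple K x 1 gradients over a batch of SARSA tuples."""
    grads = [combine_gradients(j, g) for j, g in zip(dpi_du_batch, dq_da_batch)]
    return np.mean(grads, axis=0)


# Toy example with K = 3 parameters and N = 2 actions (made-up numbers).
K, N = 3, 2
dpi_du = np.arange(K * N, dtype=float).reshape(K, N)  # [[0, 1], [2, 3], [4, 5]]
dq_da = np.array([[1.0], [2.0]])

grad_u = combine_gradients(dpi_du, dq_da)
print(grad_u.shape)    # (3, 1)
print(grad_u.ravel())  # [ 2.  8. 14.]

# Averaging two identical tuples leaves the gradient unchanged.
batch_grad = batch_policy_gradient([dpi_du, dpi_du], [dq_da, dq_da])
```

The shape check is there because a transposed Jacobian (N x K instead of K x N) would otherwise fail with a less informative matmul error, or silently produce a wrong result when K equals N.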