Generate policy gradients wrt u, given sarsas, and Deep Q gradients wrt a, and combine

allentran / rl-l2t

Reinforcement learning paper + code: L2T

Apache License 2.0

0 stars 1 forks

Open allentran opened 9 years ago

allentran commented 9 years ago

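If there are N actions and u has length K, then dpi/du is K x N and dQ/da | sarsa is N x 1. Combining them as (dpi/du)(dQ/da) gives a K x 1 gradient wrt u for each sarsa.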
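
The shape bookkeeping above can be checked with a small sketch. This is only an illustration under assumptions, not the repo's implementation: the function names combine_gradients and batch_policy_gradient are hypothetical, the Jacobians are plain nested lists rather than framework tensors, and averaging over the batch of sarsas is an assumed way to "combine" the per-sample results.

```python
def combine_gradients(dpi_du, dq_da):
    """Chain rule for one sarsa: (K x N) times (N x 1) gives K values.

    dpi_du: list of K rows, each with N entries (policy Jacobian wrt u).
    dq_da:  list of N entries (gradient of Q wrt the action at this sarsa).
    """
    n = len(dq_da)
    if any(len(row) != n for row in dpi_du):
        raise ValueError("each row of dpi_du must have N entries")
    return [sum(row[j] * dq_da[j] for j in range(n)) for row in dpi_du]


def batch_policy_gradient(dpi_du_batch, dq_da_batch):
    """Average the per-sample K-vectors over a batch of sarsas (assumed)."""
    grads = [combine_gradients(jac, dq)
             for jac, dq in zip(dpi_du_batch, dq_da_batch)]
    k = len(grads[0])
    return [sum(g[i] for g in grads) / len(grads) for i in range(k)]


# Toy example: K = 3 parameters, N = 2 actions.
dpi_du = [[1.0, 0.0],
          [0.0, 2.0],
          [1.0, 1.0]]
dq_da = [0.5, -1.0]

grad_u = combine_gradients(dpi_du, dq_da)  # K entries, one per element of u
```

In a real network the K x N Jacobian would normally not be materialised; backpropagating dQ/da through the policy network yields the same K x 1 product directly.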