Open allentran opened 9 years ago
If there are N actions and u has length K, then dpi/du is K x N and dQ/da | sarsa is N x 1.
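A minimal sketch of the shape check, assuming the two factors are meant to be multiplied as (dpi/du)(dQ/da) to give a gradient with respect to u. The sizes K = 4 and N = 2, the placeholder values, and the helper names `matmul` and `shape` are all hypothetical and only serve to show that a K x N matrix times an N x 1 vector yields a K x 1 result.

```python
# Shape check for the product (dpi/du) @ (dQ/da).
# K, N, the placeholder values and both helpers are illustrative assumptions,
# not taken from the issue.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def shape(m):
    """Return (rows, columns) of a list-of-rows matrix."""
    return (len(m), len(m[0]))

K, N = 4, 2                               # len(u) = K, N actions
dpi_du = [[1.0] * N for _ in range(K)]    # K x N
dq_da = [[0.5] for _ in range(N)]         # N x 1
dq_du = matmul(dpi_du, dq_da)             # K x 1, one entry per element of u

print(shape(dpi_du), shape(dq_da), shape(dq_du))
```

If the shapes were laid out the other way round (dpi/du as N x K), the inner dimensions would no longer match and the assertion in `matmul` would fail, which is one way to catch a transposed Jacobian.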