Dueling DQN equation

dxyang / DQN_pytorch

Vanilla DQN, Double DQN, and Dueling DQN implemented in PyTorch


Open HencyChen opened 6 years ago

HencyChen commented 6 years ago

Thanks for offering this wonderful code. But I have a question.

Why, in the combination part of the equation, does the advantage A need to have its average subtracted? I've already referred to the paper but still don't understand.

HareshKarnan commented 3 years ago

^ Because many different pairs V(s) and A(s,a) satisfy Q(s,a) = V(s) + A(s,a). For example, for any constant c:

Q(s,a) = V(s) + A(s,a) = (V(s)+c) + (A(s,a)-c)

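So, to learn a unique V and A, you subtract the mean of the advantages over actions. This forces the advantages to average to zero in each state, so V(s) equals the mean of Q(s,a) over actions. Note that it is subtracting the max, not the mean, that makes the advantage of the optimal action 0; the paper presents both variants and reports that the mean version makes optimization more stable.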
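To make the ambiguity concrete, here is a minimal sketch in plain Python. It is not the repository's PyTorch code: the helper names `aggregate_naive` and `aggregate_mean` and the numbers are made up for illustration.

```python
# Hypothetical helpers for illustration; not taken from the repository.

def aggregate_naive(v, adv):
    """Q(s,a) = V(s) + A(s,a), with no constraint on A."""
    return [v + a for a in adv]

def aggregate_mean(v, adv):
    """Q(s,a) = V(s) + (A(s,a) - mean over actions of A)."""
    mean_adv = sum(adv) / len(adv)
    return [v + (a - mean_adv) for a in adv]

# Arbitrary example values for one state with three actions.
v, adv, c = 2.0, [1.0, 3.0, -1.0], 5.0
shifted_adv = [a - c for a in adv]

# Naive sum: (V + c, A - c) gives exactly the same Q as (V, A),
# so Q alone cannot tell the two decompositions apart.
q_naive = aggregate_naive(v, adv)
q_naive_shifted = aggregate_naive(v + c, shifted_adv)

# Mean-subtracted: a constant shift of A cancels out, so the shift
# of V by c now shows up in Q and V is pinned down.
q_mean = aggregate_mean(v, adv)
q_mean_shifted = aggregate_mean(v + c, shifted_adv)

# With mean subtraction, V is the average of Q over actions.
recovered_v = sum(q_mean) / len(q_mean)

print(q_naive == q_naive_shifted)  # True
print(q_mean, q_mean_shifted, recovered_v)
```

In this example the mean-subtracted advantages are 0, 2 and -2, so the best action keeps a positive advantage; only their average is zero.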