BY571 / Munchausen-RL

PyTorch implementation of the Munchausen Reinforcement Learning Algorithms M-DQN and M-IQN

Wrong value in call to F.softmax #2

Open marioyc opened 4 years ago

marioyc commented 4 years ago

Should `F.softmax(Q_targets_next, dim=1)` be `F.softmax(Q_targets_next / entropy_tau, dim=1)` instead?
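To make the question concrete, here is a minimal sketch (the values and the temperature are illustrative, not from the repo) showing that dividing by a small `entropy_tau` before the softmax produces a very different policy:

```python
import torch
import torch.nn.functional as F

entropy_tau = 0.03  # illustrative temperature, not the repo's actual setting
Q_targets_next = torch.tensor([[1.0, 2.0, 3.0]])  # toy Q-values for one state

# Unscaled softmax, as in the current code:
pi_plain = F.softmax(Q_targets_next, dim=1)
# Tau-scaled softmax, as the question proposes:
pi_scaled = F.softmax(Q_targets_next / entropy_tau, dim=1)

# For small tau the scaled policy is nearly greedy (close to one-hot on the
# best action), while the unscaled one stays comparatively smooth.
```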

BY571 commented 4 years ago

For DQN it's only `Q_targets_next`:

*(image: screenshot of the equation)*

but for IQN you are right :)

marioyc commented 4 years ago

Oh, I hadn't noticed that. It seems to contradict equation 2, and it would also change the logsumexp calculations, since those assume the Q-values are divided by `entropy_tau`.

marioyc commented 4 years ago

Confirmed with the author that it is a typo: the values should be divided by `entropy_tau`. There is also a TF implementation here: https://github.com/google-research/google-research/tree/master/munchausen_rl
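For reference, a self-contained sketch of what the corrected M-DQN target computation could look like, with `entropy_tau` dividing the Q-values in both the softmax and the logsumexp. All tensor names, shapes, and hyperparameter values here are illustrative assumptions, not the repo's actual code:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative hyperparameters (roughly the paper's defaults):
entropy_tau = 0.03   # temperature tau
alpha = 0.9          # Munchausen scaling term
lo = -1.0            # clipping lower bound l_0
gamma = 0.99

# Hypothetical batch of transitions:
batch, n_actions = 4, 3
Q_targets_next = torch.randn(batch, n_actions)   # Q(s', .) from the target net
Q_k = torch.randn(batch, n_actions)              # Q(s, .) for the Munchausen term
actions = torch.randint(0, n_actions, (batch, 1))
rewards = torch.randn(batch, 1)
dones = torch.zeros(batch, 1)

# Stable scaled log-sum-exp: tau * log sum_a exp(Q(s',a) / tau)
logsum_next = entropy_tau * torch.logsumexp(
    Q_targets_next / entropy_tau, dim=1, keepdim=True)
# tau * log pi(.|s') = Q(s',.) - tau * logsumexp(Q(s',.) / tau)
tau_log_pi_next = Q_targets_next - logsum_next
# Softmax policy at s' -- note the division by entropy_tau, the point of this issue:
pi_target = F.softmax(Q_targets_next / entropy_tau, dim=1)
# Soft value of the next state: sum_a pi(a|s') * (Q(s',a) - tau * log pi(a|s'))
v_next = (pi_target * (Q_targets_next - tau_log_pi_next)).sum(dim=1, keepdim=True)

# Munchausen bonus: alpha * clip(tau * log pi(a|s), l_0, 0)
logsum_k = entropy_tau * torch.logsumexp(Q_k / entropy_tau, dim=1, keepdim=True)
tau_log_pi_a = (Q_k - logsum_k).gather(1, actions)
munchausen = alpha * tau_log_pi_a.clamp(min=lo, max=0)

Q_targets = rewards + munchausen + gamma * v_next * (1 - dones)
```

Dividing by `entropy_tau` consistently in the softmax and both logsumexp terms is what keeps `tau_log_pi` a genuine scaled log-probability, which is why the unscaled softmax in the original code breaks the logsumexp bookkeeping mentioned above.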

BY571 commented 4 years ago

@marioyc Thank you! I'll fix it :)