google-deepmind / bsuite

bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning (RL) agent
Apache License 2.0

Question about DQN's loss #23

Closed toshikwa closed 4 years ago

toshikwa commented 4 years ago

Hi, I have one simple question about DQN's loss here.

Why do you use tf.reduce_sum instead of tf.reduce_mean here? Is there a reason for it? Were the experiments in the paper run with a loss that sums over the batch?
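For concreteness, here is a small numpy sketch (with hypothetical TD errors) of what I mean — the two reductions differ only by the constant batch size:

```python
import numpy as np

# Hypothetical per-transition TD errors for a batch of 4 transitions.
td_errors = np.array([0.5, -1.0, 2.0, 0.25])
per_example_loss = 0.5 * td_errors ** 2

loss_sum = per_example_loss.sum()    # analogue of tf.reduce_sum
loss_mean = per_example_loss.mean()  # analogue of tf.reduce_mean

# The summed loss is exactly batch_size times the mean loss.
assert np.isclose(loss_sum, loss_mean * len(td_errors))
```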

Sorry for asking such a simple question, but I would really appreciate an answer.

Anyway, this is a great project!! Thank you :)

aslanides commented 4 years ago

Hi, thanks for the question.

It is true that reduce_mean is more 'correct' in that it is invariant to the batch size. In terms of learning dynamics, I don't think it makes much difference, since this loss is optimized on its own rather than in combination with another (in contrast to, say, actor-critic). In this case, scaling by a constant factor essentially gets normalized away by an optimizer like Adam.
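To illustrate the normalization point, here is a minimal numpy sketch of a single-parameter Adam update (not bsuite's actual optimizer code): because Adam divides the first-moment estimate by the square root of the second-moment estimate, multiplying every gradient by a constant like the batch size leaves the step almost unchanged (up to the epsilon term).

```python
import numpy as np

def adam_step(grad, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter, from zero-initialised moments."""
    m = (1 - beta1) * grad          # first moment after one step
    v = (1 - beta2) * grad ** 2     # second moment after one step
    m_hat = m / (1 - beta1)         # bias correction at t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (np.sqrt(v_hat) + eps)

g = 0.7           # hypothetical gradient of the mean loss
batch_size = 32   # the summed loss scales the gradient by this constant

# The step sizes for the mean-loss and sum-loss gradients nearly coincide:
# m_hat / sqrt(v_hat) cancels any constant factor on the gradient.
step_mean = adam_step(g)
step_sum = adam_step(batch_size * g)
assert np.isclose(step_mean, step_sum, rtol=1e-4)
```

This is why the choice mostly affects readability and cross-paper comparability rather than the learned policy.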

Having said that, I think reduce_mean is probably clearer/simpler/makes for easier comparisons, so I'll change that. Thanks!