Closed toshikwa closed 4 years ago
Hi, thanks for the question.
It is true that reduce_mean is more 'correct' in that it is invariant to the batch size. In terms of learning dynamics, I don't think it makes much difference, since this loss is optimized on its own, not in combination with another (in contrast to, say, actor-critic). In this case, scaling by a constant factor will essentially get normalized away by an optimizer like Adam.
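The Adam point can be checked with a quick numpy sketch (this is not the repo's code; the quadratic loss and all names are made up for illustration). Scaling the gradient by a constant factor, which is what reduce_sum does relative to reduce_mean, leaves Adam's updates essentially unchanged, because Adam divides the first-moment estimate by the square root of the second-moment estimate and the constant cancels:

```python
import numpy as np

def adam_steps(grad_fn, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, n=100):
    # Minimal Adam loop on a single scalar parameter.
    x, m, v = x0, 0.0, 0.0
    for t in range(1, n + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # first moment
        v = beta2 * v + (1 - beta2) * g * g      # second moment
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Gradient of a toy quadratic loss, and the same gradient scaled by the
# batch size (reduce_sum gives B times the gradient of reduce_mean).
batch_size = 32
grad = lambda x: 2.0 * (x - 3.0)
grad_scaled = lambda x: batch_size * grad(x)

x_mean = adam_steps(grad, x0=0.0)
x_sum = adam_steps(grad_scaled, x0=0.0)
print(abs(x_mean - x_sum))  # tiny: the two trajectories nearly coincide
```

The only place the scale survives is the `eps` term in the denominator, which is negligible unless gradients are very small.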
Having said that, I think reduce_mean is probably clearer/simpler/makes for easier comparisons, so I'll change that. Thanks!
Hi, I have one simple question about DQN's loss here.
Why do you use
tf.reduce_sum
instead of tf.reduce_mean
here? Is there a reason for it? Were the experiments in the paper done with a loss that sums over the batch? Sorry for asking such a simple question, but I would really appreciate it if you could answer.
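For concreteness, here is a small numpy sketch of the two reductions applied to a Huber-style per-sample loss (the TD errors below are made-up values, not from the repo). The two losses differ only by the constant batch size:

```python
import numpy as np

# Hypothetical per-sample TD errors for one batch (illustration only).
td_errors = np.array([0.5, -1.2, 2.0, 0.1])

# Huber loss per sample (delta = 1), as commonly used in DQN.
abs_err = np.abs(td_errors)
per_sample = np.where(abs_err <= 1.0, 0.5 * td_errors ** 2, abs_err - 0.5)

loss_sum = per_sample.sum()    # analogue of tf.reduce_sum
loss_mean = per_sample.mean()  # analogue of tf.reduce_mean

# sum == mean * batch_size, so gradients differ only by that constant.
print(np.isclose(loss_sum, loss_mean * len(td_errors)))  # True
```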
Anyway, this is a great project!! Thank you :)