I am using the code in atari_dqn.ipynb to train a policy for Gravitar from scratch (on 1 run = 100 shards of data), and this is what my loss log looks like so far:
The agent gets to a loss of 0.000 in < 10 minutes, and the loss just seems to be oscillating very close to 0. Is such a low loss expected, or does this suggest an issue with the code I'm running?
Logging the loss to TensorBoard showed that the relevant significant figures had been truncated in the output above, and that the loss does decrease nicely with the provided code.
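For anyone hitting the same thing, here is a minimal sketch of the check, assuming a PyTorch setup; the loop below just fakes a small loss value for illustration, since the notebook's actual loss and step variables aren't shown here.

```python
# Minimal sketch: log the loss at full precision instead of relying on a
# truncated "%.3f"-style printout. Assumes PyTorch's TensorBoard writer.
import math
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/gravitar_dqn_loss_check")

for step in range(1000):
    # Stand-in for the DQN TD loss computed in the notebook.
    loss_value = 1e-3 * math.exp(-step / 200)

    # Full-precision scalar logged to TensorBoard.
    writer.add_scalar("train/loss", loss_value, global_step=step)

    # Scientific notation shows the digits that round away to 0.000.
    if step % 100 == 0:
        print(f"step {step}: loss = {loss_value:.6e}")

writer.close()
```

Viewing the `train/loss` curve in TensorBoard (or printing in scientific notation) makes it clear the loss is small but still decreasing rather than stuck at zero.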