Closed by luizapozzobon 2 years ago
What training curves are you getting? It's easy to make mistakes with tf.distribute. For debugging, I recommend running a few seeds on a single GPU with the original code from the repository.
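One well-known example of such a mistake is loss scaling: under synchronous data-parallel training, gradients are typically summed across replicas, so each replica's loss needs to be divided by the global batch size rather than its local one. The sketch below illustrates only the arithmetic in plain Python, without TensorFlow. The toy model, the data, and the helper names (grad_sum, single_device_grad, multi_replica_grad) are hypothetical, and nothing in this thread confirms that this was the cause of the problem reported here.

```python
# Toy model: prediction = w * x, per-example loss = (w * x - y) ** 2.
# All names and numbers are illustrative assumptions, not dreamerv2 code.

def grad_sum(w, batch):
    """Sum over the batch of d/dw (w * x - y) ** 2."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)

def single_device_grad(w, batch):
    """Gradient of the mean loss when the whole batch sits on one device."""
    return grad_sum(w, batch) / len(batch)

def multi_replica_grad(w, batch, num_replicas, scale_by_global_batch):
    """Split the batch evenly, compute one gradient per replica, then sum them.

    Assumes len(batch) is divisible by num_replicas.
    """
    shard_size = len(batch) // num_replicas
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(num_replicas)]
    # The pitfall: dividing by the local shard size instead of the global size.
    divisor = len(batch) if scale_by_global_batch else shard_size
    return sum(grad_sum(w, shard) / divisor for shard in shards)

batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 9.0)]
w = 0.5

reference = single_device_grad(w, batch)
correct = multi_replica_grad(w, batch, num_replicas=2, scale_by_global_batch=True)
wrong = multi_replica_grad(w, batch, num_replicas=2, scale_by_global_batch=False)

print("single device:        ", reference)
print("2 replicas, global /N:", correct)
print("2 replicas, local /n: ", wrong)
```

With the local divisor, the summed gradient is larger than the single-device one by a factor equal to the number of replicas, which behaves like an unintended learning-rate increase. Comparing against single-GPU runs, as suggested above, is one way to notice this kind of discrepancy.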
Yeah, I'm pretty sure it is some problem with my tf.distribute implementation. I ran the original code and it reached higher scores much sooner than my version did. Thanks for the insight!!
Hello, danijar! First of all, thanks for your work :)
I've been trying out dreamerv2 this past week and tried to reproduce the riverraid results. However, I was unsuccessful: the agent only reaches about 5k reward after almost 1e6 train steps. This is the latest result I got. If needed, I can attach tensorboard graphs later this week.
I made a small modification to the original code so it runs on multiple GPUs (tf.distribute.MirroredStrategy). Then I trained the agent to play Pong, and the return plot was similar to the one you posted in #8, so I figured it was OK. Also, in the riverraid output attached above, half of the run used precision=16 and half used precision=32, since a few other issues, especially #30, mentioned that precision 32 helped. I did not do a full run with precision=32, though. Do you have any tips on what could be going wrong or what I could do to debug it?
Thanks so much!