Can't reproduce riverraid's results

luizapozzobon commented 2 years ago

Hello, danijar! First of all, thanks for your work :)

I've been trying out dreamerv2 this past week and tried to reproduce the Riverraid results. However, I was unsuccessful: the agent only reaches a return of about 5k after almost 1e6 training steps. This is the latest result I got. If needed, I can attach TensorBoard graphs later this week.

train_return 5190 / train_length 982 / train_total_steps 9.5e5 / train_total_episodes 1220 / train_loaded_steps 9.5e5 / train_loaded_episodes 1220

I made a small modification to the original code so it runs on multiple GPUs (tf.distribute.MirroredStrategy). I then trained the agent on Pong, and the return plot was similar to the one you posted in #8, so I figured the modification was OK. Also, in the Riverraid output above, half of the run used precision=16 and half used precision=32, since a few other issues, especially #30, mention that precision=32 helped. I did not do a full run with precision=32, though.

Do you have any tips on what could be going wrong, or on what I could do to debug it?

Thanks so much!

danijar commented 2 years ago

What training curves are you getting? It's easy to make mistakes with tf.distribute. For debugging, I recommend running a few seeds on a single GPU with the original code from this repository.
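As an illustration of the kind of tf.distribute mistake mentioned above (an assumption about a common pitfall, not a diagnosis of this particular run): MirroredStrategy sums gradients across replicas, so if each replica averages its loss over its *local* batch instead of the *global* batch, the summed gradient is scaled by the number of replicas, which acts like a larger learning rate. The pure-Python sketch below simulates two replicas with a hypothetical 1-D linear model; the function names and data are made up for illustration.

```python
# Hypothetical setup: a 1-D linear model y_hat = w * x with squared-error loss.
# Simulates MirroredStrategy-style gradient aggregation (all-reduce SUM).

def grad_sq_error(w, xs, ys, denom):
    """Gradient of sum((w*x - y)^2) / denom with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / denom

def summed_replica_grad(w, shards, denom_fn):
    # Each replica computes its own gradient; the strategy then sums them.
    return sum(grad_sq_error(w, xs, ys, denom_fn(xs)) for xs, ys in shards)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
global_batch = len(xs)
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]  # two replicas, 2 examples each

# Reference: one device averaging over the whole batch.
single_device = grad_sq_error(w, xs, ys, global_batch)
# Correct: each replica divides by the GLOBAL batch size.
correct = summed_replica_grad(w, shards, lambda local: global_batch)
# Mistake: each replica divides by its LOCAL batch size.
wrong = summed_replica_grad(w, shards, lambda local: len(local))

print(single_device, correct, wrong)  # prints: -22.5 -22.5 -45.0
```

In TensorFlow 2.x custom training loops, `tf.nn.compute_average_loss(per_example_loss, global_batch_size=...)` performs the global-batch division; comparing a multi-GPU run against single-GPU seeds, as suggested above, is a direct way to catch this kind of scaling error.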

luizapozzobon commented 2 years ago

Yeah, I'm pretty sure it is a problem with my tf.distribute implementation. I ran the original code, and it reached higher scores than my version did, and much sooner. Thanks for the insight!!

danijar / dreamerv2 #39