dgriff777 / rl_a3c_pytorch

A3C LSTM Atari with Pytorch plus A3G design
Apache License 2.0

Seaquest-v0 not training as well as announced #11

Closed: ThomasLecat closed this issue 7 years ago

ThomasLecat commented 7 years ago

I have tried twice to train the agent on Seaquest-v0 with 32 workers on a server, but after 13 hours of training the score seems stuck at a maximum of around 2700-2800.

Here's the log file: log.txt

I'm using gym 0.8.1 and Atari-py 0.0.21 and left all the hyperparameters at their default values. Any idea why the score obtained is much lower than the one you obtained (>50000)? Would you share the trained model for Seaquest-v0? Thanks!
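
For context, with everything at defaults the run was launched roughly like this (flag names are an assumption based on the README; main.py's argparse is authoritative):

```bash
python main.py --env Seaquest-v0 --workers 32
```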

ThomasLecat commented 7 years ago

I have just watched the video created by gym-eval.py: the agent kills all the enemies perfectly, but it has never learned to surface! Here's a gif: 00000

Any idea why it stays at the bottom like that?

dgriff777 commented 7 years ago

Yup, it takes a long time for it to learn to surface correctly. I remember it took nearly 24hrs before the score started to shoot up. That game took nearly 3 days total if I remember correctly. Just let it keep training and it will sort itself out. As you can see it can shoot well, so once the surfacing skill is learned the score should rise quickly after that.

ThomasLecat commented 7 years ago

Got it, thanks! Back to training then. Do you think increasing the coefficient on the entropy term in the loss (currently 0.01) would make the agent discover the surface faster by encouraging exploration?

dgriff777 commented 7 years ago

I think it's good to leave the entropy term as is, as it's robust and effective across all games. I think massive exploration is one of the most important advantages AI has over humans, so it should not be neglected.
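
For reference, the entropy bonus being discussed typically enters the policy loss like this. This is a minimal sketch in the A3C style, not this repo's exact code; the function name and tensor shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def policy_loss_with_entropy(logits, actions, advantages, entropy_coef=0.01):
    """Entropy-regularized policy-gradient loss in the spirit of A3C.

    logits:     (T, num_actions) raw policy outputs over T steps
    actions:    (T,) long tensor of actions actually taken
    advantages: (T,) advantage estimates, treated as constants here
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()

    # Log-probability of each chosen action.
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

    # Policy entropy per step; raising entropy_coef pushes the policy
    # toward more uniform (more exploratory) action distributions.
    entropy = -(probs * log_probs).sum(dim=-1)

    # Gradient ascent on expected return plus entropy bonus,
    # expressed as a loss to minimize.
    return -(chosen * advantages.detach() + entropy_coef * entropy).mean()
```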

Added a trained Seaquest-v0 model to the trained_models folder.
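
If you just want to watch it play, loading a saved state dict generally looks like this (a minimal sketch; the file name, class name, and constructor arguments are assumptions rather than the repo's verified API, so check gym_eval.py for the authoritative way to run a saved model):

```python
import gym
import torch

from model import A3Clstm  # assumed module/class name; check model.py

env = gym.make('Seaquest-v0')

# File name and constructor signature are assumptions.
model = A3Clstm(env.observation_space.shape[0], env.action_space)
model.load_state_dict(
    torch.load('trained_models/Seaquest-v0.dat', map_location='cpu'))
model.eval()
```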

ThomasLecat commented 7 years ago

Ok for the entropy, and thanks for the trained model!