dgriff777 / rl_a3c_pytorch

A3C LSTM Atari with Pytorch plus A3G design
Apache License 2.0

Reward is always 0 when training Breakout-v0 #10

Closed. NeymarL closed this issue 7 years ago

NeymarL commented 7 years ago

I trained the model overnight on Breakout-v0, but the reward is always 0. What could cause this? Could you also tell me which parameters you use when training on Breakout-v0? Thank you. Here is the log file: log.txt

dgriff777 commented 7 years ago

It looks like you are getting an error on all 4 training processes. The test process stays running, but all 4 training processes terminated, so it's not learning anything. It's some CUDA error. You should not be using CUDA with how this is set up.
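The failure mode described above (training processes dead, test process still alive and logging) is easy to miss. A minimal sketch of how a launcher could detect it follows. It relies only on the `name` and `exitcode` attributes that `multiprocessing.Process` objects expose; `crashed_workers` and `FakeProcess` are hypothetical names used for illustration and are not part of this repository.

```python
from collections import namedtuple


def crashed_workers(processes):
    """Return the names of processes that exited with a non-zero exit code.

    `exitcode` is None while a process is still running, 0 after a clean
    exit, and non-zero after an uncaught exception or a signal.
    """
    return [p.name for p in processes if p.exitcode not in (None, 0)]


# Stand-in objects (hypothetical) with the same two attributes as
# multiprocessing.Process, so the check can be shown without spawning anything.
FakeProcess = namedtuple("FakeProcess", ["name", "exitcode"])

procs = [
    FakeProcess("train-0", 1),    # died with an exception
    FakeProcess("train-1", 1),    # died with an exception
    FakeProcess("test", None),    # still running
]
print(crashed_workers(procs))
```

In a real launcher the same function could be called periodically on the list of started worker processes, and the run stopped if it returns anything.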

NeymarL commented 7 years ago

Yes, that's the problem! Thank you! The CUDA error went away after I removed CUDA support. But when I train again with 3 workers, the reward is still zero all the time (maybe because the training period is too short?). Could you give me some hints? log.txt

dgriff777 commented 7 years ago

That's a very short amount of training, especially with 3 workers. I've never tried with that few workers, but I would estimate maybe 1 hr 30 min until the score starts going up on Breakout, since it takes about 30 minutes with 16 workers before the score really starts going up. In Breakout the game does not automatically restart after each life, so the agent needs to learn to press the fire button to get the game started again.
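To illustrate the fire-button point: in this thread the agent is left to learn the action itself, but a common workaround is to press FIRE automatically after a reset and after each lost life. The sketch below shows that workaround and is not a description of this repository's code. It assumes the older Gym step API returning `(obs, reward, done, info)`, that `info` reports remaining lives under the key `"ale.lives"`, and that action index 1 is FIRE, as in Breakout's action set. `FireOnLifeLoss` and `StubEnv` are hypothetical names.

```python
class FireOnLifeLoss:
    """Hypothetical wrapper: press FIRE after reset and after each lost life."""

    FIRE = 1  # assumption: index of FIRE in Breakout's action set

    def __init__(self, env, lives_key="ale.lives"):
        self.env = env
        self.lives_key = lives_key  # assumption: key used by older Gym Atari envs
        self.lives = 0

    def reset(self):
        self.env.reset()
        # Launch the ball so the episode actually starts.
        obs, _, _, info = self.env.step(self.FIRE)
        self.lives = info.get(self.lives_key, 0)
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lives = info.get(self.lives_key, self.lives)
        if not done and lives < self.lives:
            # A life was lost: press FIRE to put the ball back in play.
            obs, extra, done, info = self.env.step(self.FIRE)
            reward += extra
            lives = info.get(self.lives_key, lives)
        self.lives = lives
        return obs, reward, done, info


class StubEnv:
    """Tiny stand-in for an Atari env, used only to exercise the wrapper."""

    def __init__(self):
        self.actions = []
        self.lives = 2

    def reset(self):
        self.actions.clear()
        self.lives = 2
        return 0

    def step(self, action):
        self.actions.append(action)
        if action == 0:  # pretend that NOOP always loses a life
            self.lives -= 1
        done = self.lives == 0
        return len(self.actions), 0.0, done, {"ale.lives": self.lives}


env = FireOnLifeLoss(StubEnv())
env.reset()
env.step(0)
print(env.env.actions)  # actions the underlying env actually received
```

The trade-off is that the wrapper removes one thing the agent would otherwise have to discover, which shortens the initial flat period at reward 0 but also changes the task slightly.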

NeymarL commented 7 years ago

Thanks. Then I am going to wait a few hours and see...

NeymarL commented 7 years ago

Cool, it works! The reward starts to increase after 3 hours of training! Thanks for your code and help!

dgriff777 commented 7 years ago

Yeah, it's going to take a while with only 3 workers. I would actually recommend using another algorithm if you're only going to train with 3 workers, as having so few workers is detrimental to A3C's overall performance and also makes it very slow to train.

NeymarL commented 7 years ago

Which algorithms are suited to training efficiently with 3 workers?

dgriff777 commented 7 years ago

I would first just recommend using more workers, but if that's not doable, then the best algorithm really depends on what you are looking to accomplish or learn.