Effect of larger sample queue?

icompute386 commented 5 years ago

Looking at chapter 7 some more, you have two examples for breakout (breakout-small and breakout), which have different sample queue sizes. What should we expect from a larger queue: slower or faster convergence, a more stable and smooth growth in reward, or is it just a matter of trial and error to see what works?

One more question: are these Atari games (pong, space-invaders, and breakout) all deterministic? If so, would all this training be useless on a game with a random seed, such as one where the ball or paddle starts in a different position, or the invaders move in a different direction?

Thanks, Chris

Shmuma commented 5 years ago

The Breakout environment is there for experimenting with params and for training on a more complex environment than pong. But those params are not guaranteed to converge (even pong diverges in 5-10% of seeds). From my personal tests, it takes several days of training to get a policy with a mean score of 100. Getting to 800 requires a week and tons of tweaking.

The non-deterministic nature of the game should be fine for a good policy, but getting such a good policy might be another story.
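
To make the sample queue question a bit more concrete, here is a minimal sketch of a fixed-capacity replay buffer. It is not the book's code: `ReplayBuffer` and `oldest_step_retained` are hypothetical names used only for illustration, and plain integers stand in for real transitions. It only shows the mechanical effect of the queue size, namely how far back in time the sampled experience can reach; it says nothing about which size converges better.

```python
import collections
import random


class ReplayBuffer:
    """Hypothetical fixed-size FIFO queue of transitions (illustration only)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.buffer = collections.deque(maxlen=capacity)

    def __len__(self):
        return len(self.buffer)

    def append(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement from whatever is still stored.
        return rng.sample(list(self.buffer), batch_size)


def oldest_step_retained(capacity, total_steps):
    """Fill a buffer with step indices and return the oldest one still stored."""
    buf = ReplayBuffer(capacity)
    for step in range(total_steps):
        buf.append(step)  # the step index stands in for a real transition
    return min(buf.buffer)


if __name__ == "__main__":
    rng = random.Random(0)
    for capacity in (100, 1000):
        buf = ReplayBuffer(capacity)
        for step in range(1000):
            buf.append(step)
        batch = buf.sample(32, rng)
        print(capacity, "oldest stored:", min(buf.buffer), "oldest sampled:", min(batch))
```

After the same number of steps, the larger queue still holds much older transitions, so each batch mixes experience from a wider range of past policies. Whether that helps or hurts on a given game is still something to check by experiment.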

icompute386 commented 5 years ago

Oh wow, thanks Shmuma, thanks for the info. You must have put in a lot of time to find the values you did.

PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Effect of larger sample queue? #39