Open anubisthejackle opened 9 years ago
As you can see in the chart I contributed, there is only a small positive trend.
StateManager.scores.length: 636 (number of games)
experience replay size: 100002
exploration epsilon: 0.01
age: 100004
average Q-learning loss: 0.39981905616281765
smooth-ish reward: 0.5372073053150149
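For context on that last number: the "smooth-ish reward" statistic in the ConvNetJS deep Q-learning demo is an exponential moving average of per-step rewards, so a flat value means the agent's recent rewards aren't improving. A minimal sketch of that kind of smoothing (the function name and the 0.999 decay constant are my assumptions, not taken from this run):

```javascript
// Exponential moving average of rewards: a rough sketch of how a
// "smooth-ish reward" statistic is typically computed. The decay
// constant (0.999) is an assumption, not from the actual run above.
function makeRewardSmoother(decay) {
  var smoothed = 0;
  var initialized = false;
  return function update(reward) {
    if (!initialized) {
      smoothed = reward;          // seed with the first observed reward
      initialized = true;
    } else {
      smoothed = decay * smoothed + (1 - decay) * reward;
    }
    return smoothed;
  };
}

var smooth = makeRewardSmoother(0.999);
var rewards = [0, 1, 1, 0, 1];  // hypothetical per-step rewards
var last;
for (var i = 0; i < rewards.length; i++) {
  last = smooth(rewards[i]);
}
console.log(last.toFixed(4));
```

With a decay this close to 1, the statistic moves very slowly, which is why it can sit near the same value for hundreds of games even if play is drifting slightly.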
I've noticed the same results. I'm largely convinced that the backing network architecture needs to be changed. I've been working on cleaning up the code to make that an easier process.
I started another run with 3,075 games; same result.
experience replay size: 524667
exploration epsilon: 0.01
age: 524669
average Q-learning loss: 0.3571297194077231
smooth-ish reward: 0.5395184739721711
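Half a million replay entries with essentially the same loss and reward suggests the replay mechanism itself is doing its job and the bottleneck is the network. For anyone unfamiliar, experience replay is usually just a fixed-capacity ring buffer with uniform random sampling; a minimal sketch (names are mine, this is not the ConvNetJS implementation):

```javascript
// Fixed-capacity experience replay buffer: once full, new experiences
// overwrite the oldest slot; training batches are sampled uniformly.
function ReplayBuffer(capacity) {
  this.capacity = capacity;
  this.buffer = [];
  this.pos = 0; // next slot to overwrite once the buffer is full
}

ReplayBuffer.prototype.add = function (experience) {
  if (this.buffer.length < this.capacity) {
    this.buffer.push(experience);
  } else {
    this.buffer[this.pos] = experience; // overwrite the oldest entry
  }
  this.pos = (this.pos + 1) % this.capacity;
};

ReplayBuffer.prototype.sample = function (batchSize) {
  var batch = [];
  for (var i = 0; i < batchSize; i++) {
    batch.push(this.buffer[Math.floor(Math.random() * this.buffer.length)]);
  }
  return batch;
};

var rb = new ReplayBuffer(3);
[1, 2, 3, 4].forEach(function (x) { rb.add(x); });
console.log(rb.buffer); // entry 1 has been overwritten by 4
```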
I'm thinking that switching to a Long Short-Term Memory (LSTM) network will improve the training speed. The convolutional network seems to plateau really quickly. I've let this run overnight, and all day long, for multiple days, and I've never gotten 2048.
It's likely that a convolutional network isn't sufficient for this type of problem. I'm beginning to think that a better solution would be a Long Short-Term Memory network. That means using a different neural network library than ConvNetJS. Synaptic is a quality, architecture-free library, and it happens to have examples of how to set up an LSTM network in its README.md.