JanGeffert / 2048-AI

2048 Artificial Intelligence

Q-Learning Agent #33

Open JanGeffert opened 6 years ago

EverettSussman commented 6 years ago

We need to add flags for the parameters in qLearningAgents (alpha, epsilon, gamma)
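A minimal sketch of what those flags might look like with `argparse` (the flag names and defaults here are placeholders, not the repo's actual CLI):

```python
import argparse

def parse_args(argv=None):
    # Hypothetical flag names/defaults -- adjust to match qLearningAgents.
    parser = argparse.ArgumentParser(description="Q-learning agent options")
    parser.add_argument("--alpha", type=float, default=0.5,
                        help="learning rate")
    parser.add_argument("--epsilon", type=float, default=0.1,
                        help="exploration probability (epsilon-greedy)")
    parser.add_argument("--gamma", type=float, default=0.9,
                        help="discount factor")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    print(args.alpha, args.epsilon, args.gamma)
```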

EverettSussman commented 6 years ago

Serious concerns with the implementation of the Q-learning agent: how are we supposed to store Q-values for every state in 2048? There are over 15 possible values for each square and 16 squares, giving us more than 15^16 (roughly 6.6 × 10^18) possible states to keep Q-values for.
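For scale, a quick back-of-the-envelope count (treating each square as having at least 15 possible contents, which is a lower bound since empty squares and very large tiles push it higher):

```python
# Lower bound on the number of distinct 2048 boards: ~15 values per square,
# 16 squares. Tiles above 2^15 (and the empty square) only increase this.
values_per_square = 15
squares = 16
states = values_per_square ** squares
print(f"{states:.2e}")  # far too many entries for a dense tabular Q-value store
```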

I propose two options:

1. **Offline training.** Randomize states for a fixed number of iterations and update our Q-values from those rollouts. The advantage is that we can seed "high" states (boards that already contain large tiles), so the Q-values get updated to cover late-game configurations. The downside is that filling out even a reasonable Q-value table this way could take a long time (there is a non-zero probability that we just revisit the same states over and over).
2. **Learning on the go.** From the current board, randomly project possible future states, update Q-values for those future states, and then return to the agent's decision about which move to make. The advantage is that we mostly update Q-values the algorithm will actually encounter. The downsides are that we clearly bias the updates toward the current board state, and the agent's runtime will slow down drastically.
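Either option still needs the core update machinery. Here is a rough sketch of the "learn on the go" side, using a sparse table that only stores visited (state, action) pairs and the standard update Q(s,a) ← Q(s,a) + α·(r + γ·max_a' Q(s',a') − Q(s,a)). All class and method names here are illustrative, not the repo's actual API:

```python
import random
from collections import defaultdict

class SparseQLearner:
    """Tabular Q-learning over a sparse table: unseen pairs default to 0,
    so memory grows with states actually visited, not the full 15^16 space."""

    def __init__(self, actions, alpha=0.5, epsilon=0.1, gamma=0.9):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.epsilon, self.gamma = alpha, epsilon, gamma

    def best_value(self, state):
        # Value of the best action from `state` under the current estimates.
        return max(self.q[(state, a)] for a in self.actions)

    def choose(self, state):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # One-step temporal-difference update toward r + gamma * max Q(s', .).
        target = reward + self.gamma * self.best_value(next_state)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

The same `update` method would serve option 1 as well; only where the (state, action, reward, next_state) transitions come from (random seeded boards vs. projected futures of the live board) differs.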

Thoughts?