Training currently has a problem: the model is not getting any better at the game. A large number of games end in timeouts, which should not happen at the scores we are seeing. A timeout at a low score means the model kept choosing moves that did not affect the board, and those are exactly the moves we want it to avoid.
The important parts of this change are visualizing the board states and the moves being made, so that we can identify when a move left the board unchanged and made the game longer than necessary. The turn limit exists because the model could get stuck making the same move over and over, which would bring training to a grinding halt. I could try disabling the turn limit to see exactly what happens if the game is allowed to continue.
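As a starting point for spotting moves that did not change the board, here is a minimal sketch. It assumes the board is stored as a nested list of integers and that a copy of the board is saved after every move. The names Board, move_changed_board, and find_noop_turns are hypothetical, not taken from the existing code.

```python
from typing import List

# Assumption: a board is a grid of ints stored as a list of rows.
Board = List[List[int]]


def move_changed_board(before: Board, after: Board) -> bool:
    """Return True if at least one cell differs between the two boards."""
    return before != after


def find_noop_turns(states: List[Board]) -> List[int]:
    """Return the indices of moves that left the board unchanged.

    states[0] is the starting board and states[i + 1] is the board
    recorded after move i, so move i is a no-op when the two are equal.
    """
    return [i for i in range(len(states) - 1) if states[i] == states[i + 1]]
```

If the game updates the board in place, the states need to be copied (for example with copy.deepcopy) before they are stored, otherwise every saved state will point at the same object and every move will look like a no-op.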
Key functions:
Rewatch previous games
Visualize moves
Visualize non-productive moves
Most important: a tracker that counts the number of consecutive moves that have not yielded a beneficial result
A move made by the epsilon trainer breaks up any previous streak of non-productive moves, so the streak should reset there (maybe still keep another counter for the number of intentional moves that yielded no result, so that we can still tell whether the model is continuously failing to identify what a valid move is)
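The tracker described in the last two items could look roughly like the sketch below. It assumes the training loop can tell, for each move, whether the board changed and whether the move came from the epsilon (random) branch rather than from the model. The class name, field names, and the record signature are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NonProductiveMoveTracker:
    current_streak: int = 0    # consecutive non-productive moves right now
    longest_streak: int = 0    # worst streak seen so far in this game
    model_noop_total: int = 0  # model-chosen moves that changed nothing

    def record(self, board_changed: bool, was_epsilon_move: bool) -> None:
        """Update the counters after one move."""
        if was_epsilon_move:
            # A random exploratory move breaks the streak, whatever it did.
            self.current_streak = 0
            return
        if board_changed:
            # A productive model move also ends the streak.
            self.current_streak = 0
            return
        # The model picked a move that had no effect on the board.
        self.current_streak += 1
        self.model_noop_total += 1
        self.longest_streak = max(self.longest_streak, self.current_streak)
```

Because model_noop_total is never reset by epsilon moves, it still shows how often the model itself picks an ineffective move, even when random moves keep the streaks short.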