Alright, so we are on episode 90, and here is what our stats look like: It's seriously disappointing. I really cannot figure out why this is not working. I may have to choose a different solution altogether.
Just for reference, at this point our training epsilon should be at 0.57, meaning we are making a random move 57% of the time.
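For context, here is a minimal sketch of the kind of epsilon-greedy selection I'm describing. The decay constant below is purely illustrative (picked so that epsilon lands around 0.57 at episode 90); the actual schedule in the repo may differ, e.g. it could be exponential rather than linear.

```python
import random

# Hypothetical decay parameters -- the repo's real schedule may differ.
EPS_START = 1.0
EPS_END = 0.05
EPS_DECAY_PER_EPISODE = 0.00478  # chosen so epsilon ~= 0.57 around episode 90

def epsilon_for_episode(episode: int) -> float:
    """Linear decay from EPS_START toward EPS_END."""
    return max(EPS_END, EPS_START - episode * EPS_DECAY_PER_EPISODE)

def select_action(q_values, episode: int) -> int:
    """Epsilon-greedy: random move with probability epsilon, else the greedy move."""
    eps = epsilon_for_episode(episode)
    if random.random() < eps:
        return random.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```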
I have seen claims that when the model encounters a state it has never visited before, it should take a random action to try and learn from it, but our state space is so massive that I seriously doubt this would come up often. And even if it did, our replay buffer only holds roughly three games' worth of turns, so if we keep inserting every turn regardless of the value of the replay, we likely would not gain much from prioritizing random actions this way; I would rather keep using RNG to drive exploration.
So, longer turn limits did not help our training at all. Performance looks about the same, accounting for the fact that this was one of our longer test runs.
I am looking into how to fix this. My current suspicion is that improving which DQN replays we select, via a prioritized experience replay buffer, would help.
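As a reference point, here is a minimal sketch of proportional prioritized experience replay (Schaul et al., 2015). This is not the buffer currently in the repo; the capacity and hyperparameters are placeholders, and the idea is simply that transitions with larger TD error get sampled more often, so rare, surprising turns aren't crowded out of a small buffer by routine ones.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay sketch."""

    def __init__(self, capacity: int = 10_000, alpha: float = 0.6):
        self.capacity = capacity
        self.alpha = alpha          # 0 = uniform sampling, 1 = fully prioritized
        self.buffer = []
        self.priorities = []
        self.pos = 0

    def push(self, transition, td_error: float = 1.0):
        """Store a transition with priority proportional to |TD error|."""
        priority = (abs(td_error) + 1e-5) ** self.alpha
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(priority)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        """Sample a batch biased toward high-priority transitions."""
        priorities = np.array(self.priorities)
        probs = priorities / priorities.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct for the non-uniform sampling.
        weights = (len(self.buffer) * probs[indices]) ** (-beta)
        weights /= weights.max()
        batch = [self.buffer[i] for i in indices]
        return batch, indices, weights

    def update_priorities(self, indices, td_errors):
        """Refresh priorities after the learner recomputes TD errors."""
        for i, err in zip(indices, td_errors):
            self.priorities[i] = (abs(err) + 1e-5) ** self.alpha
```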
Per this issue: https://github.com/Baylus/2048I/issues/17, I suspect there might be a negative relationship between the imposed turn limit and the model's learning rate, or its general capability. Because of this, I am going to try training without any turn limit and see whether that improves learning/performance.
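In practice, removing the limit could just mean treating it as optional in the episode loop. The skeleton below is only an illustration; the env/agent interface (`env.step` returning state, reward, done) and function names are assumptions, not the repo's actual API.

```python
def run_episode(env, agent, turn_limit=None):
    """Play one episode; turn_limit=None removes the cap entirely (hypothetical loop)."""
    state = env.reset()
    done = False
    turn = 0
    while not done and (turn_limit is None or turn < turn_limit):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        turn += 1
    return turn
```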
I will be adjusting the interval at which we automatically save checkpoints, since episodes will likely be much longer than before. I also have to double-check that the automatic game state culling works for DQN training, to make sure I don't fill my entire D drive.
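Roughly what I have in mind for both of those, as a sketch only: the interval, directory names, file extensions, and `save_fn` hook below are hypothetical and not the repo's actual configuration.

```python
import glob
import os

# Hypothetical settings -- actual values/paths in the repo may differ.
CHECKPOINT_EVERY_N_EPISODES = 25   # larger interval now that episodes run longer
MAX_SAVED_GAME_STATES = 500        # cull beyond this so the D drive doesn't fill up

def maybe_checkpoint(episode: int, save_fn, out_dir: str = "checkpoints"):
    """Save a checkpoint on a fixed episode interval."""
    if episode % CHECKPOINT_EVERY_N_EPISODES == 0:
        os.makedirs(out_dir, exist_ok=True)
        save_fn(os.path.join(out_dir, f"dqn_ep{episode}.pt"))

def cull_game_states(state_dir: str = "game_states", keep: int = MAX_SAVED_GAME_STATES):
    """Delete the oldest saved game states once the directory exceeds `keep` files."""
    files = sorted(glob.glob(os.path.join(state_dir, "*.json")), key=os.path.getmtime)
    for path in files[:max(0, len(files) - keep)]:
        os.remove(path)
```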