Consider scaling max turns as training progresses

As training goes on, I have started to see more timeouts, and they coincide with a significant drop in scores. I am not sure of the cause. One possibility is that epsilon has decayed by that point, so the model is expected to stand on its own feet but hasn't learned well enough to do so.

I will have to do some longer runs to accurately assess whether something is wrong with the training itself.
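
For reference, here is a rough sketch of what scaling the turn limit could look like. It is not taken from the current training code: the function names, the linear shape, and the default values (100 to 400 turns, epsilon from 1.0 to 0.05) are all placeholders I made up for illustration. The idea is just to raise the turn cap on a schedule while epsilon decays on its own schedule, so the two can be tuned independently.

```python
def linear_schedule(start: float, end: float, step: int, total_steps: int) -> float:
    """Linearly interpolate from start to end over total_steps, then hold at end."""
    if total_steps <= 0:
        return end
    # Clamp progress to [0, 1] so values outside the schedule stay at the endpoints.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def max_turns_for_episode(episode: int, total_episodes: int,
                          start_turns: int = 100, end_turns: int = 400) -> int:
    """Hypothetical turn cap that grows as training progresses (placeholder values)."""
    return int(round(linear_schedule(start_turns, end_turns, episode, total_episodes)))


def epsilon_for_episode(episode: int, decay_episodes: int,
                        eps_start: float = 1.0, eps_end: float = 0.05) -> float:
    """Hypothetical linear epsilon decay, shown only for comparison with the turn cap."""
    return linear_schedule(eps_start, eps_end, episode, decay_episodes)


if __name__ == "__main__":
    # Print both schedules side by side for a few episodes.
    for ep in (0, 250, 500, 1000):
        turns = max_turns_for_episode(ep, total_episodes=1000)
        eps = epsilon_for_episode(ep, decay_episodes=500)
        print(f"episode={ep} max_turns={turns} epsilon={eps:.3f}")
```

If the timeouts really do line up with the end of the epsilon decay, logging both values per episode next to the timeout count should make that visible in the longer runs.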