Hi Andrea, here's my review!
Appreciations:
The code is well written and the key points are easy to follow; the use of classes and comments also makes it easier to understand what you're doing.
Using replay memory to make exploration less biased by the learning policy is a clever idea.
My suggestions and advice:
You can exploit rotations and symmetries to reduce the state space and improve model performance.
As you mentioned, you could also dynamically update the discount factor and learning rate according to the different phases of training.
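To make the symmetry and scheduling suggestions a bit more concrete, here is a minimal sketch. It does not use your actual classes: I'm assuming the state can be stored as a square grid in a NumPy array, and the names `canonical_state` and `exponential_decay`, as well as the constants in the usage lines, are placeholders I made up for illustration.

```python
import math

import numpy as np


def canonical_state(board):
    """Map a square board to one representative of its 8 rotations/reflections.

    Assumption: the state is a square 2D grid of integers.
    """
    board = np.asarray(board)
    candidates = []
    for k in range(4):
        rotated = np.rot90(board, k)            # rotate by k * 90 degrees
        candidates.append(rotated)
        candidates.append(np.fliplr(rotated))   # mirrored version of that rotation
    # Use the lexicographically smallest flattened board as the shared key.
    return min(tuple(int(v) for v in c.ravel()) for c in candidates)


def exponential_decay(start, end, decay_rate, episode):
    """Move a hyperparameter smoothly from `start` toward `end` over episodes."""
    return end + (start - end) * math.exp(-decay_rate * episode)


# Hypothetical usage: all symmetric boards share one lookup key.
board = np.array([[1, 0, 0],
                  [0, 0, 0],
                  [0, 0, 2]])
key = canonical_state(board)

# Hypothetical schedules: learning rate shrinks, discount factor grows.
learning_rate = exponential_decay(0.5, 0.01, 0.001, episode=2000)
discount = exponential_decay(0.8, 0.99, 0.001, episode=2000)
```

One thing to keep in mind if you try this: when states are merged this way, the actions have to be mapped through the same rotation or reflection, otherwise the stored values will point at the wrong cells.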
Good job, and hope it helps!
Best regards, Edoardo