Implement TD agent

Implements our first version of the agent discussed in #81.

Adds the TDAgent Player class, a self-play method on the environment, and an initial version of a self-training script.

The state representation follows #81. In self-play, the opponent is created by freezing the network at the beginning of each game.
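To illustrate the freezing step, here is a minimal sketch and not the code in this PR. It assumes the opponent is an independent copy of the learner taken once at the start of a game. `LinearValueNet`, `td0_update` and `freeze_opponent` are hypothetical names, the linear value function stands in for the real network, and the TD(0) update is only an assumed learning rule.

```python
import copy


class LinearValueNet:
    """Hypothetical stand-in for the agent's network: V(s) = w . s."""

    def __init__(self, n_features):
        self.weights = [0.0] * n_features

    def value(self, state):
        return sum(w * x for w, x in zip(self.weights, state))

    def td0_update(self, state, reward, next_state, alpha=0.1, gamma=1.0):
        # Assumed TD(0) step for a linear value function.
        # Pass next_state=None when `state` leads to a terminal position.
        next_value = 0.0 if next_state is None else self.value(next_state)
        td_error = reward + gamma * next_value - self.value(state)
        self.weights = [
            w + alpha * td_error * x for w, x in zip(self.weights, state)
        ]
        return td_error


def freeze_opponent(net):
    """Return an independent snapshot that no longer follows the learner."""
    return copy.deepcopy(net)


learner = LinearValueNet(n_features=3)
opponent = freeze_opponent(learner)  # taken once, at the start of the game

# One illustrative learning step during the game (made-up state and reward).
learner.td0_update(state=[1.0, 0.0, 1.0], reward=1.0, next_state=None)
```

After the update the learner's weights have moved while the frozen opponent still holds the weights from the start of the game, so the opponent's play stays fixed until the next game begins.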

RasmusBrostroem / ConnectFourRL