RasmusBrostroem / ConnectFourRL


Benchmarking during training #75

Closed jbirkesteen closed 1 year ago

jbirkesteen commented 1 year ago

When two agents are learning simultaneously, it might make sense to test each of them against random/minimax at certain points in their training, to see whether they have learned anything (this can't be inferred from the win rate between the two).

Write Env.benchmark(), which should take a player as an argument, as well as an opponent to test against, play some specified number of games, and log the result to Neptune.

I can see two ways of implementing it (and I like number 1 best):

  1. Rewrite Env.play_game() and Env.step() to depend on the players' training-mode flag (see #84): only update stats, assign rewards, etc. if the agent is training. With this approach, we need the Player() parent class to also have a self.training attribute. (See the sketch after this list.)

  2. Create a copy of the player to be benchmarked, so that its stats aren't touched during benchmarking. Feed this copy of the player object into Env.benchmark() and use it for playing games. This approach might require fewer changes, but I find it less natural to have to generate copies of player objects when they are already part of the environment.
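
A minimal sketch of how option 1 could look, assuming a stripped-down Env and Player. The game loop is stubbed out with a random outcome, and names such as `n_games`, `neptune_run`, and the Neptune field paths (`benchmark/win_rate` etc.) are illustrative, not taken from the actual codebase:

```python
import random


class Player:
    """Parent class for agents; every player carries a training flag (see #84)."""

    def __init__(self):
        self.training = True  # stats/rewards should only be updated when True

    def choose_action(self, board):
        raise NotImplementedError


class Env:
    def play_game(self, player1, player2):
        """Play one game; return +1 if player1 wins, -1 if it loses, 0 on a draw.

        In the real game loop, stat updates and reward assignment would be
        guarded by `if player.training: ...`, so benchmark games leave the
        learner's statistics untouched.
        """
        return random.choice([1, 0, -1])  # placeholder for the actual game loop

    def benchmark(self, player, opponent, n_games=100, neptune_run=None):
        """Evaluate `player` against `opponent` without touching training stats."""
        was_training = player.training
        player.training = False  # turn off stat updates/rewards for the benchmark
        try:
            wins = draws = losses = 0
            for _ in range(n_games):
                outcome = self.play_game(player, opponent)
                if outcome > 0:
                    wins += 1
                elif outcome < 0:
                    losses += 1
                else:
                    draws += 1
            if neptune_run is not None:  # log the result to Neptune
                neptune_run["benchmark/win_rate"].append(wins / n_games)
                neptune_run["benchmark/draw_rate"].append(draws / n_games)
            return wins, draws, losses
        finally:
            player.training = was_training  # restore the flag afterwards
```

Saving and restoring the flag in a try/finally means a crash mid-benchmark can't leave the learner stuck with training disabled, and the same play_game() code path serves both training and benchmarking.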