Benchmark and training attribute

Fixes #84, fixes #75, fixes #86

Should not be merged prior to #85.

For the benchmarking: I considered several different solutions. In the end, I went with approach 1 from the issue, even though it had a few challenges of its own. The main one is that the only way to check who won a game is through the stats dictionary, which depends on rewards, so we can't simply stop assigning rewards and updating stats. Instead, I wrote new, separate attributes and methods for benchmarking and updated Env.play_game() to use them when a player is in eval mode (as implemented for solving #84).
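To illustrate the idea (not the actual code in this PR), here is a minimal sketch of keeping benchmark bookkeeping separate from training bookkeeping. Everything in it is hypothetical: `SketchPlayer`, `benchmark_stats`, `record_result`, and the reward convention (positive = win, negative = loss, zero = tie) are assumptions made for the example and may not match the names or logic in the repository.

```python
from dataclasses import dataclass, field


def _empty_stats():
    # Hypothetical stats layout, chosen only for this sketch.
    return {"wins": 0, "losses": 0, "ties": 0}


@dataclass
class SketchPlayer:
    """Hypothetical player; not the repository's actual class."""
    training: bool = True
    stats: dict = field(default_factory=_empty_stats)
    benchmark_stats: dict = field(default_factory=_empty_stats)
    rewards: list = field(default_factory=list)

    def train(self):
        # Switch to training mode: outcomes feed rewards and stats.
        self.training = True

    def eval(self):
        # Switch to eval mode: outcomes feed benchmark_stats only.
        self.training = False


def record_result(player, reward):
    """Route a game outcome to training or benchmark bookkeeping.

    Assumes reward > 0 is a win, reward < 0 a loss, and 0 a tie.
    """
    key = "wins" if reward > 0 else "losses" if reward < 0 else "ties"
    if player.training:
        player.rewards.append(reward)
        player.stats[key] += 1
    else:
        # Training state (rewards, stats) is left untouched in eval mode.
        player.benchmark_stats[key] += 1


player = SketchPlayer()
record_result(player, 1)    # training game
player.eval()
record_result(player, -1)   # benchmark game
record_result(player, 1)    # benchmark game
player.train()
```

In this sketch, the benchmark games never touch `rewards` or `stats`, so the winner can still be determined during evaluation without affecting what the player learns from.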

Not everything is documented yet, but the new script shows how to use the benchmarking functionality.

RasmusBrostroem / ConnectFourRL
