RasmusBrostroem / ConnectFourRL


Use `training` instead of `is_training` for the `TDAgent` #84

Closed jbirkesteen closed 1 year ago

jbirkesteen commented 1 year ago

At the moment, we use the boolean attribute `self.is_training` to control whether or not the `TDAgent` is updated after each move.

However, since we inherit from `nn.Module`, we already have an attribute called `self.training` with exactly the same interpretation as our `self.is_training`, and it is controlled with the methods `self.train()` and `self.eval()`. I suggest we use `self.training` instead, both to avoid confusion and to future-proof: if we want to use dropout or batchnorm down the line, we will have to care about this flag anyway, since those layers behave differently in training and evaluation. `self.train()` and `self.eval()` ensure that the flag is set correctly for all submodules in the network.

The change would require us to use `self.train()` to configure the mode.
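To make the proposed pattern concrete, here is a minimal sketch of the behavior `nn.Module` already gives us: one `training` flag toggled via `train()`/`eval()` and propagated to submodules. `MiniModule` below is a simplified stand-in for `torch.nn.Module` (so the snippet runs without torch), and the `update` method is a hypothetical illustration, not our actual `TDAgent` code.

```python
class MiniModule:
    """Simplified stand-in mimicking torch.nn.Module's training flag."""

    def __init__(self):
        self.training = True  # nn.Module also defaults to training mode
        self._submodules = []

    def add_submodule(self, module):
        self._submodules.append(module)

    def train(self, mode=True):
        # Set the flag on this module and recursively on all submodules,
        # mirroring what nn.Module.train() does.
        self.training = mode
        for sub in self._submodules:
            sub.train(mode)
        return self

    def eval(self):
        return self.train(False)


class TDAgent(MiniModule):
    def update(self, reward):
        # The inherited flag replaces the old `if self.is_training:` check.
        if self.training:
            pass  # perform the TD update here


agent = TDAgent()
layer = MiniModule()
agent.add_submodule(layer)

agent.eval()    # flag becomes False on the agent and its submodule
agent.train()   # flag becomes True again, everywhere
```

With real torch code the only change on our side is calling `agent.eval()` before evaluation games and `agent.train()` before training, instead of flipping `self.is_training` by hand.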

See the first answer here.

jbirkesteen commented 1 year ago

One thing we should think about, however: if other methods begin relying on `Player` objects having `self.train()` and `self.eval()` methods for configuring their training mode, we need to write these methods for `Player` as well. At the same time, we'd want some classes to inherit these methods from `nn.Module` rather than from `Player`. Currently, the `DirectPolicyAgent` classes inherit from `nn.Module` before `Player`, so it will not be a problem for them, but we need to keep this in mind moving forward.
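The inheritance-order point above comes down to Python's method resolution order: when a class lists `nn.Module` before `Player`, any `train()` defined on both bases resolves to the `nn.Module` version. A hypothetical sketch with stand-in base classes (not our real code):

```python
class Module:
    """Stand-in for torch.nn.Module."""

    def train(self, mode=True):
        self.training = mode
        return self


class Player:
    """Stand-in for our Player base class, with a hypothetical
    train() we might add later."""

    def train(self, mode=True):
        self.training = mode
        return self


class DirectPolicyAgent(Module, Player):
    # Module is listed first, so Python's left-to-right MRO
    # resolves train() from Module, not Player.
    pass


# Module appears before Player in the MRO:
print(DirectPolicyAgent.__mro__)
assert DirectPolicyAgent.train is Module.train
```

So as long as `nn.Module` stays first in the base-class list, agents keep the torch behavior; a class inheriting only from `Player` would need `Player` to define the methods itself.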