RasmusBrostroem / ConnectFourRL


Use `training` instead of `is_training` for the `TDAgent` #84

Closed jbirkesteen closed 1 year ago

jbirkesteen commented 1 year ago

At the moment, we use the boolean attribute `self.is_training` to control whether or not the `TDAgent` is updated after each move.

However, since we inherit from `nn.Module`, we already have an attribute called `self.training` with exactly the same interpretation as our `self.is_training`, and it is controlled with the methods `self.train()` and `self.eval()`. I suggest we use `self.training` instead, both to avoid confusion and to future-proof: if we want to use dropout or batchnorm down the line, we will have to care about this flag anyway, since those layers behave differently in training and evaluation. `self.train()` and `self.eval()` ensure that the flag is set correctly for all submodules in the network.

The change would require us to use `self.train()` to configure the mode.
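To make the proposed pattern concrete, here is a minimal sketch of the behavior `nn.Module` already gives us: one `training` flag toggled via `train()`/`eval()` and propagated to submodules. `MiniModule` below is a simplified stand-in for `torch.nn.Module` (so the snippet runs without torch), and the `update` method is a hypothetical illustration, not our actual `TDAgent` code.

```python
class MiniModule:
    """Simplified stand-in mimicking torch.nn.Module's training flag."""

    def __init__(self):
        self.training = True  # nn.Module also defaults to training mode
        self._submodules = []

    def add_submodule(self, module):
        self._submodules.append(module)

    def train(self, mode=True):
        # Set the flag on this module and recursively on all submodules,
        # mirroring what nn.Module.train() does.
        self.training = mode
        for sub in self._submodules:
            sub.train(mode)
        return self

    def eval(self):
        return self.train(False)


class TDAgent(MiniModule):
    def update(self, reward):
        # The inherited flag replaces the old `if self.is_training:` check.
        if self.training:
            pass  # perform the TD update here


agent = TDAgent()
layer = MiniModule()
agent.add_submodule(layer)

agent.eval()    # flag becomes False on the agent and its submodule
agent.train()   # flag becomes True again, everywhere
```

With real torch code the only change on our side is calling `agent.eval()` before evaluation games and `agent.train()` before training, instead of flipping `self.is_training` by hand.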

See the first answer here.

jbirkesteen commented 1 year ago

One thing we should think about, however: if other methods begin relying on `Player` objects having `self.train()` and `self.eval()` methods for configuring their training mode, we need to write these methods for `Player` as well. At the same time, we'd want some classes to inherit these methods from `nn.Module` rather than from `Player`. Currently, the `DirectPolicyAgent` classes inherit from `nn.Module` before `Player`, so it will not be a problem for them, but we need to keep this in mind moving forward.
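The inheritance-order point above comes down to Python's method resolution order: when a class lists `nn.Module` before `Player`, any `train()` defined on both bases resolves to the `nn.Module` version. A hypothetical sketch with stand-in base classes (not our real code):

```python
class Module:
    """Stand-in for torch.nn.Module."""

    def train(self, mode=True):
        self.training = mode
        return self


class Player:
    """Stand-in for our Player base class, with a hypothetical
    train() we might add later."""

    def train(self, mode=True):
        self.training = mode
        return self


class DirectPolicyAgent(Module, Player):
    # Module is listed first, so Python's left-to-right MRO
    # resolves train() from Module, not Player.
    pass


# Module appears before Player in the MRO:
print(DirectPolicyAgent.__mro__)
assert DirectPolicyAgent.train is Module.train
```

So as long as `nn.Module` stays first in the base-class list, agents keep the torch behavior; a class inheriting only from `Player` would need `Player` to define the methods itself.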