aleSuglia opened this issue 3 years ago
The atari_net model has "training only" behavior in act():

```python
if self.training:
    action = torch.multinomial(F.softmax(policy_logits, dim=1), num_samples=1)
```

So if you call model.eval() during training, this line is skipped and the model always chooses the greedy action, which makes exploration during training fail entirely. As you say, eval() changes the behavior of dropout and normalization layers, but this atari_net architecture is simple enough that it has no such layers, so .eval() behaves the same as .train() in that respect.
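To make the point above concrete, here is a minimal sketch (module name, layer sizes, and shapes are made up for illustration) of how the `self.training` flag gates action selection, mirroring the branch quoted from atari_net: `model.train()` gives stochastic actions sampled from the softmax, while `model.eval()` falls through to greedy argmax.

```python
import torch
import torch.nn.functional as F


class TinyPolicy(torch.nn.Module):
    """Hypothetical toy policy illustrating the training-only sampling branch."""

    def __init__(self, obs_dim=8, num_actions=4):
        super().__init__()
        self.fc = torch.nn.Linear(obs_dim, num_actions)

    def forward(self, x):
        policy_logits = self.fc(x)  # shape: [batch, num_actions]
        if self.training:
            # Exploration: sample an action from the softmax distribution.
            action = torch.multinomial(
                F.softmax(policy_logits, dim=1), num_samples=1
            )
        else:
            # After model.eval(): self.training is False, so the sampling
            # line is skipped and the action is always the greedy argmax.
            action = torch.argmax(policy_logits, dim=1, keepdim=True)
        return action


policy = TinyPolicy()
x = torch.randn(2, 8)

policy.train()        # self.training == True -> stochastic actions
a_train = policy(x)

policy.eval()         # self.training == False -> deterministic greedy actions
a_eval = policy(x)
```

Calling `policy.eval()` makes the actor deterministic: repeated forward passes on the same input return the same greedy action, which is exactly why exploration collapses if eval mode is used during experience collection.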
Hey guys,
Thanks again for this amazing library that makes training RL agents extremely easy. I have a quick question about the act() function. This is supposed to be the function responsible for collecting the agent's experiences in the environment. In this phase, the actor model is used, which is different from the learner model. In PyTorch, as you might know, there are two modes: train and eval. I was expecting that act() would call model.eval() before starting to collect new experiences, but that is not happening here: https://github.com/facebookresearch/torchbeast/blob/master/torchbeast/monobeast.py#L128. I have seen people argue that in an RL setup it is important to disable dropout to reduce the variance of the policy. This would be a side effect of calling eval(). I can see that the default agent doesn't have any dropout, so maybe this wasn't required in your case. What would you recommend?
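For reference, the dropout side effect mentioned above can be demonstrated in a few lines (the network here is hypothetical, not the torchbeast agent): in train mode, dropout makes the forward pass stochastic, which adds variance to the policy output; after `eval()`, dropout is disabled and the same input always produces the same output.

```python
import torch

# Hypothetical network with dropout, just to show the eval() side effect.
net = torch.nn.Sequential(
    torch.nn.Linear(4, 4),
    torch.nn.Dropout(p=0.5),
)
x = torch.ones(1, 4)

net.train()
out_a = net(x)   # dropout active: a random mask is applied,
out_b = net(x)   # so two passes on the same input will usually differ

net.eval()
out_c = net(x)   # dropout disabled: the forward pass is deterministic,
out_d = net(x)   # so these two outputs are identical
```

This is the variance-reduction argument for calling `eval()` in the actor; since the default torchbeast agent has no dropout (or batch norm), the call would make no difference there apart from the `self.training` branch in atari_net.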