facebookresearch / torchbeast

A PyTorch Platform for Distributed RL
Apache License 2.0

act() function doesn't use model in eval mode #20

Open aleSuglia opened 3 years ago

aleSuglia commented 3 years ago

Hey guys,

Thanks again for this amazing library that makes training RL agents extremely easy. I have a quick question about the act() function. This is the function responsible for collecting the agent's experiences in the environment. In this phase, the actor model is used, which is different from the learner model. As you might know, PyTorch modules have two modes: 'train' and 'eval'. I expected act() to call model.eval() before collecting new experiences, but that is not happening here: https://github.com/facebookresearch/torchbeast/blob/master/torchbeast/monobeast.py#L128

I have seen people argue that in an RL setup it is important to disable dropout to reduce the variance of the policy. This would be a side effect of calling eval(). I can see that the default agent doesn't have any dropout, so maybe this wasn't required in your case. What would you recommend?
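To illustrate the side effect being asked about, here is a minimal sketch (the network below is hypothetical, not torchbeast's AtariNet): calling `.eval()` flips `self.training` to `False`, which turns dropout into a no-op and makes forward passes deterministic.

```python
import torch
import torch.nn as nn

# Hypothetical tiny policy head with dropout, just to show the
# train/eval side effect discussed above.
net = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))

net.train()
assert net.training          # dropout active: forward passes are stochastic

net.eval()
assert not net.training      # dropout is a no-op: forward passes are deterministic

x = torch.ones(1, 4)
with torch.no_grad():
    out1 = net(x)
    out2 = net(x)
# In eval mode, two forward passes on the same input agree exactly.
assert torch.equal(out1, out2)
```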

mrzhuzhe commented 2 years ago
  1. The AtariNet model has training-only behavior during the act phase: `if self.training: action = torch.multinomial(F.softmax(policy_logits, dim=1), num_samples=1)`. If you call model.eval() during training, this line is skipped and the model always chooses the greedy action, which would make exploration during training fail completely.

  2. As you say, eval() changes the behavior of dropout and normalization layers, but this AtariNet architecture is simple and has no such layers, so .eval() behaves the same as .train() in that respect.
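The branch described in point 1 can be sketched as follows (a standalone illustration of the pattern, with illustrative names, not the exact AtariNet code):

```python
import torch
import torch.nn.functional as F

def select_action(policy_logits: torch.Tensor, training: bool) -> torch.Tensor:
    """Mirror the action-selection pattern discussed above."""
    if training:
        # Training mode: sample from the softmax policy to explore.
        return torch.multinomial(F.softmax(policy_logits, dim=1), num_samples=1)
    # Eval mode: always take the most likely (greedy) action.
    return torch.argmax(policy_logits, dim=1, keepdim=True)

logits = torch.tensor([[0.0, 5.0, 0.0]])
greedy = select_action(logits, training=False)
assert greedy.item() == 1  # greedy choice is deterministic
```

So with model.eval() the sampling branch never runs, which is exactly why exploration would break.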