facebookresearch / torchbeast

A PyTorch Platform for Distributed RL
Apache License 2.0

"Done" default of 1 results in 0 reward episodes #23

Open SamNPowers opened 3 years ago

SamNPowers commented 3 years ago

In core/environment.py, done defaults to torch.ones, instead of torch.zeros. This means that in monobeast's act(), the first replay entry each actor creates has a done value of 1. Then when episode returns are reported, those episodes have rewards of 0, though the episodes never really happened at all.
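To make the effect concrete, here is a minimal, self-contained sketch in plain Python. It is not TorchBeast code. `collect_rollout` and `reported_returns` are hypothetical stand-ins for the actor loop and the return reporting, and the sketch assumes that returns are reported for every entry whose done flag is set.

```python
def collect_rollout(initial_done, rewards_per_episode):
    """Toy actor loop: returns a list of (done, episode_return) entries.

    The first entry mimics what an initial() call would write before any
    step has been taken (assumption: return 0.0, done = initial_done).
    """
    entries = [(initial_done, 0.0)]
    for rewards in rewards_per_episode:
        running_return = 0.0
        for i, reward in enumerate(rewards):
            running_return += reward
            # done is set on the last step of each episode.
            done = 1 if i == len(rewards) - 1 else 0
            entries.append((done, running_return))
    return entries


def reported_returns(entries):
    """Report the episode return of every entry flagged as done."""
    return [episode_return for done, episode_return in entries if done]


episodes = [[1.0, 2.0], [0.5, 0.5, 1.0]]

with_ones = reported_returns(collect_rollout(1, episodes))
with_zeros = reported_returns(collect_rollout(0, episodes))

print(with_ones)   # [0.0, 3.0, 2.0]  <- spurious zero-return "episode" first
print(with_zeros)  # [3.0, 2.0]
```

With an initial done of 1, one extra zero-return entry is reported per actor, which pulls down the mean episode return early in training.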

(By the way, excellent repo! Very useful.)

heiner commented 3 years ago

Hey Sam,

Thanks for your interest in TorchBeast and for your kind words.

You are correct. done is True at t=0 because done == True iff "episode just started", which is the case for the first episode, too.

If this is a problem in your case, I'm happy to accept a patch that turns the torch.ones into torch.zeros, as I don't believe this matters currently. (I suppose the LSTM/agent state needs to be reset in the same way it is initialized for it to not matter at all, but all of this affects the first episode only.)
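The parenthetical about the agent state can be illustrated with a toy recurrent agent in plain Python. This is only a sketch under stated assumptions: `run_agent`, `INITIAL_STATE`, and the update rule are hypothetical and not part of TorchBeast. The assumption is that a set done flag resets the recurrent state before the current step is processed.

```python
INITIAL_STATE = 0.0  # hypothetical value the recurrent state is reset to


def run_agent(observations, dones, start_state=INITIAL_STATE):
    """Toy recurrent agent: resets its state whenever done is set."""
    state = start_state
    states = []
    for obs, done in zip(observations, dones):
        if done:
            # "Episode just started": forget everything from before.
            state = INITIAL_STATE
        # Stand-in for an LSTM update.
        state = 0.5 * state + obs
        states.append(state)
    return states


obs = [1.0, 2.0, 3.0]

# State initialized the same way it is reset: done at t=0 is a no-op.
same = run_agent(obs, [1, 0, 0]) == run_agent(obs, [0, 0, 0])

# State initialized differently: done at t=0 changes the first episode.
differs = (run_agent(obs, [1, 0, 0], start_state=5.0)
           != run_agent(obs, [0, 0, 0], start_state=5.0))

print(same, differs)  # True True
```

In this sketch, switching the initial done from 1 to 0 leaves the agent's behavior unchanged only when the starting state equals the reset state, and any difference is confined to the first episode.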