Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning

Handling of terminal state #11

Closed · marintoro closed this 6 years ago

marintoro commented 6 years ago

I don't really understand what you did to the computation of the nonterminal states in the last commit (line 128 in memory.py).

nonterminals = self.dtype_float([transition.nonterminal for transition in full_transitions[self.history + self.n - 1]]).unsqueeze(1)

What if the current state is just one step before a terminal state? Then full_transitions[self.history] will be terminal, but not all of the subsequent ones (because you only post-append one frame marked as terminal in memory), and in particular full_transitions[self.history + self.n - 1] will not be terminal...
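Roughly, the situation can be illustrated with a toy window of transitions (made-up data and sizes, not the repository's actual layout):

```python
# Toy illustration (made-up data): with a single post-appended blank, the flag
# read at offset history + n - 1 belongs to the NEXT episode.
from collections import namedtuple

T = namedtuple('T', ('reward', 'nonterminal'))
history, n = 4, 3

window = [T(0.0, False)] * (history - 1) + [  # pre-appended blanks before the episode
    T(1.0, True),   # offset history - 1: the sampled state s_t
    T(1.0, False),  # s_{t+1} is terminal
    T(0.0, False),  # the single post-appended blank
    T(1.0, True),   # first transition of the next episode
]
print(window[history + n - 1].nonterminal)  # True, although the episode ended at t+1
```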

In fact I think the safest way to handle terminal states is to post-append self.n frames (and mark them as terminal) rather than only one, exactly in the same way as the pre-append, where you add self.history frames and not only one.
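A minimal sketch of that kind of post-append, assuming a simplified memory class (the list-based buffer and names like `blank_trans` are stand-ins, not the repository's exact code):

```python
# Sketch only: post-append n blank terminal transitions at episode end, mirroring
# the history-length padding used at episode start.
from collections import namedtuple

import torch

Transition = namedtuple('Transition', ('timestep', 'state', 'action', 'reward', 'nonterminal'))
blank_trans = Transition(0, torch.zeros(84, 84, dtype=torch.uint8), None, 0.0, False)


class ReplayMemorySketch:
    def __init__(self, history=4, n=3):
        self.history, self.n = history, n
        self.t = 0      # timestep within the current episode
        self.data = []  # stand-in for the real ring buffer / segment tree

    def append(self, state, action, reward, terminal):
        self.data.append(Transition(self.t, state, action, reward, not terminal))
        if terminal:
            # Pad with n terminal blanks so an n-step window starting inside this
            # episode never reads rewards or flags from the following episode
            self.data.extend([blank_trans] * self.n)
            self.t = 0
        else:
            self.t += 1
```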

Indeed, if self.n >> self.history, the current computation of the returns will be wrong because you don't check whether a terminal state is reached, so the return could include rewards from the next episode (when self.n < self.history this bug is hidden by the fact that you pre-append self.history frames with 0 reward at the beginning of each episode).
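For reference, a truncated n-step return has to stop summing as soon as a terminal transition is reached; a sketch of such a computation (not the repository's vectorised code) could look like:

```python
# Sketch: n-step return that truncates at episode boundaries, assuming each
# transition carries (reward, nonterminal) as discussed above.
def n_step_return(transitions, start, n, discount):
    """Return sum_k discount**k * r_{start+k}, stopping once a terminal is hit."""
    ret, gamma = 0.0, 1.0
    for k in range(n):
        t = transitions[start + k]
        ret += gamma * t.reward
        if not t.nonterminal:
            break  # episode ended here: never read rewards from the next episode
        gamma *= discount
    return ret
```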

Kaixhin commented 6 years ago

Yep, you've found a bug that doesn't actually affect these hyperparameters, so I never sorted it, but I probably should 😛 Your suggestion of post-appending self.n frames is ideal, because it not only sorts this out properly, but also works directly with the multi-step return code, calculating the correct truncated returns near the end of an episode. And of course, by dropping any transitions too close to the current buffer index, anything validly* sampled around the end of an episode should have everything in place 😁
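A rough sketch of that kind of validity test when sampling (the names and exact bounds here are assumptions, not the repository's code):

```python
# Sketch: reject sample indices whose history/n-step window would overlap the
# current write position, or whose priority is zero (e.g. padding blanks).
def is_valid_sample(idx, write_index, capacity, history, n, priority):
    steps_behind_write = (write_index - idx) % capacity    # room for the n-step lookahead
    steps_ahead_of_write = (idx - write_index) % capacity  # room for the history lookback
    return steps_behind_write > n and steps_ahead_of_write >= history and priority > 0
```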

* I haven't checked for certain that transitions with 0 probability are not returned, but for now I can at least drop those too (2c535a6).