Closed csherstan closed 5 years ago
Hi, thanks for raising this. I mask the hidden state based on episode termination; see https://github.com/edbeeching/3d_control_deep_rl/blob/master/3dcdrl/models.py#L66. So this is working as intended.
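For anyone reading along, here is a minimal sketch of what masking the hidden state on episode termination generally looks like. It is not the code from `models.py`: the names `mask_hidden` and `toy_recurrent_step` are hypothetical, plain Python lists stand in for tensors, and the toy update stands in for a real GRU cell. The idea is that the hidden state is multiplied by a mask that is 0 where an episode just ended and 1 otherwise, which has the same effect as resetting the state to zeros.

```python
def mask_hidden(hidden, done):
    """Zero the hidden state of every environment whose episode just ended.

    hidden: list of per-environment hidden vectors (lists of floats)
    done:   list of bools, True where the previous step ended an episode
    """
    # mask = 0.0 for finished episodes, 1.0 for ongoing ones
    masks = [0.0 if d else 1.0 for d in done]
    return [[h * m for h in vec] for vec, m in zip(hidden, masks)]


def toy_recurrent_step(hidden, obs):
    # Hypothetical stand-in for a GRU cell: new state = 0.5 * old state + observation.
    return [[0.5 * h + o for h in vec] for vec, o in zip(hidden, obs)]


hidden = [[1.0, 2.0], [3.0, 4.0]]  # two environments, hidden size 2
done = [False, True]               # the second environment's episode just ended
obs = [1.0, 1.0]                   # one scalar observation per environment

# Mask first, then run the recurrent update on the masked state.
hidden = toy_recurrent_step(mask_hidden(hidden, done), obs)
print(hidden)  # [[1.5, 2.0], [1.0, 1.0]]
```

The first environment carries its memory forward, while the second starts the new episode from a zero state, so nothing leaks across the episode boundary.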
It doesn't look like you're resetting the GRU state when the done variable is True. Is that on purpose? I would think that might cause problems for the agent in tasks that require memory.