Hanrui-Wang opened this issue 5 years ago (status: Open)
I think this is weird, too.
```python
agent.memory.append(
    observation,
    agent.select_action(observation),
    0., False
)
```
Also, done is set to False in this tuple, which is even more perplexing.
That said, I think this probably has a negligible effect on learning, given how large the replay buffer is, but it would be good for the author to check on this @ghliu.
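For context, here is a minimal sketch of how a stored done flag typically enters the DDPG critic target; the function and argument names are mine, not this repo's API:

```python
import torch

def critic_targets(rewards, next_states, dones, critic_target, actor_target, gamma=0.99):
    # Standard DDPG bootstrap: y = r + gamma * (1 - done) * Q'(s', mu'(s')).
    with torch.no_grad():
        next_q = critic_target(next_states, actor_target(next_states))
    # An entry stored with reward=0. and done=False keeps the bootstrap term
    # alive, so it contributes gamma * Q'(...) to the target instead of the 0
    # a true terminal would give.
    return rewards + gamma * (1.0 - dones) * next_q
```

Since only one such entry is added per episode, these make up a tiny fraction of a large buffer, which is presumably why the effect on learning is small.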
In the buffer's code, I guess the terminal flag is used to separate transitions from different episodes; in that sense, I think this may be a bug.
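A toy illustration of that point (deliberately simplified; not the repo's actual SequentialMemory class):

```python
import random

class ToySequentialMemory:
    """One row per step; s' is recovered from the NEXT row, so the stored
    done flags are the only markers of where episodes begin and end."""
    def __init__(self, limit):
        self.limit = limit
        self.rows = []  # each row: (state, action, reward, done)

    def append(self, state, action, reward, done):
        if len(self.rows) >= self.limit:
            self.rows.pop(0)
        self.rows.append((state, action, reward, done))

    def sample_one(self):
        # Resample until the row's predecessor did not end an episode;
        # otherwise we would stitch a transition across an episode boundary.
        # (Assumes at least three rows are stored.)
        while True:
            idx = random.randrange(1, len(self.rows) - 1)
            if not self.rows[idx - 1][3]:
                break
        state, action, reward, done = self.rows[idx]
        next_state = self.rows[idx + 1][0]
        return state, action, reward, next_state, done
```

If a boundary row is stored with the wrong flag (done=False), the sampler has no other way to tell that an episode ended there.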
Hi Guan-Horng,
Thanks for your great implementation! I am wondering why we append an additional (s, a, r) pair to the replay buffer after an episode is done. The reward in that pair is zero, and I don't think this step is mentioned in the original paper.
https://github.com/ghliu/pytorch-ddpg/blob/e9db328ca70ef9daf7ab3d4b44975076ceddf088/main.py#L64
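My guess is that the buffer recovers s' from the next row, so the final observation has to be pushed once as a placeholder, or the last real transition would have no successor state. A toy illustration of what I mean (my values, not the repo's actual classes):

```python
rows = []  # each row: (state, action, reward, done)
rows.append(("s0", "a0", 1.0, False))
rows.append(("s1", "a1", 2.0, True))   # last real step of the episode
rows.append(("s2", "a?", 0.0, False))  # placeholder: only its state is used

s, a, r, done = rows[1]
s_next = rows[2][0]  # successor recovered from the placeholder row
print((s, a, r, s_next, done))  # -> ('s1', 'a1', 2.0, 's2', True)
```

Even so, the zero reward and done=False stored in that placeholder look odd to me.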
Thank you!