using target network to calculate last state value

ShangtongZhang / DeepRL

Modularized Implementation of Deep RL Algorithms in PyTorch

MIT License

3.17k stars, 678 forks

Closed: backpropper closed this issue 4 years ago

backpropper commented 4 years ago

ShangtongZhang commented 4 years ago

No, that's an ad-hoc decision.

backpropper commented 4 years ago

I see. Did it help reduce variance or something else?

ShangtongZhang commented 4 years ago

Not sure. I didn't test the variant without a target network; I just followed DQN.

backpropper commented 4 years ago

OK, thanks!
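
For readers following the thread, here is a minimal sketch of the choice being discussed, assuming the question is about which network supplies the bootstrap value of the last state in an n-step return. It is not code from the repository: `n_step_returns`, `online_value`, and `target_value` are hypothetical names, and the two value functions are toy stand-ins for real networks. Plain Python is used so the example runs without PyTorch.

```python
from typing import Callable, List, Sequence


def n_step_returns(
    rewards: Sequence[float],
    dones: Sequence[bool],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> List[float]:
    """Compute discounted returns for a rollout, working backwards
    from an estimate of the value of the state after the last step."""
    returns = []
    ret = bootstrap_value
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # A terminal step cuts off bootstrapping from later states.
        ret = reward + gamma * (0.0 if done else 1.0) * ret
        returns.append(ret)
    returns.reverse()
    return returns


# Hypothetical stand-ins for the two value estimators (assumptions,
# not the repository's networks). In practice the target network is
# a periodically synced copy of the online network.
online_value: Callable[[float], float] = lambda state: 2.0 * state
target_value: Callable[[float], float] = lambda state: 1.0 * state

rewards = [1.0, 0.0, 1.0]
dones = [False, False, False]
last_state = 3.0

# DQN-style choice: bootstrap from the target network.
returns_target = n_step_returns(rewards, dones, target_value(last_state), gamma=0.5)

# Alternative mentioned in the thread: bootstrap from the online network.
returns_online = n_step_returns(rewards, dones, online_value(last_state), gamma=0.5)

print(returns_target)
print(returns_online)
```

The only difference between the two calls is the source of `bootstrap_value`; everything else in the return computation is identical. Whether one choice reduces variance is left open in the thread, and this sketch does not settle it.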