ShangtongZhang / DeepRL

Modularized Implementation of Deep RL Algorithms in PyTorch
MIT License

using target network to calculate last state value #93

Closed. backpropper closed this issue 4 years ago.

backpropper commented 4 years ago

Is there a reference for why we use a separate target network to calculate the last state value? https://github.com/ShangtongZhang/DeepRL/blob/19db77206eb65621d0670eccc3a556bba9d5fac3/deep_rl/agent/OptionCritic_agent.py#L88

ShangtongZhang commented 4 years ago

No, that's an ad hoc decision.

backpropper commented 4 years ago

I see. Did it help reduce variance or something else?

ShangtongZhang commented 4 years ago

Not sure. I didn't test a version without the target network; I just followed DQN.
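
To make the choice under discussion concrete, here is a minimal sketch of n-step returns where the value of the last state is bootstrapped either from a frozen target network (as in DQN) or from the online network. It is an illustration only, not the code in `OptionCritic_agent.py`: `LinearValue`, `n_step_returns`, and all the numbers are hypothetical, a plain Python callable stands in for a PyTorch module, and option termination is ignored.

```python
import copy


class LinearValue:
    """Hypothetical stand-in for a value network: v(s) = w * s + b."""

    def __init__(self, w, b):
        self.w = w
        self.b = b

    def __call__(self, state):
        return self.w * state + self.b


def n_step_returns(rewards, dones, last_state, bootstrap_net, gamma=0.99):
    """Accumulate discounted returns backward over a rollout.

    The recursion is seeded with bootstrap_net(last_state), so the choice
    of bootstrap_net (target vs. online) affects every return in the rollout.
    """
    ret = bootstrap_net(last_state)
    returns = []
    for reward, done in zip(reversed(rewards), reversed(dones)):
        # A terminal step cuts off the bootstrap from later states.
        ret = reward + gamma * (0.0 if done else 1.0) * ret
        returns.append(ret)
    returns.reverse()
    return returns


online = LinearValue(w=1.0, b=0.0)
target = copy.deepcopy(online)  # frozen copy, assumed to be synced only periodically
online.w = 2.0                  # pretend a gradient step has moved the online network

rewards, dones, last_state = [1.0, 1.0], [False, False], 3.0

# Bootstrapping from the target network: the seed value stays fixed between syncs.
returns_target = n_step_returns(rewards, dones, last_state, target, gamma=0.5)

# Bootstrapping from the online network: the seed value shifts with every update.
returns_online = n_step_returns(rewards, dones, last_state, online, gamma=0.5)

print(returns_target)
print(returns_online)
```

The two calls differ only in which network seeds the recursion. Whether the target-network variant actually helps in this agent was not tested, as noted above.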

backpropper commented 4 years ago

ok thanks!