backpropper closed this issue 4 years ago.
Is there a reference to why we use a separate target network to calculate the last state value? https://github.com/ShangtongZhang/DeepRL/blob/19db77206eb65621d0670eccc3a556bba9d5fac3/deep_rl/agent/OptionCritic_agent.py#L88
No, that's an ad-hoc decision.
I see. Did it help reduce variance or something else?
Not sure. I didn't test the version without a target network; I just followed DQN.
OK, thanks!
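For context on the DQN convention mentioned above: the bootstrap value of the last state in a rollout is computed from a frozen copy of the network that is re-synced only periodically, rather than from the network currently being updated. Below is a minimal, hypothetical sketch in plain Python. It is not the code at the linked line. The dict-based "network", `option_value_upon_arrival`, `n_step_targets`, and the sync period K are all made-up names for illustration. It assumes the usual Option-Critic bootstrap U(s, w) = (1 - beta(s, w)) Q(s, w) + beta(s, w) max_w' Q(s, w'), which presumes a greedy policy over options.

```python
import copy

# Hypothetical toy "network" (not DeepRL's API): per-state option values q[s][o]
# and option-termination probabilities beta[s][o], stored in plain dicts/lists.


def option_value_upon_arrival(net, state, option):
    """U(s, w) = (1 - beta(s, w)) * Q(s, w) + beta(s, w) * max_w' Q(s, w').

    Assumes a greedy policy over options, so V(s) = max_w' Q(s, w').
    """
    q = net["q"][state]
    beta = net["beta"][state][option]
    return (1.0 - beta) * q[option] + beta * max(q)


def n_step_targets(rewards, dones, last_state, last_option, target_net, gamma=0.99):
    """n-step returns for one rollout, bootstrapped from target_net (DQN-style)."""
    ret = option_value_upon_arrival(target_net, last_state, last_option)
    targets = []
    for r, done in zip(reversed(rewards), reversed(dones)):
        # A terminal step (done = 1.0) cuts off the bootstrapped tail.
        ret = r + gamma * (1.0 - done) * ret
        targets.append(ret)
    targets.reverse()  # back to chronological order
    return targets


# Usage: the target copy stays frozen between periodic syncs.
online_net = {"q": {0: [1.0, 2.0], 1: [0.5, 3.0]},
              "beta": {0: [0.1, 0.2], 1: [0.5, 0.25]}}
target_net = copy.deepcopy(online_net)   # sync step
online_net["q"][1][0] = 10.0             # stand-in for a gradient update

targets = n_step_targets([1.0, 0.0], [0.0, 0.0], last_state=1, last_option=0,
                         target_net=target_net, gamma=0.5)
print(targets)  # [1.4375, 0.875]
# Every K updates (K is a hypothetical hyperparameter), re-sync:
# target_net = copy.deepcopy(online_net)
```

Bootstrapping from `online_net` instead would give U = 10.0 for the last state rather than 1.75, so the regression target would shift after every update; the frozen copy holds it fixed until the next sync. Whether that matters for Option-Critic specifically was not tested.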