Closed: spacegoing closed this issue 4 years ago
Hi Shangtong,

https://github.com/ShangtongZhang/DeepRL/blob/717fe68e7ed00a80c6c52ec9613c9a16dbb37e0c/deep_rl/agent/OptionCritic_agent.py#L101

I would love to know how the value function is calculated here. Why isn't it just V = max_w Q(s, w), rather than the expectation over p(w | s, epsilon)? Also, should the weight not be 1 - epsilon + epsilon / q_option.size(1), according to https://github.com/ShangtongZhang/DeepRL/blob/717fe68e7ed00a80c6c52ec9613c9a16dbb37e0c/deep_rl/agent/OptionCritic_agent.py#L34?
In L101 I interpret the q value as the Q_Ω in Eq. 1 of the OC paper (in fact this is not strictly right, as this q value is trained via intra-option Q-learning instead of SARSA), corresponding to the epsilon-greedy policy over options. So v is computed as the weighted sum of Q_Ω under this epsilon-greedy policy. L34 is the weight for the q value corresponding to the greedy option; I think it's equivalent to L101.
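To make the equivalence concrete, here is a minimal sketch (not the repo's actual code; `q_option`, `eps`, and the shapes are assumptions for illustration). It shows that the `(1 - eps) * max + eps * mean` form at L101 equals the explicit weighted sum in which the greedy option carries the `1 - epsilon + epsilon / q_option.size(1)` weight from L34:

```python
import torch

torch.manual_seed(0)
eps = 0.1
q_option = torch.randn(1, 4)  # hypothetical Q_Omega(s, w) for 4 options
n = q_option.size(1)

# L101-style form: V(s) as the expectation of Q under the epsilon-greedy
# option policy, written as (1 - eps) * max + eps * mean.
v_l101 = (1 - eps) * q_option.max(dim=1)[0] + eps * q_option.mean(dim=1)

# Same expectation with explicit per-option probabilities: the greedy
# option gets 1 - eps + eps / n (the L34-style weight), the rest eps / n.
probs = torch.full_like(q_option, eps / n)
probs[0, q_option.argmax(dim=1)] += 1 - eps
v_explicit = (probs * q_option).sum(dim=1)

assert torch.allclose(v_l101, v_explicit)
```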
Also, why would the sampled option be used here rather than the max? https://github.com/ShangtongZhang/DeepRL/blob/717fe68e7ed00a80c6c52ec9613c9a16dbb37e0c/deep_rl/agent/OptionCritic_agent.py#L97
This is the advantage of an action at an augmented state (state × option); the option is part of this augmented state (see my DAC paper for details about this augmented MDP).
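For illustration only, a hedged sketch of this point (the names `sampled_option` and `ret` are hypothetical, not the agent's actual variables): the Q value used for an action's advantage is indexed by the sampled option, because that option is part of the augmented state (s, w); taking a max over options would evaluate a different augmented state than the one the action was actually taken in.

```python
import torch

torch.manual_seed(0)
q_option = torch.randn(1, 4)              # Q(s, w) for each option w
sampled_option = torch.tensor([[2]])      # the option actually in effect at s

# Value of the augmented state (s, w): index Q with the sampled option,
# not the max over options -- a different w is a different augmented state.
q_sw = q_option.gather(1, sampled_option)  # shape (1, 1)

# Hypothetical return estimate for the action taken under (s, w);
# its advantage is measured against Q of that same augmented state.
ret = torch.tensor([[0.7]])
advantage = ret - q_sw
```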
Thank you for your reply. After reading your DAC paper I became a fan of yours :D Brilliant work, mate! I was only recently drawn to reinforcement learning and find it such a fascinating area. If possible, could you please share your learning path (textbooks etc.)? I find myself lacking background in many areas, such as the augmented MDP, and am somewhat confused about where to start. Many thanks!
Rich Sutton's book (Reinforcement Learning: An Introduction) -> Martin Puterman's book on MDPs -> Neuro-Dynamic Programming by Bertsekas and Tsitsiklis
Great! Many thanks!