Closed chenyucheng2016 closed 5 years ago
Hi @chenyucheng2016, nice catch! However, I think the computation of expected_svf
is correct; the typo is in the equation instead. For more details, please see the note at this link.
Hi @unhelkar, thank you very much for sharing the corrected version of the paper!
According to Ziebart's paper, the equation that updates the state visit frequency is as follows:
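Written out with the same indices as the code that follows, the update would read as below. This is a reconstruction from that code rather than a verbatim quote of the paper, with $D_{s_i,t}$ assumed to denote the visitation frequency of state $s_i$ at time $t$, $P(a_j \mid s_i)$ the stochastic policy, and $P(s_k \mid a_j, s_i)$ the transition probability:

$$D_{s_i,t} = \sum_{j} \sum_{k} D_{s_k,t-1}\, P(a_j \mid s_i)\, P(s_k \mid a_j, s_i)$$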
So, I think the implementation should be:
expected_svf[i, t] += (expected_svf[k, t-1] *
                       policy[i, j] *  # Stochastic policy
                       transition_probability[i, j, k])
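One way to check which index order is right is to run both on a tiny MDP and see whether the total visitation mass at each time step stays at 1. The sketch below is only an illustration, not the repository's code: the function name `forward_svf`, the `as_printed` flag, and the two-state toy MDP are all made up for this comparison. It assumes `transition_probability[i, j, k]` means P(s_k | s_i, a_j) and `policy[i, j]` means P(a_j | s_i), which is how the line above reads.

```python
import numpy as np


def forward_svf(transition_probability, policy, p_start, n_steps, as_printed=False):
    """Propagate state visitation frequencies over n_steps time steps.

    Assumed conventions (hypothetical helper, for illustration only):
      transition_probability[i, j, k] = P(s_k | s_i, a_j)
      policy[i, j]                    = P(a_j | s_i)
    """
    n_states, n_actions, _ = transition_probability.shape
    svf = np.zeros((n_states, n_steps))
    svf[:, 0] = p_start
    for t in range(1, n_steps):
        for i in range(n_states):
            for j in range(n_actions):
                for k in range(n_states):
                    flow = policy[i, j] * transition_probability[i, j, k]
                    if as_printed:
                        # Index order proposed above: svf[i, t] reads svf[k, t-1].
                        svf[i, t] += svf[k, t - 1] * flow
                    else:
                        # Forward propagation: mass at s_i moves to its successor s_k.
                        svf[k, t] += svf[i, t - 1] * flow
    return svf


# Toy MDP: one action; state 0 always moves to state 1, state 1 stays put.
transition_probability = np.array([[[0.0, 1.0]],
                                   [[0.0, 1.0]]])
policy = np.array([[1.0],
                   [1.0]])
p_start = np.array([1.0, 0.0])

forward = forward_svf(transition_probability, policy, p_start, n_steps=3)
printed = forward_svf(transition_probability, policy, p_start, n_steps=3,
                      as_printed=True)

print(forward.sum(axis=0))  # total mass per time step, forward propagation
print(printed.sum(axis=0))  # total mass per time step, index order as proposed
```

On this toy example the forward version keeps the column sums at 1 for every time step, while the proposed index order loses all of the mass after the first step. That is consistent with the typo being in the equation rather than in a forward-propagating implementation.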