MatthewJA / Inverse-Reinforcement-Learning

Implementations of selected inverse reinforcement learning algorithms.
MIT License

MaxEnt Efficient State Frequency Calculation #7

Closed by chenyucheng2016 5 years ago

chenyucheng2016 commented 5 years ago

According to Ziebart's paper, the equation that updates the state visitation frequency is as follows:

[image]

So I think the implementation should be:

    expected_svf[i, t] += (expected_svf[k, t-1] *
                           policy[i, j] *  # Stochastic policy
                           transition_probability[i, j, k])

unhelkar commented 5 years ago

Hi @chenyucheng2016, nice catch! However, I think the computation of expected_svf is correct; the typo is in the paper's equation, not in the code. For more details, please see the note at this link.
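For readers following the thread, here is a minimal sketch of the forward recursion being discussed. It is an illustration, not the repository's code verbatim. The function name `expected_state_visitation` and the toy two-state chain are made up for this example. It assumes `policy[i][j]` is the probability of action j in state i and `transition_probability[i][j][k]` is the probability of landing in state k after taking action j in state i. Probability mass at step t-1 in source state i flows into next state k at step t.

```python
from itertools import product


def expected_state_visitation(p_start, policy, transition_probability,
                              trajectory_length):
    """Forward pass: expected_svf[k][t] = probability of being in state k at step t."""
    n_states = len(p_start)
    n_actions = len(policy[0])
    expected_svf = [[0.0] * trajectory_length for _ in range(n_states)]
    for s in range(n_states):
        expected_svf[s][0] = p_start[s]

    for t in range(1, trajectory_length):
        for i, j, k in product(range(n_states), range(n_actions),
                               range(n_states)):
            # Mass flows from source state i, via action j, into next state k.
            expected_svf[k][t] += (expected_svf[i][t - 1] *
                                   policy[i][j] *
                                   transition_probability[i][j][k])
    return expected_svf


# Hypothetical toy chain: one action; state 0 always moves to state 1,
# and state 1 is absorbing.
p_start = [1.0, 0.0]
policy = [[1.0], [1.0]]
transition_probability = [[[0.0, 1.0]], [[0.0, 1.0]]]
svf = expected_state_visitation(p_start, policy, transition_probability, 3)
```

In this toy chain the visitation probabilities at every time step sum to 1, as they should for a probability distribution over states. With i and k swapped on the left- and right-hand sides, as in the proposed variant, the same inputs give zero total mass at t = 1. That is one quick way to check which indexing is consistent.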

chenyucheng2016 commented 5 years ago

Hi @unhelkar , thank you very much for sharing the corrected version of the paper!