PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
MIT License

Ch. 15 TRPO algorithm implementation: get_kl method #17

Closed: lanseyege closed this issue 5 years ago

lanseyege commented 5 years ago

Hi! In Chapter 15, TRPO's KL divergence is implemented by the following code:

kl = logstd_v - logstd0_v + (std0_v ** 2 + ((mu0_v - mu_v) ** 2) / (2.0 * std_v ** 2)) - 0.5

However, based on this link https://stats.stackexchange.com/questions/7440/kl-divergence-between-two-univariate-gaussians, shouldn't the KL divergence of two Normal distributions be this:

kl = logstd_v - logstd0_v + (std0_v ** 2 + (mu0_v - mu_v) ** 2) / (2.0 * std_v ** 2) - 0.5

In the current code, only the squared mean difference is divided by 2.0 * std_v ** 2, while the std0_v ** 2 term should be divided as well.
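To double-check the proposed fix, here is a minimal standalone sketch in plain Python (not the repo's PyTorch code) for the scalar case. It assumes the old policy is N(mu0, std0^2) and the new one is N(mu, std^2), so that KL(old || new) = log(std / std0) + (std0^2 + (mu0 - mu)^2) / (2 std^2) - 1/2. The function names kl_gaussian, kl_gaussian_buggy and kl_numeric are hypothetical helpers introduced only for this check. The closed form is compared against a brute-force numerical integral of p(x) * (log p(x) - log q(x)).

```python
import math


def kl_gaussian(mu0, logstd0, mu, logstd):
    """KL(N(mu0, std0^2) || N(mu, std^2)), using the corrected parenthesization."""
    std0 = math.exp(logstd0)
    std = math.exp(logstd)
    return logstd - logstd0 + (std0 ** 2 + (mu0 - mu) ** 2) / (2.0 * std ** 2) - 0.5


def kl_gaussian_buggy(mu0, logstd0, mu, logstd):
    """Original parenthesization: only the squared mean difference is divided."""
    std0 = math.exp(logstd0)
    std = math.exp(logstd)
    return logstd - logstd0 + (std0 ** 2 + ((mu0 - mu) ** 2) / (2.0 * std ** 2)) - 0.5


def normal_logpdf(x, mu, std):
    """Log-density of a univariate Normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mu) / std) ** 2


def kl_numeric(mu0, logstd0, mu, logstd, n=50_000):
    """Midpoint-rule estimate of the KL integral over mu0 +/- 12 * std0."""
    std0 = math.exp(logstd0)
    std = math.exp(logstd)
    lo = mu0 - 12.0 * std0
    hi = mu0 + 12.0 * std0
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        lp = normal_logpdf(x, mu0, std0)  # log p(x), old distribution
        lq = normal_logpdf(x, mu, std)    # log q(x), new distribution
        total += math.exp(lp) * (lp - lq) * dx
    return total


if __name__ == "__main__":
    args = (0.3, math.log(0.5), -0.2, math.log(1.5))
    print(kl_gaussian(*args), kl_gaussian_buggy(*args), kl_numeric(*args))
```

With these example parameters the corrected formula and the numerical integral agree (about 0.7097), while the original parenthesization gives about 0.9042. A quicker sanity check: the KL divergence of a distribution with itself must be 0, but the original expression evaluates to std0 ** 2 - 0.5 when mu == mu0 and std == std0.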

Shmuma commented 5 years ago

Yep, looks like a bug! Thanks!

Shmuma commented 5 years ago

Fixed, thanks!