Stochastic Value Gradients implementation

Hi there,

I am also interested in reproducing the results for SVG(1). I have my own implementation of SVG(1), but some implementation details in the original paper are unclear to me.

Specifically, it is not clear to me how the KL regularization is performed, or how the KL penalty is chosen and updated during training. I believe this plays a crucial role in the stability of the algorithm.

Therefore, I would love to take a look at the implementation used here and would be extremely grateful if you made it public.

Best regards, Ângelo

WilsonWangTHU / mbbl
