Expected Policy Gradients

chainer / chainerrl

ChainerRL is a deep reinforcement learning library built on top of Chainer.

MIT License


Expected Policy Gradients #221

Open muupan opened 6 years ago

muupan commented 6 years ago

http://arxiv.org/abs/1706.05374

Implementing it amounts to writing a new Explorer that adds Gaussian noise with covariance ρ_0 exp(cH(s)), where H(s) is the Hessian of Q(s,a) with respect to a, and ρ_0 and c are hyperparameters.
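A rough NumPy sketch of that exploration rule, for illustration only. It is not ChainerRL code: the function names `numerical_hessian`, `hessian_covariance`, and `explore` are hypothetical, `q_func(state, action)` is assumed to return a scalar Q-value, the Hessian is approximated by finite differences rather than by automatic differentiation, and `exp` is assumed to mean the matrix exponential (so that the result is a valid covariance matrix).

```python
import numpy as np


def numerical_hessian(q_func, state, action, eps=1e-4):
    """Central finite-difference Hessian of q_func(state, action) w.r.t. action."""
    action = np.asarray(action, dtype=np.float64)
    n = action.size
    hess = np.zeros((n, n))
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = eps
        for j in range(n):
            e_j = np.zeros(n)
            e_j[j] = eps
            hess[i, j] = (
                q_func(state, action + e_i + e_j)
                - q_func(state, action + e_i - e_j)
                - q_func(state, action - e_i + e_j)
                + q_func(state, action - e_i - e_j)
            ) / (4.0 * eps ** 2)
    # Symmetrize to remove small numerical asymmetries.
    return 0.5 * (hess + hess.T)


def hessian_covariance(hess, rho_0=1.0, c=1.0):
    """Compute rho_0 * expm(c * H) for symmetric H via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(hess)
    # V diag(exp(c * lambda)) V^T is the matrix exponential of c * H.
    return rho_0 * (eigvecs * np.exp(c * eigvals)) @ eigvecs.T


def explore(q_func, state, greedy_action, rho_0=1.0, c=1.0, rng=None):
    """Sample an action from N(greedy_action, rho_0 * expm(c * H(state)))."""
    rng = np.random.default_rng() if rng is None else rng
    greedy_action = np.asarray(greedy_action, dtype=np.float64)
    hess = numerical_hessian(q_func, state, greedy_action)
    cov = hessian_covariance(hess, rho_0, c)
    return rng.multivariate_normal(greedy_action, cov)
```

In this sketch the Hessian is evaluated at the greedy action, which is also an assumption. Directions in which Q curves sharply downward (large negative eigenvalues) receive little noise, while flat directions receive noise on the order of ρ_0.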

xylee95 commented 5 years ago

I would be interested to see an implementation of this agent!