Open muupan opened 6 years ago
http://arxiv.org/abs/1706.05374
Implementing it amounts to writing a new Explorer that adds Gaussian noise with covariance ρ_0 exp(cH(s)), where H(s) is the Hessian of Q(s,a) with respect to a, and ρ_0 and c are hyperparameters.
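A minimal NumPy sketch of the noise computation, assuming the Hessian H(s) is already available as a symmetric matrix (for example from automatic differentiation of the Q-function). The function names `hessian_noise_covariance` and `sample_exploratory_action` are hypothetical and are not part of any existing Explorer interface; the matrix exponential is taken through an eigendecomposition, which is valid for symmetric matrices.

```python
import numpy as np


def hessian_noise_covariance(hessian, rho0, c):
    """Return rho0 * exp(c * H) for a symmetric matrix H (hypothetical helper)."""
    # Symmetrize to guard against small numerical asymmetries in the Hessian.
    h = 0.5 * (hessian + hessian.T)
    # For symmetric H = V diag(w) V^T, exp(c H) = V diag(exp(c w)) V^T.
    eigvals, eigvecs = np.linalg.eigh(h)
    return rho0 * (eigvecs * np.exp(c * eigvals)) @ eigvecs.T


def sample_exploratory_action(greedy_action, hessian, rho0, c, rng):
    """Add zero-mean Gaussian noise with covariance rho0 * exp(c * H) to an action."""
    cov = hessian_noise_covariance(hessian, rho0, c)
    noise = rng.multivariate_normal(np.zeros(len(greedy_action)), cov)
    return greedy_action + noise


# Usage with made-up values: strongly negative curvature along the second
# action dimension shrinks the noise in that direction.
rng = np.random.default_rng(0)
H = np.array([[-1.0, 0.0], [0.0, -4.0]])
action = sample_exploratory_action(np.zeros(2), H, rho0=0.5, c=1.0, rng=rng)
```

Because the covariance is a matrix exponential of a symmetric matrix, it is always positive definite, so it is a valid Gaussian covariance for any ρ_0 > 0 regardless of the sign of the curvature.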
I would be interested to see an implementation of this agent!