Open xuehy opened 7 years ago
It's a multidimensional normal distribution with a spherical covariance: you have Sigma = sigma^2 * I_k, so |Sigma| = sigma^(2k).
Then k/2 * log(2*pi*e) + 1/2 * log(|Sigma|) = k/2 * log(2*pi*e*sigma^2) = k/2 * (log(2*pi*sigma^2) + 1).
Afterwards, you can divide by k to reduce the weight of the entropy loss. The leading minus sign makes the quantity positive, so gradient descent will drive it toward 0.
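The identity used above (spherical covariance collapses the k-dimensional entropy to k/2 * (log(2*pi*sigma^2) + 1)) can be checked numerically. A minimal sketch in plain Python:

```python
import math

def entropy_full(k, sigma_sq):
    # general Gaussian entropy k/2*log(2*pi*e) + 1/2*log|Sigma|,
    # with |Sigma| = sigma_sq**k in the spherical case
    return 0.5 * k * math.log(2 * math.pi * math.e) + 0.5 * math.log(sigma_sq ** k)

def entropy_collapsed(k, sigma_sq):
    # the simplified form from the derivation: k/2 * (log(2*pi*sigma_sq) + 1)
    return 0.5 * k * (math.log(2 * math.pi * sigma_sq) + 1)

for k in (1, 3, 7):
    for sigma_sq in (0.1, 1.0, 4.0):
        assert abs(entropy_full(k, sigma_sq) - entropy_collapsed(k, sigma_sq)) < 1e-9
print("identity holds")
```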
But you are right, there is still a typo. It should be

entropy = -0.5*((sigma_sq * 2*pi.expand_as(sigma_sq)).log()+1)

instead of

entropy = -0.5*((sigma_sq + 2*pi.expand_as(sigma_sq)).log()+1)
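The difference between the two lines can be shown without torch: for a scalar sigma_sq, the corrected expression is exactly the negative of the 1-D Gaussian entropy 0.5*log(2*pi*e*sigma_sq), while the buggy one (adding 2*pi instead of multiplying) is not. A sketch:

```python
import math

def neg_entropy_fixed(sigma_sq):
    # corrected line: multiply sigma_sq by 2*pi before the log
    return -0.5 * (math.log(sigma_sq * 2 * math.pi) + 1)

def neg_entropy_buggy(sigma_sq):
    # original line: adds 2*pi to sigma_sq, which has no closed-form meaning
    return -0.5 * (math.log(sigma_sq + 2 * math.pi) + 1)

def gaussian_entropy_1d(sigma_sq):
    # closed form: 0.5 * log(2*pi*e*sigma_sq)
    return 0.5 * math.log(2 * math.pi * math.e * sigma_sq)

sigma_sq = 0.5
assert abs(neg_entropy_fixed(sigma_sq) + gaussian_entropy_1d(sigma_sq)) < 1e-9
assert abs(neg_entropy_buggy(sigma_sq) + gaussian_entropy_1d(sigma_sq)) > 1e-3
print("fixed version matches the closed form; buggy version does not")
```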
@alexis-jacq @xuehy Have you tried the modified entropy? I also found that the original entropy calculation seems wrong, and changed it to @alexis-jacq's version. But the original one seems to perform better, though I'm testing on a different environment (not Mujoco). I want to know how the modified entropy changes learning in the Mujoco environment. Unfortunately, I couldn't run Mujoco because of a Python version issue...
I have a doubt about using the entropy as well. If we use as the loss the negative log of the Gaussian density with u and sigma_sq estimated by the net, evaluated at the executed action x, its derivative with respect to sigma_sq is:
dL/dsigma_sq = 1/(2*sigma_sq) - (x-u)^2/(2*sigma_sq^2)
If we also add the entropy to the loss (with a minus sign), following the formula mentioned by @alexis-jacq, its derivative with respect to sigma_sq would be:
d(-E)/dsigma_sq = -1/(2*sigma_sq)
which is equivalent (apart from the minus sign) to the first term of dL/dsigma_sq.
Since it is suggested to multiply the entropy by a small constant factor (1e-4 in Mnih's paper), it seems to me that the contribution of the entropy would be very marginal.
Am I missing something? Thanks
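The two derivatives quoted in this comment can be verified with finite differences. A sketch (names x, u, s are illustrative values, not from the repo):

```python
import math

def neg_log_pdf(x, u, sigma_sq):
    # L: negative log-density of N(u, sigma_sq) at the executed action x
    return 0.5 * math.log(2 * math.pi * sigma_sq) + (x - u) ** 2 / (2 * sigma_sq)

def entropy(sigma_sq):
    # E: 1-D Gaussian entropy, 0.5 * (log(2*pi*sigma_sq) + 1)
    return 0.5 * (math.log(2 * math.pi * sigma_sq) + 1)

def num_grad(f, s, h=1e-6):
    # central finite difference in sigma_sq
    return (f(s + h) - f(s - h)) / (2 * h)

x, u, s = 1.3, 0.2, 0.7
dL = num_grad(lambda t: neg_log_pdf(x, u, t), s)
dnegE = num_grad(lambda t: -entropy(t), s)

# compare against the analytic expressions from the comment
assert abs(dL - (1 / (2 * s) - (x - u) ** 2 / (2 * s ** 2))) < 1e-5
assert abs(dnegE - (-1 / (2 * s))) < 1e-5
print("numerical gradients match the analytic expressions")
```

So d(-E)/dsigma_sq does cancel the first term of dL/dsigma_sq exactly when the entropy coefficient is 1; with a 1e-4 coefficient the cancellation is indeed tiny, as the comment suspects.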
The entropy of a Gaussian distribution is k/2 * log(2*pi*e) + 1/2 * log(|Sigma|) according to Wikipedia, where k is the dimension of the distribution.
However, in the code, the entropy is calculated as -1/2 * (log(2*pi + sigma_sq) + 1).
Why?
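The closed-form entropy quoted here can be sanity-checked by a Monte Carlo estimate of -E[log p(x)] for a diagonal Sigma (a sketch with arbitrary illustrative variances):

```python
import math
import random

random.seed(0)
variances = [0.5, 1.0, 2.0]  # diagonal of Sigma, so k = 3 and |Sigma| = product
k = len(variances)

# closed form: k/2 * log(2*pi*e) + 1/2 * log|Sigma|
closed = (0.5 * k * math.log(2 * math.pi * math.e)
          + 0.5 * sum(math.log(v) for v in variances))

def log_pdf(xs):
    # zero-mean diagonal Gaussian: log p factorizes over dimensions
    return sum(-0.5 * math.log(2 * math.pi * v) - x * x / (2 * v)
               for x, v in zip(xs, variances))

n = 200_000
est = -sum(log_pdf([random.gauss(0.0, math.sqrt(v)) for v in variances])
           for _ in range(n)) / n

assert abs(est - closed) < 0.05
print("Monte Carlo estimate agrees with the closed form")
```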