andrewliao11 / pytorch-a3c-mujoco

Implement A3C for Mujoco gym envs
MIT License

Problem with the entropy #2

Open xuehy opened 7 years ago

xuehy commented 7 years ago

The entropy of a Gaussian distribution is k/2 * log(2*pi*e) + 1/2 * log(|Sigma|) according to Wikipedia, where k is the dimension of the distribution.

However, in the code the entropy is calculated as -1/2 * (log(2*pi + sigma_sq) + 1).

Why?
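A minimal numeric sketch of the discrepancy being asked about (sigma_sq = 0.25 is an illustrative value, not taken from the repo): the textbook per-dimension entropy multiplies sigma_sq by 2*pi, while the code adds them.

```python
# Illustrative check: textbook per-dimension Gaussian entropy vs. the
# expression used in the code. sigma_sq is a made-up value.
import math

sigma_sq = 0.25
textbook = 0.5 * (math.log(2 * math.pi * sigma_sq) + 1)   # ~0.726
in_code = -0.5 * (math.log(2 * math.pi + sigma_sq) + 1)   # ~-1.439
print(textbook, in_code)
```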

alexis-jacq commented 7 years ago

It's a multidimensional normal distribution with a spherical covariance: you have Sigma = sigma^2 * I, where sigma is the one-dimensional standard deviation and I is the k-dimensional identity, so |Sigma| = sigma^(2*k).

Then, k/2 * log(2*pi*e) + 1/2 * log(|Sigma|) = k/2 * log(2*pi*e) + k * log(sigma) = k/2 * log(2*pi*e*sigma^2) = k/2 * (log(2*pi*sigma^2) + 1).
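A quick check of this collapse, assuming a spherical covariance and illustrative values (k = 4, sigma^2 = 0.25), using torch.distributions:

```python
# Sketch: for Sigma = sigma^2 * I, the multivariate Gaussian entropy
# k/2*log(2*pi*e) + 1/2*log|Sigma| equals k/2*(log(2*pi*sigma^2) + 1).
import math
import torch
from torch.distributions import MultivariateNormal

k, sigma_sq = 4, 0.25  # illustrative dimension and variance
dist = MultivariateNormal(torch.zeros(k), covariance_matrix=sigma_sq * torch.eye(k))
closed_form = 0.5 * k * (math.log(2 * math.pi * sigma_sq) + 1)
print(dist.entropy().item(), closed_form)  # both ~2.903
```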

Afterwards, you can divide by k to reduce the weight of the entropy loss. The -1 factor is there to make the quantity positive, so that gradient descent drives it close to 0.

But you are right, there is still a typo: it should be entropy = -0.5*((sigma_sq * 2*pi.expand_as(sigma_sq)).log()+1) instead of entropy = -0.5*((sigma_sq + 2*pi.expand_as(sigma_sq)).log()+1).
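For what it's worth, the corrected line can be sanity-checked against torch.distributions.Normal (a sketch with a stand-in sigma_sq tensor, not the repo's actual network output):

```python
import math
import torch
from torch.distributions import Normal

pi = torch.tensor(math.pi)
sigma_sq = torch.tensor([0.25, 1.0, 4.0])  # stand-in variances

# Corrected line: per-dimension negative entropy of N(mu, sigma_sq).
entropy = -0.5 * ((sigma_sq * 2 * pi.expand_as(sigma_sq)).log() + 1)

# Flipping the sign should recover the true entropy 0.5*(log(2*pi*sigma_sq)+1).
reference = Normal(torch.zeros_like(sigma_sq), sigma_sq.sqrt()).entropy()
print(torch.allclose(-entropy, reference))  # True
```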

kkjh0723 commented 7 years ago

@alexis-jacq @xuehy Have you tried the modified entropy? I also found that the original entropy calculation seems wrong, and I changed it to the one @alexis-jacq gives. But the original one seems to perform better, though I'm testing on a different environment (not Mujoco). I want to know how the modified entropy changes learning in a Mujoco environment. Unfortunately, I couldn't run Mujoco because of my Python version...

giubacchio commented 6 years ago

I have a doubt about using the entropy as well. If we take as loss the negative log of the Gaussian probability density, with u and sigma squared estimated by the net, evaluated at the point x corresponding to the executed action, its derivative with respect to sigma squared is: dL/dsigma_sq = 1/(2*sigma_sq) - (x-u)^2/(2*sigma_sq^2)

If we also add the entropy to the loss (with a minus sign), following the formula mentioned by @alexis-jacq, its derivative with respect to sigma squared would be: d(-E)/dsigma_sq = -1/(2*sigma_sq), which is equal, apart from the minus sign, to the first term of dL/dsigma_sq.

Since it is suggested to multiply the entropy by a small constant factor (1e-4 in Mnih's paper), it seems to me that the contribution of the entropy would be very marginal...

Am I missing something? Thanks
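An autograd sketch of the two derivatives above, with illustrative scalars (x = 1.5, u = 1.0, sigma_sq = 0.5; none of these come from the repo):

```python
import math
import torch

x, u = 1.5, 1.0  # illustrative action and mean

# dL/dsigma_sq for the negative log-likelihood term.
sigma_sq = torch.tensor(0.5, requires_grad=True)
nll = 0.5 * torch.log(2 * math.pi * sigma_sq) + (x - u) ** 2 / (2 * sigma_sq)
(g_nll,) = torch.autograd.grad(nll, sigma_sq)

# d(-E)/dsigma_sq for the entropy term added with a minus sign.
sigma_sq2 = torch.tensor(0.5, requires_grad=True)
neg_entropy = -0.5 * (torch.log(2 * math.pi * sigma_sq2) + 1)
(g_ent,) = torch.autograd.grad(neg_entropy, sigma_sq2)

print(g_nll.item())  # 1/(2*0.5) - 0.25/(2*0.25) = 0.5
print(g_ent.item())  # -1/(2*0.5) = -1.0
```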