hansglick opened this issue 2 years ago
Hi @hansglick ,
That's a great question, thanks!
The KL divergence equation computes the divergence between two probability distributions (see my video on this topic). For example, if the actual probability of activation is 0.4 but we want it to be 0.1 (for sparsity), then the KL divergence between the target and actual distributions is:
>>> import numpy as np
>>> 0.1 * np.log(0.1 / 0.4) + (1 - 0.1) * np.log((1 - 0.1) / (1 - 0.4))
0.22628916118535888
This includes the probability of activation (0.4) and the probability of no-activation (1-0.4), since we need a full probability distribution.
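To see why the no-activation term is needed, note that the activation-only term is not a valid divergence on its own: it can even be negative. A small NumPy sketch comparing the two:

```python
import numpy as np

p, q = 0.1, 0.4  # target and actual activation probabilities

# Only the "activation" term: not a valid divergence by itself
partial = p * np.log(p / q)

# Adding the "no-activation" term compares the full Bernoulli distributions
full = partial + (1 - p) * np.log((1 - p) / (1 - q))

print(partial)  # ≈ -0.1386: negative, so clearly not a divergence
print(full)     # ≈ 0.2263: the correct KL divergence
```

A proper KL divergence is always non-negative, which only holds once both terms of the distribution are included.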
Or we can use the kullback_leibler_divergence() function from the tensorflow.keras.losses module to get the same result as a tensor:
>>> from tensorflow.keras.losses import kullback_leibler_divergence
>>> kullback_leibler_divergence([0.1, 1-0.1], [0.4, 1-0.4])
<tf.Tensor: shape=(), dtype=float32, numpy=0.2262891>
Another way to get the same result is to call kullback_leibler_divergence() twice: once with just the probability of activation, and once with just the probability of no-activation:
>>> kullback_leibler_divergence([0.1], [0.4]) + kullback_leibler_divergence([1-0.1], [1-0.4])
<tf.Tensor: shape=(), dtype=float32, numpy=0.2262891>
This last option requires calling the function twice, but it avoids having to concatenate the probabilities into a single tensor, which is convenient when they are computed separately.
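This two-call decomposition is especially handy in a sparsity regularizer, where the mean activity of every coding unit is compared against the same scalar target. A NumPy sketch of the per-unit computation (the array of mean activities below is made up for illustration):

```python
import numpy as np

target = 0.1  # desired mean activation (sparsity target)

# Hypothetical mean activations of 4 coding units, averaged over a batch
mean_activities = np.array([0.4, 0.05, 0.2, 0.1])

# Activation term + no-activation term, computed per unit
kl_per_unit = (target * np.log(target / mean_activities)
               + (1 - target) * np.log((1 - target) / (1 - mean_activities)))

# The total sparsity penalty sums the per-unit divergences
sparsity_loss = kl_per_unit.sum()
```

Each unit contributes zero when its mean activity exactly matches the target (like the last unit above), and a positive penalty otherwise, without ever stacking [target, 1 - target] into an explicit distribution tensor.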
I think I'll add a note in the notebook about this, I agree it's not intuitive. Thanks again!
@ageron Thank you, sir, for your great explanation and your time.
Hi @ageron ,
In cell 44 of https://github.com/ageron/handson-ml2/blob/master/17_autoencoders_and_gans.ipynb, you build a KL divergence layer, but the formula you use is a little difficult to understand, at least for me.
Why

kl_divergence(self.target, mean_activities) + kl_divergence(1. - self.target, 1. - mean_activities)

and not simply

kl_divergence(self.target, mean_activities)

?