ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0
27.8k stars 12.74k forks

[QUESTION] KL divergence formula for the regularizer layer needs explication #561

Open hansglick opened 2 years ago

hansglick commented 2 years ago

Hi @ageron ,

In cell 44 of https://github.com/ageron/handson-ml2/blob/master/17_autoencoders_and_gans.ipynb, you build a KLDivergence layer, but the formula you use is a little difficult to understand, at least for me.

Why

kl_divergence(self.target, mean_activities) +
kl_divergence(1. - self.target, 1. - mean_activities)

?

and not simply kl_divergence(self.target, mean_activities) ?

ageron commented 2 years ago

Hi @hansglick ,

That's a great question, thanks!

The KL divergence equation computes the divergence between two probability distributions (see my video on this topic). For example, if the probability of activation is 0.4 but we actually want it to be 0.1 (for sparsity), then the correct equation is:

>>> import numpy as np
>>> 0.1 * np.log(0.1 / 0.4) + (1 - 0.1) * np.log((1 - 0.1) / (1 - 0.4))
0.22628916118535888

This includes the probability of activation (0.4) and the probability of no-activation (1-0.4), since we need a full probability distribution.
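To see concretely why the second term matters, here is a small NumPy sketch (the function name `bernoulli_kl` is my own, not from the notebook) comparing the full Bernoulli KL divergence with the single activation term the question asks about:

```python
import numpy as np

def bernoulli_kl(p, q):
    """KL divergence between two Bernoulli distributions with
    activation probabilities p (target) and q (actual)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

target, actual = 0.1, 0.4
full = bernoulli_kl(target, actual)           # both terms
partial = target * np.log(target / actual)    # activation term only

print(full)     # ≈ 0.2263, matches the value above
print(partial)  # ≈ -0.1386, negative!
```

Note that the single-term version can be negative, so on its own it is not a valid divergence (KL is always >= 0) and could even *reward* the wrong activation level instead of penalizing it.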

Or we can use the kullback_leibler_divergence() function from the tensorflow.keras.losses module to get the same result as a tensor:

>>> from tensorflow.keras.losses import kullback_leibler_divergence
>>> kullback_leibler_divergence([0.1, 1-0.1], [0.4, 1-0.4])
<tf.Tensor: shape=(), dtype=float32, numpy=0.2262891>

Another way to get the same result is to call kullback_leibler_divergence() twice, once with just the probability of activation and once with just the probability of no-activation:

>>> kullback_leibler_divergence([0.1], [0.4]) + kullback_leibler_divergence([1-0.1], [1-0.4])
<tf.Tensor: shape=(), dtype=float32, numpy=0.2262891>

In the notebook, this last option is less verbose, since it does not require concatenating the target and the mean activities into a single tensor.
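Putting the pieces together, the sparsity regularizer can be sketched in plain NumPy as follows. This is a simplified illustration, not the notebook's verbatim Keras code; the names `sparsity_loss`, `TARGET`, and `WEIGHT` (and the weight value) are assumptions for the example:

```python
import numpy as np

TARGET = 0.1   # desired mean activation probability (sparsity target)
WEIGHT = 0.05  # regularization strength (hypothetical value)

def kl_divergence(p, q):
    # Element-wise KL term, mirroring the notebook's two-call pattern.
    return p * np.log(p / q)

def sparsity_loss(activations):
    """Average each neuron's activation over the batch, then sum
    the two KL terms (activation and no-activation)."""
    mean_activities = activations.mean(axis=0)
    return WEIGHT * np.sum(
        kl_divergence(TARGET, mean_activities)
        + kl_divergence(1.0 - TARGET, 1.0 - mean_activities))

# Example: a batch of sigmoid activations for 3 coding units.
batch = np.array([[0.4, 0.1, 0.2],
                  [0.4, 0.1, 0.2]])
print(sparsity_loss(batch))  # small positive penalty
```

A neuron whose mean activation already equals the target contributes exactly zero, while neurons that are too active (or too inactive) add a positive penalty, which is what pushes the coding layer toward sparsity during training.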

I agree it's not intuitive, so I think I'll add a note about this in the notebook. Thanks again!

hansglick commented 2 years ago

@ageron Thank you Sir for your great explanation and your time.