In https://deepgenerativemodels.github.io/notes/vae/, the paragraph "Learning Directed Latent Variable Models" states that:

"As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood $\log p(x)$ over $D$."

This isn't mentioned anywhere in the rest of the course notes. It would be useful for the learner to add the proof of this equivalence, or at least a reference to it.
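For reference, a short sketch of the argument (in my own notation, not the notes': $\hat{p}_D$ denotes the empirical distribution placing mass $1/|D|$ on each training point):

$$
\begin{aligned}
D_{\mathrm{KL}}\left(\hat{p}_D \,\|\, p_\theta\right)
&= \mathbb{E}_{x \sim \hat{p}_D}\left[\log \hat{p}_D(x)\right] - \mathbb{E}_{x \sim \hat{p}_D}\left[\log p_\theta(x)\right] \\
&= -H(\hat{p}_D) - \frac{1}{|D|} \sum_{x \in D} \log p_\theta(x).
\end{aligned}
$$

Since the entropy term $-H(\hat{p}_D)$ does not depend on the model parameters $\theta$, minimizing this KL divergence over $\theta$ is the same as maximizing the average marginal log-likelihood $\frac{1}{|D|} \sum_{x \in D} \log p_\theta(x)$. Something along these lines, or a pointer to where it is derived, would make the statement self-contained.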