Concept question on MC sampling and IW sampling

Please correct me if I misread the paper. As I understand it, IW sampling is used to approximate L(x). For MC sampling, I am not entirely sure about the line "The KL[qφ|pθ] is calculated analytically at each layer when possible and otherwise approximated using Monte Carlo sampling" ([page 3, just above Sec. 2.1](https://arxiv.org/pdf/1602.02282.pdf)). Since the distributions are Gaussian and the KL divergence therefore has a closed form, why do we need MC sampling? Could you clarify that?
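To make the question concrete, here is a minimal sketch of the two options as I understand them. It is my own illustration, not code from the paper or from this repository. It assumes diagonal Gaussians for both q and p, and the function names (`kl_gauss_analytic`, `kl_gauss_mc`, `log_normal`) and all parameter values are hypothetical.

```python
import numpy as np


def kl_gauss_analytic(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL[q || p] for diagonal Gaussians, summed over dimensions."""
    return np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


def log_normal(z, mu, sigma):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return np.sum(
        -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2,
        axis=-1,
    )


def kl_gauss_mc(mu_q, sigma_q, mu_p, sigma_p, n_samples, rng):
    """Monte Carlo estimate of KL[q || p]: average of log q(z) - log p(z), z ~ q."""
    eps = rng.standard_normal((n_samples, mu_q.shape[0]))
    z = mu_q + sigma_q * eps  # reparameterized samples from q
    return np.mean(log_normal(z, mu_q, sigma_q) - log_normal(z, mu_p, sigma_p))


# Illustrative (made-up) parameters for a 2-dimensional latent variable.
rng = np.random.default_rng(0)
mu_q = np.array([0.5, -1.0])
sigma_q = np.array([0.8, 1.5])
mu_p = np.zeros(2)
sigma_p = np.ones(2)

kl_exact = kl_gauss_analytic(mu_q, sigma_q, mu_p, sigma_p)
kl_mc = kl_gauss_mc(mu_q, sigma_q, mu_p, sigma_p, n_samples=200_000, rng=rng)
print(f"analytic KL: {kl_exact:.4f}, MC estimate: {kl_mc:.4f}")
```

In this Gaussian case the two numbers should agree up to sampling noise, which is why I am unsure when the MC fallback would actually be needed.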

casperkaae / LVAE

Concept question on MC sampling and IW sampling #3