Your work is very interesting, but I am still confused about how the KL divergence is calculated.
In the paper, does Appendix E compute the prior distribution of 𝑍, i.e., 𝑃(𝑍)? Since it involves λ(𝑋) and 𝑅, is that why we need to use 𝑄(𝑍) as a variational approximation?
However, in the KL divergence calculation, λ and 𝑅 are treated as constants when computing 𝑃(𝑍∣𝑅). Why can λ and 𝑅 be considered constants? Is it because they are deterministic outputs given 𝑋? Do we not need to consider the distribution of λ?