Hi, thank you for this interesting work! I was trying to read your proof in Appendix E2 and got confused about the design of the encoder networks. Ideally, if the decoder network is linear, i.e. f_{\mu_x}(z) = Az + b, the true posterior is also Gaussian, with mean (\gamma I + A^T A)^{-1} A^T (x - b), which depends on \gamma. However, the mean of the variational posterior in this paper is f_{\mu_z}(x), which is independent of \gamma. Is something wrong here? I was trying to work this out in the subsequent proof, but couldn't understand why equation (30) holds. Since the variable transformation is z' = (z - z^*) / \sqrt{\gamma}, shouldn't z' somehow depend on \gamma? If so, why does z' cancel out in the second term of the second equality when taking the limit \gamma \to \infty?
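To make the \gamma-dependence concrete, here is a minimal numpy sketch (my own, not from the paper) assuming a prior z ~ N(0, I) and a linear decoder x = Az + b + \epsilon with \epsilon ~ N(0, \gamma I). It computes the true posterior mean in two equivalent forms and shows that the mean varies with \gamma:

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_x = 2, 3
A = rng.standard_normal((d_x, d_z))  # linear decoder weight (assumed)
b = rng.standard_normal(d_x)         # linear decoder bias (assumed)
x = rng.standard_normal(d_x)         # an arbitrary observation

def posterior_mean(gamma):
    # Closed-form posterior mean for the linear-Gaussian model:
    # (gamma * I + A^T A)^{-1} A^T (x - b)
    return np.linalg.solve(gamma * np.eye(d_z) + A.T @ A, A.T @ (x - b))

def posterior_mean_precision_form(gamma):
    # Equivalent form via the posterior precision I + A^T A / gamma,
    # obtained directly from completing the square in log p(z) + log p(x|z)
    return np.linalg.solve(np.eye(d_z) + A.T @ A / gamma, A.T @ (x - b) / gamma)

for gamma in [0.1, 1.0, 10.0]:
    m1 = posterior_mean(gamma)
    m2 = posterior_mean_precision_form(gamma)
    assert np.allclose(m1, m2)  # both forms agree

# The posterior mean shrinks toward 0 as gamma grows, so it clearly
# depends on gamma; a gamma-independent encoder mean f_{mu_z}(x) can
# match it only in a limiting sense, not exactly at finite gamma.
assert not np.allclose(posterior_mean(0.1), posterior_mean(10.0))
```

So at any finite \gamma the exact posterior mean does depend on \gamma, which is why I'd like to understand how the limit argument around equation (30) removes this dependence.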