Understanding inference encoder

jgamper commented 3 years ago

https://github.com/OATML/ucate/blob/f57e47b1baec802c955adc7b07317d8014e50d91/ucate/library/models/cevae.py#L110-L186 @anndvision could you please help me understand the encoder in cevae? If I understand correctly, there are supposed to be three inference models:

    t ~ q(t|x)      # treatment
    y ~ q(y|t,x)    # outcome
    z ~ q(z|y,t,x)  # latent confounder, an embedding

In the ucate code, it seems there is just a single inference model: an encoder that takes only x and t, which are passed as a concatenated array into the encoder network: https://github.com/OATML/ucate/blob/f57e47b1baec802c955adc7b07317d8014e50d91/ucate/library/models/cevae.py#L184
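To make the question concrete, the three-stage inference chain listed above can be sketched as ancestral sampling. This is a toy illustration in plain Python, not the ucate or CEVAE implementation: the linear "networks", their weights, and the function names (`q_t_given_x`, `q_y_given_tx`, `q_z_given_ytx`, `infer_three_stage`) are all hypothetical placeholders.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def q_t_given_x(x, rng):
    """Sample t ~ q(t|x): Bernoulli with a toy logistic model (weights are made up)."""
    p = sigmoid(0.5 * sum(x))
    return 1 if rng.random() < p else 0


def q_y_given_tx(t, x, rng):
    """Sample y ~ q(y|t,x): Gaussian whose mean depends on t and x (toy model)."""
    mean = sum(x) + 2.0 * t
    return rng.gauss(mean, 1.0)


def q_z_given_ytx(y, t, x, rng, dim_z=2):
    """Sample z ~ q(z|y,t,x): diagonal Gaussian over a dim_z-dimensional latent (toy model)."""
    mean = 0.1 * y + 0.3 * t + 0.05 * sum(x)
    return [rng.gauss(mean, 1.0) for _ in range(dim_z)]


def infer_three_stage(x, rng):
    """Ancestral sampling through the three inference models, in order."""
    t = q_t_given_x(x, rng)          # treatment is sampled, not observed
    y = q_y_given_tx(t, x, rng)      # outcome depends on the sampled treatment
    z = q_z_given_ytx(y, t, x, rng)  # latent depends on both sampled values
    return t, y, z


rng = random.Random(0)
t, y, z = infer_three_stage([0.2, -1.0, 0.7], rng)
```

Note that in this chain z is conditioned on a sampled t and y, whereas the linked encoder appears to condition z directly on x and t.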

anndvision commented 3 years ago

Happy to give a full answer after the ICML deadline :)

jgamper commented 3 years ago

No worries! Good luck!

jgamper commented 3 years ago

Hey @anndvision! Could you please elaborate on the above when you have a chance 🚀 🤣

anndvision commented 3 years ago

Hey @jgamper! Sorry about the delay.

There are three reasons why we went for a simpler architecture in the encoder.

First, the generative model (the decoder network; see Figure 1 in the CEVAE paper) only requires the inferred z variable.

Second, we found that something strange happens if you infer z from t ~ q(t|x), namely when the sampled t differs from the factual t during training. In that case, you evaluate Equation 6 of the CEVAE paper with densities generated from a z inferred given the counterfactual t. This resulted in the CEVAE effectively making random predictions, which I believe is backed up by the performance they report in the paper. So, we began conditioning z on only inferred values, which led to sensible predictions.
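A rough sketch of the mismatch described in this point, assuming (as the question describes) an encoder that receives x and t concatenated. The weights, the toy q(t|x), and the names `encoder_mean` and `sample_z` are hypothetical and are not taken from the ucate code.

```python
import math
import random


def encoder_mean(x, t):
    """Toy stand-in for the encoder network: a fixed linear map of concat([x, t])."""
    inputs = list(x) + [float(t)]    # concatenate covariates and treatment
    weights = [0.4, -0.2, 0.1, 1.5]  # hypothetical weights; the last one acts on t
    return sum(w * v for w, v in zip(weights, inputs))


def sample_z(x, t, rng, scale=0.1):
    """Sample a 1-D z ~ q(z|x,t) as a Gaussian around the encoder mean."""
    return rng.gauss(encoder_mean(x, t), scale)


rng = random.Random(0)
x, t_factual = [0.2, -1.0, 0.7], 1

# A treatment sampled from q(t|x) need not match the factual one.
p_t = 1.0 / (1.0 + math.exp(-0.5 * sum(x)))  # toy q(t=1|x)
t_sampled = 1 if rng.random() < p_t else 0

# Conditioning on the factual t keeps z consistent with the observed data.
z_factual = sample_z(x, t_factual, rng)

# Flipping t shifts the mean of q(z|x,t) by the weight on t, so a mismatched
# t would score the factual outcome under a z inferred for the other arm.
mean_gap = abs(encoder_mean(x, 1) - encoder_mean(x, 0))
```

In this toy setup `mean_gap` is large relative to `scale`, which illustrates how a z drawn under the wrong treatment could land far from the region the decoder saw for the factual one.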

Finally, we found that the auxiliary losses (Equation 10 in the CEVAE paper) did not improve performance, either under the original setup or in our modified setup.
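For readers unfamiliar with auxiliary losses of this kind, they are typically log-likelihood terms for the observed treatment and outcome under the auxiliary inference models q(t|x) and q(y|t,x). The sketch below assumes a Bernoulli q(t|x) and a unit-variance Gaussian q(y|t,x); the function names and the numbers passed in are hypothetical, and this is not claimed to match the exact form of the paper's equation or the ucate code.

```python
import math


def bernoulli_log_prob(t, p):
    """Log-probability of a binary treatment t under Bernoulli(p)."""
    return math.log(p) if t == 1 else math.log(1.0 - p)


def gaussian_log_prob(y, mean, std=1.0):
    """Log-density of y under a Gaussian with the given mean and std."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - 0.5 * ((y - mean) / std) ** 2


def auxiliary_loss(t_obs, y_obs, p_t, y_mean):
    """Negative log-likelihood of the observed t and y under q(t|x) and q(y|t,x)."""
    return -(bernoulli_log_prob(t_obs, p_t) + gaussian_log_prob(y_obs, y_mean))


# p_t and y_mean would come from the auxiliary networks; here they are made up.
loss = auxiliary_loss(t_obs=1, y_obs=2.5, p_t=0.7, y_mean=2.0)
```

Dropping these terms, as described above, would leave only the main variational objective for training.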
