Closed — deZakelijke closed this 3 years ago
This will probably require the invertible mapping from higher to lower dimensions discussed in https://github.com/deZakelijke/causal_effect_inference_with_normalizing_flows/issues/18
Solving that problem is a significant contribution on its own
So it seems this will take another form. The idea in #18 had other problems as well, so it was scrapped. A possible alternative is a two-headed flow with the intervention acting as a kind of context.
Right now there are two directions being explored. The first, discussed in #19 and #20, is based on the coupling layers of RealNVP. Its major downside is that it is not well suited to low-dimensional data, and the outcome variable is always low-dimensional. The advantage is that the functions/neural networks computing the translation and scaling in the coupling layers are unrestricted, so the intervention variable can be fed into them in an arbitrary way without constraining the model. The second option is based on Gaussianisation Flows, as discussed in #26. That is a normalising-flow type that works even on one-dimensional data. Its downside is that there is not yet an obvious way to incorporate the intervention variable: we can't use a multi-headed architecture directly, because the intervention is continuous.
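To make the first option concrete, here is a minimal numpy sketch of one RealNVP-style affine coupling step where the intervention `t` is simply concatenated into the input of the scale and translation networks. The function names, toy "networks", and dimensions are all made up for illustration; the point is only that conditioning on `t` this way leaves the layer exactly invertible, since `t` is never itself transformed.

```python
import numpy as np

def coupling_forward(x, t, s_fn, b_fn):
    """One affine coupling step conditioned on intervention t.

    The first half of x passes through unchanged; the second half is
    scaled and translated by functions of (x1, t). Since s_fn/b_fn only
    ever see x1 and t, they can be arbitrary networks without breaking
    invertibility -- this is where the intervention enters freely."""
    d = x.shape[-1] // 2
    x1, x2 = x[:d], x[d:]
    h = np.concatenate([x1, np.atleast_1d(t)])
    s, b = s_fn(h), b_fn(h)
    y2 = x2 * np.exp(s) + b
    log_det = float(np.sum(s))  # log |det Jacobian| of this step
    return np.concatenate([x1, y2]), log_det

def coupling_inverse(y, t, s_fn, b_fn):
    """Exact inverse of coupling_forward, given the same t."""
    d = y.shape[-1] // 2
    y1, y2 = y[:d], y[d:]
    h = np.concatenate([y1, np.atleast_1d(t)])
    s, b = s_fn(h), b_fn(h)
    x2 = (y2 - b) * np.exp(-s)
    return np.concatenate([y1, x2])

# toy "networks": random linear maps standing in for MLPs
rng = np.random.default_rng(0)
Ws, Wb = rng.normal(size=(2, 3)), rng.normal(size=(2, 3))
s_fn = lambda h: np.tanh(Ws @ h)
b_fn = lambda h: Wb @ h

x = rng.normal(size=4)
t = 0.7  # continuous intervention value
y, log_det = coupling_forward(x, t, s_fn, b_fn)
x_rec = coupling_inverse(y, t, s_fn, b_fn)
```

Note that the inversion only works when the same `t` is supplied on the way back, which is exactly the "intervention as context" behaviour we want.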
An idea to overcome that is to separate the length and direction of the intervention vector, but that is more of a hunch than an actual idea.
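As a speculative sketch of that hunch: the decomposition itself is trivial, and would let a magnitude head and a direction head condition on the two parts separately. Everything here is hypothetical; only the norm/direction split is actual math.

```python
import numpy as np

def split_intervention(t, eps=1e-12):
    """Decompose a continuous intervention vector into its length r and
    unit direction u, so that (hypothetically) separate flow heads could
    condition on each part. eps avoids division by zero at t = 0."""
    r = np.linalg.norm(t)
    u = t / (r + eps)
    return r, u

t = np.array([3.0, 4.0])
r, u = split_intervention(t)
```

The reconstruction `r * u` recovers `t`, so no information is lost by the split.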
In either case, we have the higher-level problem of how the variables should be connected. The very first idea had two flows: one to infer z from all our observed variables, and one to perform the intervention. This ran into the dimensionality problem described in #18. The second idea was to have one flow with only x and y as input on one side, with the intervention variable treated not as a regular input but as a 'context'. This wasn't ideal either. But maybe the first idea can be rephrased in a useful way. What if we keep two flows, but don't take all variables as input for the inference step? It seems like discarding information, but at test time we don't have y and t anyway; all information comes from x. The problem remains that in the intervention half of the model we want to predict a scalar from a high-dimensional input. We could of course project it down to lower dimensions, but that brings back the problem that normalising flows and dimension mismatches don't work together. Another option is to have the intervention part not be a conventional NF at all. We could train it in a conventional supervised way, just like the decoder part of the CEVAE, and possibly even replace the MLP with a dimensionality-reduction layer followed by Gaussianisation Flow layers.
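The last option above can be sketched minimally in numpy: a fixed projection plays the role of the dimensionality-reduction layer, and the outcome head is fit with plain supervised least squares on `[proj(z), t]`, like the CEVAE decoder, rather than as a flow. The Gaussianisation Flow layers are omitted, and all dimensions, names, and the synthetic data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_z, d_low = 200, 8, 2

# synthetic latent z and continuous intervention t
Z = rng.normal(size=(n, d_z))
T = rng.normal(size=(n, 1))

# hypothetical dimensionality-reduction layer (here a fixed projection)
P = rng.normal(size=(d_z, d_low))

# ground-truth outcome depends only on the projected z and on t
w_true = rng.normal(size=(d_low, 1))
y = (Z @ P) @ w_true + 0.5 * T + 0.01 * rng.normal(size=(n, 1))

# supervised outcome head: regress the scalar y on [proj(z), t],
# sidestepping the flow/dimension-mismatch problem entirely
feats = np.concatenate([Z @ P, T], axis=1)
w_hat, *_ = np.linalg.lstsq(feats, y, rcond=None)
mse = float(np.mean((feats @ w_hat - y) ** 2))
```

The point of the sketch is only the wiring: the high-dimensional input is projected down and the scalar outcome is predicted by an ordinary supervised module, so no invertibility between dimensions is ever required in this half of the model.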
To make an actual contribution, I have to design a new component of a Normalising Flow architecture that makes it a 'causal flow'. To do that, I have to identify an issue with the simple extension of the CEVAE, and with the CEVAE itself, and see how that can be improved upon.