Quantco / metalearners

MetaLearners for CATE estimation
https://metalearners.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Leakage in X-Learner in-sample prediction #80

Open kklein opened 3 months ago

kklein commented 3 months ago

Issue at hand

@ArseniyZvyagintsevQC brought the following to our attention:

Let us assume a binary treatment variant scenario in which we want to work with in-sample predictions, i.e. is_oos=False.

The current implementation fits five models, three of which are considered nuisance models and two of which are considered treatment models:

| model | target | cross-fitting dataset | stage | name |
|---|---|---|---|---|
| $\hat{\mu}_0$ | $Y_i$ | $\{(X_i, Y_i) \mid W_i=0\}$ | nuisance | "treatment_variant" |
| $\hat{\mu}_1$ | $Y_i$ | $\{(X_i, Y_i) \mid W_i=1\}$ | nuisance | "treatment_variant" |
| $\hat{e}$ | $W_i$ | $\{(X_i, W_i)\}$ | nuisance/propensity | "propensity_model" |
| $\hat{\tau}_0$ | $\hat{\mu}_1(X_i) - Y_i$ | $\{(X_i, Y_i) \mid W_i=0\}$ | treatment | "control_effect_model" |
| $\hat{\tau}_1$ | $Y_i - \hat{\mu}_0(X_i)$ | $\{(X_i, Y_i) \mid W_i=1\}$ | treatment | "treatment_effect_model" |

More background on this here.
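For reference, in the standard X-Learner formulation for a binary treatment, the two treatment models are combined into a single CATE estimate by weighting with the propensity estimate. This is the textbook form and is stated here as an assumption about how the models listed above interact, not as a description of the library's exact code path:

$$
\hat{\tau}(x) = \hat{e}(x)\,\hat{\tau}_0(x) + \bigl(1 - \hat{e}(x)\bigr)\,\hat{\tau}_1(x)
$$

Both $\hat{\tau}_0(X_i)$ and $\hat{\tau}_1(X_i)$ therefore enter the estimate for every data point $i$, regardless of the value of $W_i$.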

Note that each of these models is cross-fitted. More precisely, each is cross-fitted with respect to the data it has seen at training time, i.e. its folds only partition the rows listed in its "cross-fitting dataset" column.
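To make this concrete, here is a minimal sketch of cross-fitting $\hat{\mu}_1$ on the treated rows only. It uses plain scikit-learn rather than the metalearners internals; the synthetic data, the variable names, and the choice to average the fold models for rows outside the training subset are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Synthetic data (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
w = rng.integers(0, 2, size=n)
y = X[:, 0] + w * (1.0 + X[:, 1]) + rng.normal(scale=0.1, size=n)

treated_idx = np.flatnonzero(w == 1)  # the only rows mu_1 is trained on
control_idx = np.flatnonzero(w == 0)

# The folds partition the treated subset, not the full dataset.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
mu1_models = []
mu1_in_sample = np.full(n, np.nan)
for train_pos, test_pos in kf.split(treated_idx):
    train_rows = treated_idx[train_pos]
    test_rows = treated_idx[test_pos]
    model = LinearRegression().fit(X[train_rows], y[train_rows])
    mu1_models.append(model)
    # Each treated row is predicted by the one model that did not see it.
    mu1_in_sample[test_rows] = model.predict(X[test_rows])

# Control rows belong to no fold of mu_1, so every fold model is
# out-of-sample for them. Averaging the fold models is an assumption here.
mu1_for_controls = np.mean(
    [m.predict(X[control_idx]) for m in mu1_models], axis=0
)
```

The point to take away is that "in-sample" is only well defined per model: a treated row has a dedicated held-out fold model for $\hat{\mu}_1$, while a control row does not.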

Let's suppose now that we are at inference time and encounter an in-sample data point $i$. Wlog, let's assume that $W_i=1$. In order to come up with a CATE estimate, the predict method will run

The latter call makes sure we avoid leakage in $\hat{\tau}_1$. The former call, however, does not completely avoid leakage: even though $i$ hasn't been seen in the training of $\hat{\tau}_0$, it has been seen in the training of $\hat{\mu}_1$, whose predictions are, in turn, used to build the targets of $\hat{\tau}_0$. Therefore, the observed outcome $Y_i$ can leak into the estimate $\hat{\tau}(X_i)$.
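The leakage path can be checked numerically with a small experiment: perturb $Y_i$ for a treated row $i$ and see whether the $\hat{\tau}_0$ prediction at $X_i$ moves. The sketch below is a simplified stand-in for the pipeline, not the library's code. The helper `tau0_prediction_for` is hypothetical, $\hat{\tau}_0$ is fit as a single model instead of being cross-fitted (for a treated row, every fold model of $\hat{\tau}_0$ is out-of-sample anyway), and the fold models of $\hat{\mu}_1$ are averaged for control rows, which is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold


def tau0_prediction_for(i, X, y, w, n_splits=5):
    """Predict tau_0 at X[i] for a treated row i (hypothetical helper)."""
    treated = np.flatnonzero(w == 1)
    control = np.flatnonzero(w == 0)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    # Cross-fit mu_1 on treated rows; all but one fold model train on row i.
    mu1_models = [
        LinearRegression().fit(X[treated[tr]], y[treated[tr]])
        for tr, _ in kf.split(treated)
    ]
    # Pseudo-outcomes mu_1(X_j) - Y_j for control rows j.
    # Averaging the fold models is an assumption of this sketch.
    mu1_control = np.mean([m.predict(X[control]) for m in mu1_models], axis=0)
    pseudo_outcome = mu1_control - y[control]
    # tau_0 never trains on row i directly, because w[i] == 1.
    tau0 = LinearRegression().fit(X[control], pseudo_outcome)
    return tau0.predict(X[i : i + 1])[0]


# Synthetic data (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
w = rng.integers(0, 2, size=n)
y = X[:, 0] + w * (1.0 + X[:, 1]) + rng.normal(scale=0.1, size=n)

i = int(np.flatnonzero(w == 1)[0])  # some treated, in-sample row
baseline = tau0_prediction_for(i, X, y, w)

# Change only the observed outcome of row i and redo everything.
y_perturbed = y.copy()
y_perturbed[i] += 100.0
perturbed = tau0_prediction_for(i, X, y_perturbed, w)

leak = abs(perturbed - baseline)
print(f"change in tau_0(X_i) caused by changing Y_i: {leak:.4f}")
```

If there were no leakage, `leak` would be zero, since row $i$ is not in the training set of $\hat{\tau}_0$. A nonzero value shows that $Y_i$ reaches $\hat{\tau}_0(X_i)$ through $\hat{\mu}_1$.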

Next steps

We can devise an extreme, naïve approach to counteract this issue by training every type of model once per datapoint. Clearly, this ensures the absence of data leakage. The challenge with this issue revolves around coming up with a design that

kklein commented 3 months ago

Preliminary idea

Currently we are training

In order to answer an in-sample query of

Give me models $\hat{\tau}_{0,k}$ and $\hat{\tau}_{k,0}$ which have seen no information about sample $i$ at all

We could train

In the scenario described in the issue, we would then run the predict method as such:

Importantly, this would