Quantco / metalearners

MetaLearners for CATE estimation
https://metalearners.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Challenging "CATE estimation is not supervised learning" #77

Closed: ogencoglu closed this 3 months ago

ogencoglu commented 3 months ago

This is not an issue or bug report, but there is no Discussions section, so I am asking away.

Let's start from the example in your docs:

[image: example table from the docs with two potential-outcome columns, each observed for only a subset of units]

Why can't I train a multi-output neural network (2 outputs in this specific example) as a regressor and mask the loss for the missing targets? Such masking is quite standard practice in all sorts of neural network use cases, e.g. when time series signals have different lengths.

So, here, CATE estimation exactly is supervised learning.
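Roughly what I have in mind, as a minimal sketch (PyTorch here; the model and all names are just illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative two-headed regressor: one output per potential outcome.
class TwoHeadNet(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # head 0 predicts y(0), head 1 predicts y(1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # shape (batch, 2)

def masked_mse(pred: torch.Tensor, y: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """MSE where only the head matching the assigned treatment w (0 or 1)
    contributes; the loss for the unobserved counterfactual head is masked out."""
    mask = F.one_hot(w.long(), num_classes=2).float()  # (batch, 2)
    return ((pred - y.unsqueeze(1)) ** 2 * mask).sum() / mask.sum()
```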

kklein commented 3 months ago

Hi @ogencoglu !

If I understand you correctly, the approach you are describing is indeed a special case of a MetaLearner: the T-Learner. In its more general form it involves two arbitrary models, not necessarily NNs, and not necessarily NNs sharing weights up to the last layer. We wrote up a thing or two about the T-Learner in our docs.
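For illustration, a minimal T-Learner sketch, two independently fit models, one per treatment arm (scikit-learn here purely for brevity; this is not the metalearners API, just the idea):

```python
from sklearn.ensemble import GradientBoostingRegressor

def t_learner_cate(X, y, w, base_model=GradientBoostingRegressor):
    """Fit one arbitrary outcome model per treatment arm and
    estimate the CATE as the difference of their predictions."""
    model_control = base_model().fit(X[w == 0], y[w == 0])
    model_treated = base_model().fit(X[w == 1], y[w == 1])
    return model_treated.predict(X) - model_control.predict(X)
```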

I'm not sure there's a right or wrong here. My understanding would have been that supervised learning comes with a learning/training dataset $\mathcal{D} = \{(x_i, y_i)\}_i$ where the $y_i$ are observations of the very quantity one wants to predict.

In CATE estimation, we do observe some outcomes that we tend to call $y$, too. Yet these outcomes are not truly the quantity of interest $\tau$. These quantities of interest, $\tau$, we never observe. In other words, we don't have labels for what we actually care about.
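To spell that out in potential-outcome notation (assuming the usual setup with a binary treatment $w_i$): we only ever observe $y_i = w_i \, y_i(1) + (1 - w_i) \, y_i(0)$, while the estimand is $\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]$. No unit ever reveals both $y_i(0)$ and $y_i(1)$.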

Does that make it clearer why we chose this formulation?

At the end of the day it boils down to a matter of definitions. Afaict there isn't a common yet explicit definition of what supervised learning really is. For instance, Bishop somewhat informally writes

Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems

while Murphy writes

The most common form of ML is supervised learning. In this problem, the task T is to learn a mapping f from inputs x ∈ X to outputs y ∈ Y.

Note that, e.g., the latter definition/description is so lax that, strictly speaking, it would even allow for unsupervised learning to be considered supervised learning. :)

ogencoglu commented 3 months ago

Thanks for the swift reply!

My neural net formulation does not involve two separate models or two neural nets sharing weights. It is a single model learning that table in the posted image: a single neural network with two outputs. Once you train the model, you perform a forward pass (inference), predict the two numerical values, and $\tau$ is their difference. I don't think this is a T-Learner.
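Concretely, inference with the two-headed sketch from my first comment (reusing the same made-up names) would be:

```python
model.eval()
with torch.no_grad():
    pred = model(x)                    # shape (batch, 2)
    tau_hat = pred[:, 1] - pred[:, 0]  # per-unit CATE estimate
```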

I get your point and I agree that this comes down to semantics. I just thought the claim in your docs that "you can not use regular supervised learning with a single model to solve this problem" was quite bold. After all, the formulation I gave is not anything exotic, just a neural net with a masked loss.

I will close this now. Once again, thanks for the discussion.