Hi @ogencoglu!
If I understand you correctly, the approach you're describing is indeed a special case of a MetaLearner: the T-Learner. In its more general form, it involves two arbitrary models -- not necessarily NNs, and not necessarily NNs sharing weights up to the last layer. We wrote up a thing or two about the T-Learner in our docs.
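For illustration, a T-Learner in its general form might look like this (a minimal sketch with scikit-learn models standing in for the two arbitrary models; illustrative only, not our library's API):

```python
# Minimal T-Learner sketch (scikit-learn models standing in for the two
# arbitrary models; illustrative only, not our library's API).
from sklearn.ensemble import GradientBoostingRegressor

def t_learner_cate(X, y, w):
    """X: (n, p) covariates, y: (n,) observed outcomes, w: (n,) treatment in {0, 1}."""
    mu_0 = GradientBoostingRegressor().fit(X[w == 0], y[w == 0])  # outcome model, control arm
    mu_1 = GradientBoostingRegressor().fit(X[w == 1], y[w == 1])  # outcome model, treatment arm
    return mu_1.predict(X) - mu_0.predict(X)  # tau_hat(x) = mu_1(x) - mu_0(x)
```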
I'm not sure there's a right or wrong here. My understanding would have been that supervised learning comes with a learning/training dataset $\mathcal{D} = \{(x_i, y_i)\}_i$ where the $y_i$ are labels for the very quantity we seek to predict.
In CATE estimation, we do observe some outcomes that we tend to call $y$, too. Yet these outcomes are not truly the quantity of interest, $\tau$. These quantities of interest, $\tau$, we never observe. In other words, we don't have labels for what we actually care about.
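For concreteness: with potential outcomes $Y(0)$ and $Y(1)$, the quantity of interest is

$$\tau(x) = \mathbb{E}\left[Y(1) - Y(0) \mid X = x\right]$$

and since each unit reveals only one of $Y(0)$ and $Y(1)$, the difference inside the expectation is never observed for any individual unit.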
Does that make it more clear why we chose this formulation?
At the end of the day it boils down to a matter of definitions. Afaict there isn't a common yet explicit definition of what supervised learning really is. For instance, Bishop somewhat informally writes:
> Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems

while Murphy writes:

> The most common form of ML is supervised learning. In this problem, the task $T$ is to learn a mapping $f$ from inputs $x \in \mathcal{X}$ to outputs $y \in \mathcal{Y}$.
Note that, e.g., the latter definition/description is so lax that, strictly speaking, it would even allow unsupervised learning to be considered supervised learning -- after all, clustering also learns a mapping from inputs $x$ to outputs (cluster assignments). :)
Thanks for the swift reply!
My neural net formulation does not involve two separate models or two neural nets sharing weights. It is a single model learning that table in the posted image: a single neural network with two outputs. Once you train the model, you perform a forward pass (inference), predict the two numerical values, and $\tau$ is their difference. I don't think this is a T-Learner.
I get your point and I agree that this comes down to semantics. I just thought the claim in your docs that "you cannot use regular supervised learning with a single model to solve this problem" was quite bold. After all, the formulation I gave is nothing exotic -- just a neural net with a masked loss.
I will close this now and, once again, thanks for the discussion.
This is not an issue or a bug, but there was no Discussions section, so I am asking away.
Let's start from the example in your docs:

[image: the outcome table from the docs, in which each unit has two potential-outcome columns but only the outcome under the received treatment is observed]
Why can't I train a multi-output (two, in this specific example) neural network as a regressor and mask the loss for the missing targets? Such masking is quite standard practice in all sorts of neural network use cases, e.g. when time series signals have different lengths.
So, here, CATE estimation is exactly supervised learning.
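Concretely, the setup I have in mind is something like this (a minimal PyTorch sketch; all names are illustrative):

```python
# Minimal sketch of the proposed single-model formulation (PyTorch; all names
# are illustrative, not taken from any library).
import torch
import torch.nn as nn

class TwoHeadRegressor(nn.Module):
    """One network, two outputs: predicted outcome under control and under treatment."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # column 0 ~ y(0), column 1 ~ y(1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def masked_mse(pred: torch.Tensor, y: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """MSE computed only on the head whose outcome was actually observed.

    pred: (n, 2) predictions for both potential outcomes
    y:    (n,)   observed outcome
    w:    (n,)   treatment indicator in {0., 1.} as a float tensor
    """
    mask = torch.stack([1 - w, w], dim=1)      # one-hot: which head has a label
    sq_err = (pred - y.unsqueeze(1)) ** 2      # squared error against y for both heads
    return (sq_err * mask).sum() / mask.sum()  # average over observed entries only
```

After training, a single forward pass yields both columns, and `tau_hat = pred[:, 1] - pred[:, 0]`.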