On this thread @darrencl asks how one should implement a feature selection "transformer" that uses a "target". His use case is feature selection based on strong correlation with the target.
One could implement such a model as `Supervised` with a `transform` method; this is allowed. (One could not use `predict` to output transformations, as the API assumes the output of `predict` has the same scitype as the target `y` passed to `fit`, or is a probabilistic version of same.) However, the model then has no `predict` method, which would be unexpected, although possibly not impossible to live with.
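To make the first option concrete, here is a minimal, language-agnostic sketch in Python (not the model interface the thread refers to). `fit` receives both the features and the target, but the only operation the fitted model offers is `transform`, which returns a subset of the feature columns. The class name `SupervisedCorrelationSelector`, its `threshold` hyperparameter (minimum absolute Pearson correlation), the `pearson` helper and the dict-of-columns table format are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional

# Hypothetical table format: column name -> column values.
Table = Dict[str, List[float]]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


class SupervisedCorrelationSelector:
    """Hypothetical 'supervised' selector: fit sees X and y, but only transform is offered."""

    def __init__(self, threshold: float = 0.5):
        # Hypothetical hyperparameter: minimum |correlation| with the target to keep a feature.
        self.threshold = threshold
        self.selected: Optional[List[str]] = None

    def fit(self, X: Table, y: List[float]) -> "SupervisedCorrelationSelector":
        # Keep the features whose absolute correlation with the target reaches the threshold.
        self.selected = [name for name, col in X.items()
                         if abs(pearson(col, y)) >= self.threshold]
        return self

    def transform(self, X: Table) -> Table:
        # The output is a table of features, not a prediction of y, so there is no predict.
        if self.selected is None:
            raise RuntimeError("call fit before transform")
        return {name: X[name] for name in self.selected}
```

For example, with `X = {"a": [1.0, 2.0, 3.0, 4.0], "b": [4.0, 1.0, 3.0, 2.0]}` and `y = [2.0, 4.0, 6.0, 8.0]`, `SupervisedCorrelationSelector(threshold=0.8).fit(X, y).transform(X)` keeps only column `"a"` (correlation 1.0), since `"b"` has correlation -0.4 with `y`. The awkwardness described above is visible here: the object is trained against a target yet never predicts it.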
An alternative that I recommend is to implement the model as `Unsupervised`, with the "target" being bundled with the other features `X` by the model user. The model has a hyperparameter specifying which feature is the `comparison_feature` (aka "target"). Then there is no need for a `predict` method. The only drawback is that the "target" feature itself needs to be added to the input, which would mean adding an element to the overall pipeline. But I don't think that's a big deal. Whether the `comparison_feature` is itself retained as part of the output could be specified by a second hyperparameter, `drop_comparison_variable` or whatever.

Are there any other suggestions?
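The recommended option can be sketched the same way, again in plain Python rather than in the thread's model API. Here `fit` takes only `X`; the "target" travels inside `X` as the column named by `comparison_feature`, and `drop_comparison_variable` decides whether that column survives `transform`. Those two hyperparameter names come from the suggestion above; the class name `CorrelationFilter`, the `threshold` hyperparameter, the `pearson` helper and the dict-of-columns table format are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical table format: column name -> column values.
Table = Dict[str, List[float]]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


class CorrelationFilter:
    """Hypothetical 'unsupervised' selector: the comparison feature is one of the columns of X."""

    def __init__(self, comparison_feature: str, threshold: float = 0.5,
                 drop_comparison_variable: bool = True):
        self.comparison_feature = comparison_feature  # name of the "target" column inside X
        self.threshold = threshold  # hypothetical: minimum |correlation| to keep a feature
        self.drop_comparison_variable = drop_comparison_variable
        self.selected: Optional[List[str]] = None

    def fit(self, X: Table) -> "CorrelationFilter":
        # No separate y: the comparison column is read out of X itself.
        target = X[self.comparison_feature]
        self.selected = [name for name, col in X.items()
                         if name != self.comparison_feature
                         and abs(pearson(col, target)) >= self.threshold]
        # Optionally pass the comparison column through to the output.
        if not self.drop_comparison_variable:
            self.selected.append(self.comparison_feature)
        return self

    def transform(self, X: Table) -> Table:
        if self.selected is None:
            raise RuntimeError("call fit before transform")
        return {name: X[name] for name in self.selected}
```

With `X = {"a": [1.0, 2.0, 3.0, 4.0], "b": [4.0, 1.0, 3.0, 2.0], "target": [2.0, 4.0, 6.0, 8.0]}`, `CorrelationFilter("target", threshold=0.8).fit(X).transform(X)` returns only column `"a"`, while passing `drop_comparison_variable=False` returns `"a"` and `"target"`. Because the object is a pure `fit`/`transform` transformer, it slots into a pipeline like any other unsupervised step, at the cost of the user first joining the "target" column onto `X`.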
I think my second suggestion above deals with the question posed, and in the absence of other input, I am now closing. Feel free to continue the discussion here or re-open.