Open alexpghayes opened 6 years ago
Part of this: a vtreat replacement built out of recipes steps.
Univariate filters are one starting point.
What do multivariate filters look like? How to search through feature interactions?
Potentially interesting univariate filters:
Additionally, worth looking into approaches like partition retention (note this is currently limited to categorical variables with few levels).
Some multivariate filters:
> Part of this: a vtreat replacement built out of recipes steps.
vtreat author here. Why not just use vtreat in a recipe? Replacing vtreat seems a bit needlessly cruel.
> Replacing vtreat seems a bit needlessly cruel.
The steps in embed are separate implementations of one aspect of vtreat and are not meant to replace anything. You have several alternate implementations of tidyverse functions; this is no different.
If you want, go the same route as embed and put recipes steps in a separate package that people can use. That's probably the general route that we will take in the future (see the textrecipes package as well). That would fill the technical gap of using vtreat with the tidymodels infrastructure.
Otherwise, your behavior on social media does not lend itself to me wanting to collaborate with you (simply as a matter of trust).
[edit] grammar
This is a personal repository of scratch work that I used to develop some ideas about modeling interfaces. As a now-retired thought experiment, I'd ask that discussions of vtreat and recipes occur elsewhere.
There are a number of supervised processing steps that fall outside the scope of recipes. Several of these appear in the WinVector package vtreat. How should these steps fit into a modelling workflow? As a canonical example, consider feature selection by GLM p-values on univariate models.
This is a preprocessing step with data leakage issues, so it should get its own portion of the training data.
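A minimal sketch of that canonical example in base R, to make the question concrete. Everything named here is an assumption for illustration: the helper `univariate_glm_filter()`, the 0.05 threshold, the 30% filter split, and the simulated data are all hypothetical, and none of it is recipes or vtreat API. The idea is only that the filter is fit on one portion of the training data and the downstream model on the rest.

```r
# Sketch only: univariate GLM p-value filter fit on its own data split.
# univariate_glm_filter(), the threshold, and the split size are
# illustrative assumptions, not part of recipes or vtreat.
set.seed(27)

n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Simulated outcome: only x1 carries signal; x2 and x3 are noise.
dat$y <- rbinom(n, size = 1, prob = plogis(1.5 * dat$x1))

# Reserve a separate portion of the training data for the filter.
filter_idx  <- sample(n, size = floor(0.3 * n))
filter_data <- dat[filter_idx, ]
model_data  <- dat[-filter_idx, ]

univariate_glm_filter <- function(data, outcome, threshold = 0.05) {
  predictors <- setdiff(names(data), outcome)
  # One logistic regression per predictor; keep that predictor's p-value.
  p_values <- vapply(predictors, function(p) {
    fit <- glm(reformulate(p, response = outcome),
               data = data, family = binomial())
    coef(summary(fit))[p, "Pr(>|z|)"]
  }, numeric(1))
  names(p_values)[p_values < threshold]
}

keep <- univariate_glm_filter(filter_data, outcome = "y")

# Fall back to an intercept-only model if nothing passes the filter.
terms_to_use <- if (length(keep) == 0) "1" else keep

# Fit the downstream model only on rows the filter never saw.
final_fit <- glm(reformulate(terms_to_use, response = "y"),
                 data = model_data, family = binomial())
```

Keeping `filter_data` and `model_data` disjoint is the whole point: the p-values are computed against the outcome, so reusing the same rows for the final fit would make the selected features look better than they are.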
Related: in some cases you might want to do brute force comparisons of many sets of features. How and when might a feature_set object be useful?