alexpghayes / modelling-in-r

an initial attempt to describe a grammar of modelling for r
https://alexpghayes.github.io/modelling-in-r/

Supervised preprocessing #16

Open alexpghayes opened 6 years ago

alexpghayes commented 6 years ago

There are a number of supervised preprocessing steps that fall outside the scope of recipes. Several of these appear in the WinVector package vtreat.

How should these steps fit into a modelling workflow? As a canonical example, consider feature selection by GLM p-values on univariate models.

This is a preprocessing step with data leakage issues, so it should get its own portion of the training data.
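A minimal sketch of what that could look like, written in Python with only the standard library rather than in R. Everything here is an assumption made for illustration: the helper names (`univariate_p_value`, `split_rows`, `select_features`), the synthetic data, the 30% filter fraction, and the 0.05 threshold. The univariate model is a Gaussian GLM with identity link (plain least squares), and the p-value uses a normal approximation to the t statistic, so it is only reasonable for moderately large samples.

```python
import math
import random
from statistics import NormalDist, fmean


def univariate_p_value(x, y):
    """Two-sided Wald p-value for the slope in y ~ x (least squares).

    Normal approximation to the t statistic; a simplification, not an
    exact GLM test.
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(rss / (n - 2) / sxx)
    z = slope / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def split_rows(n_rows, filter_fraction, rng):
    """Shuffle row indices and split them into (filter_rows, fit_rows)."""
    idx = list(range(n_rows))
    rng.shuffle(idx)
    cut = int(n_rows * filter_fraction)
    return idx[:cut], idx[cut:]


def select_features(columns, y, rows, alpha=0.05):
    """Keep features whose univariate p-value on `rows` is below alpha."""
    y_sub = [y[i] for i in rows]
    kept = []
    for name, values in columns.items():
        x_sub = [values[i] for i in rows]
        if univariate_p_value(x_sub, y_sub) < alpha:
            kept.append(name)
    return kept


# Hypothetical synthetic data: one informative feature, one pure noise.
rng = random.Random(0)
n = 400
columns = {
    "signal": [rng.gauss(0, 1) for _ in range(n)],
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
y = [2.0 * s + rng.gauss(0, 1) for s in columns["signal"]]

# The filter sees only filter_rows; the model would later see only fit_rows.
filter_rows, fit_rows = split_rows(n, 0.3, rng)
selected = select_features(columns, y, filter_rows)
```

The downstream model would then be fit on `fit_rows` using only the columns in `selected`, so the rows that chose the features never influence the coefficient estimates.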

Related: in some cases you might want to do brute force comparisons of many sets of features. How and when might a feature_set object be useful?

alexpghayes commented 6 years ago

Part of this: a vtreat replacement built out of recipes steps.

alexpghayes commented 6 years ago

Univariate filters are one starting point.

What do multivariate filters look like? How to search through feature interactions?

alexpghayes commented 6 years ago

Potentially interesting univariate filters:

Additionally, worth looking into approaches like partition retention (note this is currently limited to categorical variables with few levels).

alexpghayes commented 6 years ago

Some multivariate filters:

JohnMount commented 6 years ago

Part of this: a vtreat replacement built out of recipes steps.

vtreat author here. Why not just use vtreat in a recipe? Replacing vtreat seems a bit needlessly cruel.

topepo commented 6 years ago

Replacing vtreat seems a bit needlessly cruel.

The steps in embed are separate implementations of one aspect of vtreat and are not meant to replace anything. You have several alternate implementations of tidyverse functions; this is no different.

If you want, go the same route as embed and make recipes steps in a separate package that people can use. That's probably the general route that we will go in the future (see the textrecipes package as well). That can fill the technical gap of using vtreat with the tidymodels infrastructure.

Otherwise, your behavior on social media does not lend itself to me wanting to collaborate with you (simply as a matter of trust).

[edit] grammar

alexpghayes commented 6 years ago

This is a personal repository of scratch work that I used to develop some ideas about modeling interfaces. As a now-retired thought experiment, I'd ask that discussions of vtreat and recipes occur elsewhere.