EmilHvitfeldt / extrasteps

More Steps for the 'recipes' Package
https://emilhvitfeldt.github.io/extrasteps/

step_lump_* steps #34

Open EmilHvitfeldt opened 2 years ago

EmilHvitfeldt commented 2 years ago

A way to combine levels in factor variables. From here on I will call this action lumping.

There are two main ways we can go about doing this:

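Target agnostic: levels are combined using only the variable itself (for example, its level frequencies)

Target based: levels are combined using their relationship with the outcome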
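To make the two approaches concrete, here is a minimal sketch on toy data. The target-agnostic part uses forcats' fct_lump_min(), which only looks at level counts. The target-based part is a hypothetical helper, lump_by_target(), that clusters levels by their mean outcome with hclust() and merges each cluster with fct_collapse(). It is one simple illustration under those assumptions, not the method from the referenced paper and not an extrasteps API.

```r
library(forcats)

# Toy data: a factor with a few rare levels and a numeric outcome
x <- factor(c("a", "a", "a", "a", "b", "b", "b", "c", "d", "e"))
y <- c(1, 2, 1, 2, 10, 11, 12, 1.5, 11, 20)

# Target agnostic: only `x` is used. Levels seen fewer than 2 times
# (c, d, e) are lumped into "Other".
agnostic <- fct_lump_min(x, min = 2)

# Target based (hypothetical helper, for illustration only): levels are
# grouped by their mean outcome using hierarchical clustering, and each
# cluster becomes one new level named after its members.
lump_by_target <- function(x, y, n_groups = 3) {
  x <- as.factor(x)
  # Mean outcome per (observed) level, as a named numeric vector
  level_means <- vapply(split(y, x, drop = TRUE), mean, numeric(1))
  groups <- cutree(hclust(dist(level_means)), k = n_groups)
  # Named list: new level name -> old levels in that cluster
  dict <- split(names(groups), groups)
  names(dict) <- vapply(dict, paste, character(1), collapse = "+")
  fct_collapse(x, !!!dict)
}

target_based <- lump_by_target(x, y, n_groups = 3)
levels(agnostic)      # "a" "b" "Other"
levels(target_based)  # "a+c" "b+d" "e"
```

Note how the two disagree: by frequency c and d are both rare, but by outcome c behaves like a and d behaves like b.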

References

https://arxiv.org/pdf/2104.00629.pdf

EmilHvitfeldt commented 2 years ago

Think carefully about how this should work on characters versus factors. How should factor levels be handled?
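Two concrete behaviours in forcats bear on this question: its lumping functions accept character input but always return a factor, and unused factor levels have a count of zero, so they get lumped too. A minimal sketch, using a hypothetical helper lump_keep_type() (not part of forcats or extrasteps) that returns the same type it was given:

```r
library(forcats)

# Hypothetical helper: lump rare values, but hand back the same type
# that came in (character stays character, factor stays factor).
lump_keep_type <- function(x, min = 2, other_level = "other") {
  was_character <- is.character(x)
  out <- fct_lump_min(x, min = min, other_level = other_level)
  if (was_character) as.character(out) else out
}

chr <- c("a", "a", "b", "c")
fct <- factor(c("a", "a", "b", "c"), levels = c("a", "b", "c", "z"))

lump_keep_type(chr)          # returns c("a", "a", "other", "other")

# The unused level "z" has a count of 0, so it is lumped as well.
levels(lump_keep_type(fct))  # returns c("a", "other")
```

Whether unused levels should be lumped, kept, or dropped beforehand is exactly the kind of choice the step would need to document.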

EmilHvitfeldt commented 2 years ago

Think about which keyword to use in the step names: "lump" or "collapse".

EmilHvitfeldt commented 2 years ago

Use fct_collapse() internally to update the levels:

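library(forcats)

# Named list: each name is a new level, each value the old levels it absorbs
dict <- list(missing = c("No answer", "Don't know"),
             other = "Other party",
             rep = c("Strong republican", "Not str republican"),
             ind = c("Ind,near rep", "Independent", "Ind,near dem"),
             dem = c("Not str democrat", "Strong democrat"))

# !!! splices the list into fct_collapse()'s `...` as name = levels pairs
fct_collapse(gss_cat$partyid, !!!dict)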
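Inside a step, the dictionary would be learned at prep time and applied with fct_collapse() at bake time. A minimal sketch under that assumption, with hypothetical standalone functions prep_lump() and bake_lump() (not the recipes or extrasteps API). One detail worth handling: fct_collapse() warns when a dictionary entry names a level the factor does not have, so the new data is first re-leveled with the training levels.

```r
library(forcats)

# Hypothetical "prep": learn from training data which levels are rare.
prep_lump <- function(x, min = 2, other_level = "other") {
  x <- as.factor(x)
  counts <- table(x)
  rare <- names(counts)[counts < min]
  dict <- list()
  if (length(rare) > 0) dict[[other_level]] <- rare
  list(levels = levels(x), dict = dict)
}

# Hypothetical "bake": apply the stored dictionary with fct_collapse().
bake_lump <- function(x, info) {
  # Use the training levels so every level named in the dictionary exists
  # (avoiding unknown-level warnings); values not seen during training
  # become NA.
  x <- factor(x, levels = info$levels)
  if (length(info$dict) == 0) return(x)
  fct_collapse(x, !!!info$dict)
}

info <- prep_lump(c("a", "a", "b", "b", "c", "d"), min = 2)
baked <- bake_lump(c("a", "c", "e", "b"), info)
levels(baked)        # "a" "b" "other"
as.character(baked)  # "a" "other" NA "b"
```

Mapping unseen values to NA is only one option; routing them into the "other" level instead is an equally reasonable design choice for the step.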