Create a function to reduce the number of categorical variables created
For example, if a category has fewer than 10 observations, we can regroup those observations into a single new level.
A potential workflow:
1. Group by each level in the variable
2. Get the count for each level in that variable
3. Create a new level for levels with fewer than x counts. We can set x to 10 or the 10th percentile (first decile) of the level counts, whichever is lower
4. Assign the observations from those rare levels to the new level
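The steps above can be sketched roughly as follows. This is only a minimal illustration, assuming pandas and a string-valued column. The function name `collapse_rare_levels`, its parameters, and the label "Other" are hypothetical choices, not part of the original idea.

```python
import pandas as pd


def collapse_rare_levels(series, min_count=10, quantile=None, other_label="Other"):
    """Regroup levels with fewer than a threshold number of observations.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    # Work on plain objects so a new label can be assigned freely.
    values = series.astype(object)

    # Steps 1-2: count the observations in each level.
    counts = values.value_counts()

    # Step 3: pick the threshold x. If a quantile is given, use the lower
    # of min_count and that quantile of the level counts.
    threshold = min_count
    if quantile is not None:
        threshold = min(min_count, counts.quantile(quantile))

    # Step 4: replace every rare level with the new combined level.
    rare_levels = counts[counts < threshold].index
    return values.where(~values.isin(rare_levels), other_label)


# Example with made-up data: "c" and "d" have fewer than 10 observations.
colors = pd.Series(["a"] * 12 + ["b"] * 10 + ["c"] * 3 + ["d"] * 1)
cleaned = collapse_rare_levels(colors)
cleaned_q = collapse_rare_levels(colors, quantile=0.10)
```

Missing values are left untouched here because `value_counts` skips them by default. Note that taking the lower of the two thresholds can make the rule much less aggressive when most levels are small, so the choice of x may need tuning per dataset.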
This is just an idea for a general data cleaning step.