ShichenXie / scorecardpy

Scorecard Development in Python
http://shichen.name/scorecard
MIT License
725 stars, 301 forks

Wildcard categorical bin? #61

Closed by cheitzig 4 years ago

cheitzig commented 4 years ago

We have a dataset of about 3 million records. We're building a model using a 700k-record training sample and a 300k-record test sample.

We're building the WoE bins based on the 700k training set, and it turns out that for a few of the categorical variables (e.g., 3-digit zipcode), there are values in the test set that aren't in the training set.
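A minimal sketch of how the mismatch described above could be detected, in plain Python. The zipcode values and the helper name `unseen_categories` are made up for illustration; they are not part of scorecardpy.

```python
# Hypothetical 3-digit zipcode values standing in for the real train/test columns.
train_zip3 = ["100", "100", "941", "606", "941", "100"]
test_zip3 = ["100", "941", "331", "606", "995"]


def unseen_categories(train_values, test_values):
    """Return the sorted category values that occur in test but never in train."""
    return sorted(set(test_values) - set(train_values))


unseen = unseen_categories(train_zip3, test_zip3)
print(unseen)  # ['331', '995']
```

Any value listed here has no bin in a binning fitted on the training set alone, so it has no WoE value to map to.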

Two thoughts/questions:

ShichenXie commented 4 years ago

Categorical features with too many unique values should be preprocessed, for example by merging them into a few major categories. If a wildcard bin were added, its WoE value would be difficult to define.
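One way the merging suggested here might look, as a rough sketch in plain Python rather than anything provided by scorecardpy. The function names, the `min_count` threshold, the `"OTHER"` label, and the sample values are all assumptions chosen for illustration. The major categories are learned from the training set only, and every other value, whether rare in training or unseen until test, is mapped to a single catch-all label.

```python
from collections import Counter


def fit_major_categories(train_values, min_count):
    """Return the set of categories seen at least min_count times in training."""
    counts = Counter(train_values)
    return {value for value, n in counts.items() if n >= min_count}


def merge_rare(values, major, other_label="OTHER"):
    """Keep major categories; map rare or unseen values to other_label."""
    return [v if v in major else other_label for v in values]


# Hypothetical 3-digit zipcode values.
train_zip3 = ["100", "100", "100", "941", "941", "606", "331"]
test_zip3 = ["100", "941", "995", "606"]

# Learn the major categories from training data only, then apply to both sets.
major = fit_major_categories(train_zip3, min_count=2)
train_merged = merge_rare(train_zip3, major)
test_merged = merge_rare(test_zip3, major)

print(sorted(major))   # ['100', '941']
print(train_merged)    # ['100', '100', '100', '941', '941', 'OTHER', 'OTHER']
print(test_merged)     # ['100', '941', 'OTHER', 'OTHER']
```

Because the catch-all label now contains actual training rows, its WoE can be estimated from data like any other category, which a pure wildcard bin with no training observations would not allow.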

cheitzig commented 4 years ago

Got it. Thanks.