ShichenXie / scorecardpy

Scorecard Development in Python (评分卡: scorecard)
http://shichen.name/scorecard
MIT License

Fix woebin bins special values #77

Closed: foookinaaa closed this 3 years ago

foookinaaa commented 3 years ago

If special_values lists every unique value of a column, woebin raises an AttributeError. The following example reproduces the error (col1 contains only the values 1, 2 and 3, all of which are declared special):

import pandas as pd
import scorecardpy as sc

special_values = {
    'col1' : [1,2,3],
    'col2' : [1,2]
}
data = pd.DataFrame({
    'col1' : [1, 1, 1, 2, 3, 2, 3],
    'col2' : [1, 3, 2, 2, 1, 1, 3],
    'target' : [0, 0, 0, 1, 1, 1, 0]
})

sc.woebin(data,
         y = 'target',
         special_values=special_values,
         count_distr_limit=0.10,
         bin_num_limit=3,
         save_breaks_list='intl')
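
Before calling woebin, the affected columns can be detected with plain pandas. This is only a minimal sketch: the helper name fully_special_columns is hypothetical and not part of scorecardpy, and it assumes that missing values are not counted when deciding whether a column is fully covered.

```python
import pandas as pd

def fully_special_columns(df, special_values):
    """Return the columns whose non-null values are all listed as special.

    Hypothetical helper for illustration; not part of scorecardpy.
    """
    cols = []
    for col, specials in special_values.items():
        if col not in df.columns:
            continue
        non_null = df[col].dropna()
        # A column is "fully special" when it has at least one non-null value
        # and every such value appears in the special-values list.
        if len(non_null) > 0 and non_null.isin(specials).all():
            cols.append(col)
    return cols

special_values = {
    'col1': [1, 2, 3],
    'col2': [1, 2],
}
data = pd.DataFrame({
    'col1': [1, 1, 1, 2, 3, 2, 3],
    'col2': [1, 3, 2, 2, 1, 1, 3],
    'target': [0, 0, 0, 1, 1, 1, 0],
})

print(fully_special_columns(data, special_values))  # ['col1']
```

Only col1 is reported, because col2 still contains the value 3, which is not declared special.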

Output:

[INFO] creating woe binning ...
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_15504/2373631080.py in <module>
----> 1 sc.woebin(data,
      2          y = 'target',
      3          special_values=special_values,
      4          count_distr_limit=0.10,
      5          bin_num_limit=3,

~/foka/venv/lib/python3.8/site-packages/scorecardpy/woebin.py in woebin(dt, y, x, var_skip, breaks_list, special_values, stop_limit, count_distr_limit, bin_num_limit, positive, no_cores, print_step, method, ignore_const_cols, ignore_datetime_cols, check_cate_num, replace_blank, save_breaks_list, **kwargs)
    962                 print(('{:'+str(len(str(xs_len)))+'.0f}/{} {}').format(i, xs_len, x_i), flush=True)
    963             # woebining on one variable
--> 964             bins[x_i] = woebin2(
    965               dtm = pd.DataFrame({'y':dt[y], 'variable':x_i, 'value':dt[x_i]}),
    966               breaks=breaks_list[x_i] if (breaks_list is not None) and (x_i in breaks_list.keys()) else None,

~/foka/venv/lib/python3.8/site-packages/scorecardpy/woebin.py in woebin2(dtm, breaks, spl_val, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, method)
    726             if method == 'tree':
    727                 # 2.tree-like optimal binning
--> 728                 bin_list = woebin2_tree(
    729                   dtm, init_count_distr=init_count_distr, count_distr_limit=count_distr_limit,
    730                   stop_limit=stop_limit, bin_num_limit=bin_num_limit, breaks=breaks, spl_val=spl_val)

~/foka/venv/lib/python3.8/site-packages/scorecardpy/woebin.py in woebin2_tree(dtm, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, breaks, spl_val)
    491     initial_binning = bin_list['initial_binning']
    492     binning_sv = bin_list['binning_sv']
--> 493     if len(initial_binning.index)==1:
    494         return {'binning_sv':binning_sv, 'binning':initial_binning}
    495     # initialize parameters

AttributeError: 'NoneType' object has no attribute 'index'
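
The traceback shows that initial_binning is None when woebin2_tree reads its index. One way to avoid the crash is to check for None before accessing .index. The sketch below is an assumption about what such a guard could look like, not necessarily the change made in this pull request. The function name early_exit_binning is hypothetical, and only the two dictionary keys seen in the traceback are used.

```python
import pandas as pd

def early_exit_binning(bin_list):
    """Return a result dict when no tree splitting is possible, else None.

    Hypothetical stand-in for the first lines of woebin2_tree.
    """
    initial_binning = bin_list['initial_binning']
    binning_sv = bin_list['binning_sv']
    # Assumed guard: treat a missing initial binning (every value was special)
    # the same way as a single-row initial binning, and return early.
    if initial_binning is None or len(initial_binning.index) == 1:
        return {'binning_sv': binning_sv, 'binning': initial_binning}
    return None  # the caller would continue with tree-like splitting

# Illustrative data: only special-value bins exist, so there is no initial binning.
sv_bins = pd.DataFrame({'bin': ['1', '2', '3'], 'count': [3, 2, 2]})
result = early_exit_binning({'initial_binning': None, 'binning_sv': sv_bins})
print(result['binning'] is None)  # True
```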

I also fixed the woepoints_ply1 function, which had a problem with masking int values.