ShichenXie / scorecardpy

Scorecard Development in Python, 评分卡 (scorecard)
http://shichen.name/scorecard
MIT License
725 stars 301 forks

MergeError when executing "woebin" function #70

Open kendalvictor opened 3 years ago

kendalvictor commented 3 years ago

Hi, a few days ago, after updating the pandas library to version 1.2.0, the "woebin" function of scorecardpy version '0.1.9.2' stopped working.

When I try to run it, I get the following error:


MergeError                                Traceback (most recent call last)

in
----> 1 cortes = sc.woebin(
      2 data[
      3 (data[col_target].notnull())
      4 ].drop(
      5 [col for col in data.columns if 'target' in col and col != col_target] + col_no_review,

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin(dt, y, x, var_skip, breaks_list, special_values, stop_limit, count_distr_limit, bin_num_limit, positive, no_cores, print_step, method, ignore_const_cols, ignore_datetime_cols, check_cate_num, replace_blank, save_breaks_list, **kwargs)
    956 print(('{:'+str(len(str(xs_len)))+'.0f}/{} {}').format(i, xs_len, x_i), flush=True)
    957 # woebining on one variable
--> 958 bins[x_i] = woebin2(
    959 dtm = pd.DataFrame({'y':dt[y], 'variable':x_i, 'value':dt[x_i]}),
    960 breaks=breaks_list[x_i] if (breaks_list is not None) and (x_i in breaks_list.keys()) else None,

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin2(dtm, breaks, spl_val, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, method)
    720 if method == 'tree':
    721 # 2.tree-like optimal binning
--> 722 bin_list = woebin2_tree(
    723 dtm, init_count_distr=init_count_distr, count_distr_limit=count_distr_limit,
    724 stop_limit=stop_limit, bin_num_limit=bin_num_limit, breaks=breaks, spl_val=spl_val)

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin2_tree(dtm, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, breaks, spl_val)
    482 '''
    483 # initial binning
--> 484 bin_list = woebin2_init_bin(dtm, init_count_distr=init_count_distr, breaks=breaks, spl_val=spl_val)
    485 initial_binning = bin_list['initial_binning']
    486 binning_sv = bin_list['binning_sv']

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin2_init_bin(dtm, init_count_distr, breaks, spl_val)
    274
    275 # dtm $ binning_sv
--> 276 dtm_binsv_list = dtm_binning_sv(dtm, breaks, spl_val)
    277 dtm = dtm_binsv_list['dtm']
    278 binning_sv = dtm_binsv_list['binning_sv']

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in dtm_binning_sv(dtm, breaks, spl_val)
    113 # sv_df = sv_df.assign(value = lambda x: x.value.astype(dtm['value'].dtypes))
    114 # dtm_sv & dtm
--> 115 dtm_sv = pd.merge(dtm.fillna("missing"), sv_df[['value']].fillna("missing"), how='inner', on='value', right_index=True)
    116 dtm = dtm[~dtm.index.isin(dtm_sv.index)].reset_index() if len(dtm_sv.index) < len(dtm.index) else None
    117 # dtm_sv = dtm.query('value in {}'.format(sv_df['value'].tolist()))

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     72 validate=None,
     73 ) -> "DataFrame":
---> 74 op = _MergeOperation(
     75 left,
     76 right,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    648 warnings.warn(msg, UserWarning)
    649
--> 650 self._validate_specification()
    651
    652 cross_col = None

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _validate_specification(self)
   1301 )
   1302 if self.left_index or self.right_index:
-> 1303 raise MergeError(
   1304 'Can only pass argument "on" OR "left_index" '
   1305 'and "right_index", not a combination of both.'

MergeError: Can only pass argument "on" OR "left_index" and "right_index", not a combination of both.

![image](https://user-images.githubusercontent.com/17172507/104403542-bc3a7d00-5526-11eb-9259-05f8f1b6289a.png)

Okroshiashvili commented 3 years ago

@kendalvictor I think you have to downgrade pandas to 0.25.0. But before you downgrade, note that the pandas merge() method accepts either the on argument or left_index and right_index, not both. Here, you are trying to merge on the value column and on the index at the same time. I hope this helps.
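
For readers who want to see the conflict in isolation, here is a minimal sketch. It assumes pandas 1.2 or later, and the toy frames, index labels, and the names `left`, `right`, `raised`, and `dtm_rest` are made up for illustration. It reproduces the `on` plus `right_index=True` call from the traceback, then shows one possible way to select the special-value rows without a merge. The `isin()` replacement is an assumption about what the merge was meant to do (pick the rows of `dtm` whose `value` appears in `sv_df` while keeping their index labels), not necessarily the fix the maintainers applied.

```python
import pandas as pd
from pandas.errors import MergeError

# Toy stand-ins for the two frames merged in dtm_binning_sv.
# The data and index labels are made up for illustration.
dtm = pd.DataFrame(
    {"y": [0, 1, 0, 1], "value": ["-999", "3", None, "7"]},
    index=[10, 11, 12, 13],
)
sv_df = pd.DataFrame({"value": ["-999", None]})

left = dtm.fillna("missing")
right = sv_df[["value"]].fillna("missing")

# The call from the traceback: "on" combined with right_index=True.
try:
    pd.merge(left, right, how="inner", on="value", right_index=True)
    raised = False
except MergeError:
    raised = True

# One possible replacement: select the matching rows with isin(), which
# keeps the original index labels of dtm for the follow-up index filter.
dtm_sv = left[left["value"].isin(right["value"])]
dtm_rest = dtm[~dtm.index.isin(dtm_sv.index)]

print(raised)
print(dtm_sv.index.tolist())
print(dtm_rest.index.tolist())
```

Note that `isin()` returns each matching row once, whereas an inner merge would repeat rows if `sv_df` contained duplicate values, so the two are only interchangeable when the special values are unique.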

kendalvictor commented 3 years ago

Hi @Okroshiashvili, the solution was to downgrade pandas to 1.1.3. Ideally, though, this error should be addressed in a future release of this library, since its "woebin" function currently does not work with pandas 1.2.0.

Okroshiashvili commented 3 years ago

I think a version incompatibility like this is not surprising. I hope the maintainers will solve this problem, but until then, if your problem is solved, please close this issue :)

kendalvictor commented 3 years ago

Solved after changing the pandas library version from 1.2.0 to 1.1.3.

ShichenXie commented 3 years ago

The bug should be fixed. Please check the latest version on GitHub.

chenz1hao commented 3 years ago

But the problem still persists.

FairmoneyKunal commented 2 years ago

I am still having this problem with pandas 1.3.4. Is there a new workaround?

ShichenXie commented 2 years ago

Please install the latest version on GitHub and try again. It should be fixed.

VladOnMyOwn commented 1 year ago

I have the same problem with pandas 1.5.3.