hanzigs closed this issue 5 years ago.
It should work.
I'm facing the same problem in Python. When I run a single record, it shows an error like "None of [Index(['XXX'], dtype='object')] are in the [columns]". It seems it cannot find the index.
But when I run two records, it works.
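Here is a minimal sketch of why one record fails while two can succeed. It assumes the cause is the single-unique-value filter described by the warning later in this thread. drop_single_value_columns is a hypothetical stand-in written for illustration, not scorecardpy's own code. With one row, every column has exactly one unique value, so every column is dropped. The later column selection then raises the KeyError quoted above.

```python
import pandas as pd


def drop_single_value_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the unique-value filter: drop every
    column with at most one distinct value (NaN counted as a value)."""
    keep = [c for c in df.columns if df[c].nunique(dropna=False) > 1]
    return df[keep]


def select_features(df: pd.DataFrame, features: list) -> pd.DataFrame:
    """Apply the filter, then select the features the bins were built on."""
    filtered = drop_single_value_columns(df)
    return filtered[features]


one_row = pd.DataFrame({"x1": [3.5], "x2": ["a"]})
two_rows = pd.DataFrame({"x1": [3.5, 7.0], "x2": ["a", "b"]})

print(select_features(two_rows, ["x1", "x2"]).shape)  # (2, 2)
try:
    select_features(one_row, ["x1", "x2"])
except KeyError as err:
    # every column was dropped, so pandas reports "None of [...] are in the [columns]"
    print("KeyError:", err)
```

With two rows, the filter only keeps columns whose values differ. Two records that share a value in some column would therefore still lose that column.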
Actually, it's not working. I don't understand what "should work" means. There is also another package, 'creditR', that does the same job.
I have already fixed it. Use the version below.
import numpy as np
import pandas as pd
# scorecardpy internals used below (module paths assumed from the package layout)
from scorecardpy.condition_fun import rep_blank_na, check_print_step
from scorecardpy.woebin import woepoints_ply1

def scorecard_ply(dt, card, only_total_score=True, print_step=0):
    dt = dt.copy(deep=True)
    # remove date/time col
    # dt = rmcol_datetime_unique1(dt)  # removed by TSP: it also drops columns with a single
    #                                  # unique value, i.e. every column when dt has one row
    # replace "" by NA
    dt = rep_blank_na(dt)
    # print_step
    print_step = check_print_step(print_step)
    # card: a dict of per-variable DataFrames or a single DataFrame
    if isinstance(card, dict):
        card_df = pd.concat(card, ignore_index=True)
    elif isinstance(card, pd.DataFrame):
        card_df = card.copy(deep=True)
    else:
        raise TypeError("card must be a dict of DataFrames or a DataFrame")
    # x variables
    xs = card_df.loc[card_df.variable != 'basepoints', 'variable'].unique()
    xs_len = len(xs)
    # initial dataset: columns that are not scored
    dat = dt.loc[:, list(set(dt.columns) - set(xs))]
    # loop on x variables
    for i in np.arange(xs_len):
        x_i = xs[i]
        # report progress every print_step variables
        if print_step > 0 and (i + 1) % print_step == 0:
            print(('{:' + str(len(str(xs_len))) + '.0f}/{} {}').format(i + 1, xs_len, x_i))
        cardx = card_df.loc[card_df['variable'] == x_i]
        # score transformation
        dtx_points = woepoints_ply1(dt, cardx, x_i, woe_points="points")
        dat = pd.concat([dat, dtx_points], axis=1)
    # set basepoints
    card_basepoints = list(card_df.loc[card_df['variable'] == 'basepoints', 'points'])[0] if 'basepoints' in card_df['variable'].unique() else 0
    # total score: copy the selection to avoid SettingWithCopyWarning
    dat_score = dat[[x + '_points' for x in xs]].copy()
    dat_score.loc[:, 'score'] = card_basepoints + dat_score.sum(axis=1)
    # return
    if only_total_score:
        dat_score = dat_score[['score']]
    return dat_score
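The total-score step above adds the base points to the sum of each variable's points. Below is a simplified, self-contained sketch of that arithmetic. The card contents and score_records are made up for illustration, and only exact categorical bin matches are handled. The real woepoints_ply1 also handles numeric intervals and missing values. Nothing here filters columns by their number of unique values, so a single record scores normally.

```python
import pandas as pd

# Hypothetical scorecard in a long format (columns: variable, bin, points);
# the values are invented for illustration.
card = pd.DataFrame({
    "variable": ["basepoints", "housing", "housing", "purpose", "purpose"],
    "bin": [None, "own", "rent", "car", "education"],
    "points": [500, 20, -10, 5, -15],
})


def score_records(dt: pd.DataFrame, card: pd.DataFrame) -> pd.Series:
    """Sum basepoints and per-variable points for each row of dt.
    Only exact categorical matches are handled (a simplification)."""
    is_base = card["variable"] == "basepoints"
    base = card.loc[is_base, "points"].iloc[0] if is_base.any() else 0
    total = pd.Series(float(base), index=dt.index)
    for var, rows in card.loc[~is_base].groupby("variable"):
        lookup = dict(zip(rows["bin"], rows["points"]))
        # no unique-value filtering, so a single row keeps all its columns
        total = total + dt[var].map(lookup)
    return total.rename("score")


single = pd.DataFrame({"housing": ["own"], "purpose": ["education"]})
print(score_records(single, card))  # score for the one row: 500 + 20 - 15 = 505.0
```

A value that matches no bin maps to NaN, which makes that row's total NaN. A full implementation needs a policy for such values.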
Awesome, thank you very much for your efforts.
Hi, thanks for the new version of scorecard_ply(dt, card, only_total_score=True, print_step=0).
The package itself has not been updated, am I correct? Neither pip install scorecardpy nor pip install git+git:// installs the updated version, and it is still not working.
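One way to check which scorecardpy installation a Python session actually uses is sketched below. It relies on the standard-library importlib.metadata.version (Python 3.8+) and importlib.util.find_spec. describe_installation is a hypothetical helper name.

```python
import importlib.metadata
import importlib.util


def describe_installation(dist_name: str, module_name: str) -> dict:
    """Report the installed distribution version and the file Python would import."""
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # distribution not installed in this environment
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return {"version": version, "location": location}


print(describe_installation("scorecardpy", "scorecardpy"))
```

The reported location may not be where pip installed the new version, or the session may have started before the upgrade. In either case, restart the interpreter or kernel so the updated module is imported.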
Hi @ShichenXie @Toon6115, is it possible to raise a PR for the new code and merge it? Thanks.
I have updated the package on GitHub.
Hi @ShichenXie, I installed it from GitHub. Below is the summary:
Name: scorecardpy
Summary: Credit Risk Scorecard
Author: Shichen Xie
License: UNKNOWN
Location: c:\programdata\anaconda3\lib\site-packages
Requires: scikit-learn, numpy, pandas, matplotlib
Note: you may need to restart the kernel to use updated packages.
I am still getting the error:
data_woe = scorecardpy.woebin_ply(test_data, Training_Bins)
[INFO] converting into woe values ...
C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\ UserWarning: There are 57 columns have only one unique values, which are removed from input dataset.
(ColumnNames: .............................................)
warnings.warn("There are {} columns have only one unique values, which are removed from input dataset. \n (ColumnNames: {})".format(len(unique1_cols), ', '.join(unique1_cols)))
Sorry for the late reply.
Please update the package from GitHub again. The bug should be fixed. If you still have this issue, please provide a reproducible example.
Is it possible to apply the sc.woebin_ply function to a SINGLE test record using the training bins? This works in R, where the record is converted based on the training bins. In Python the data frame becomes empty, with a warning about columns having only one unique value.
train_bins = sc.woebin(training_data, y="Target", breaks_list=breaks_adj)
test_data = sc.woebin_ply(test_data, train_bins)
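Below is a minimal sketch of the R-style behaviour described above. It assumes a hypothetical bins format (interior breakpoints plus one WoE value per interval) rather than scorecardpy's own bins table. Each value is mapped to the WoE of the training interval it falls in, using left-closed intervals such as [30, 50). The mapping depends only on the training bins, so a single test record converts without any column being dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical training bins for one numeric variable: interior breakpoints
# and one WoE value per interval [-inf, 30), [30, 50), [50, inf).
train_bins = {
    "age": {"breaks": [30, 50], "woe": [0.42, -0.05, -0.31]},
}


def apply_woe(dt: pd.DataFrame, bins: dict) -> pd.DataFrame:
    """Map each numeric column to the WoE of its training interval.
    Nothing depends on the test data's own distribution, so one row works."""
    out = pd.DataFrame(index=dt.index)
    for var, spec in bins.items():
        edges = [-np.inf, *spec["breaks"], np.inf]
        # left-closed intervals; labels=False returns the interval position
        codes = pd.cut(dt[var], bins=edges, right=False, labels=False)
        # missing values get a NaN code, which maps to NaN
        out[var + "_woe"] = codes.map(dict(enumerate(spec["woe"])))
    return out


single = pd.DataFrame({"age": [45]})
print(apply_woe(single, train_bins))  # age 45 falls in [30, 50), WoE -0.05
```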