amphibian-dev / toad

ESC Team's credit scorecard tools.
https://toad.readthedocs.io
MIT License

Add Binning method to ensure monotonicity for continuous features #150

Open RunnerupWang opened 2 months ago

RunnerupWang commented 2 months ago

The existing binning methods, such as chi-square, decision tree, and quantile, cannot guarantee monotonicity for continuous features. For a scorecard in commercial use, however, we usually require interpretability, which means the bad rate should move in one direction across the bins. I suggest adding a monotonicity option to the existing binning methods, especially quantile binning. I look forward to your reply. Thanks a lot.
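To make the requirement concrete, here is a minimal sketch of a monotonicity check on per-bin bad rates. The counts are made up for illustration, and `is_monotonic` is a hypothetical helper, not part of toad.

```python
import pandas as pd

# Made-up counts for five ordered bins (illustrative only).
bins = pd.DataFrame({
    'total': [1000, 1200, 900, 1100, 800],
    'bad': [50, 48, 40, 33, 16],
})
bad_rate = bins['bad'] / bins['total']


def is_monotonic(rates):
    """Return True if the rates never change direction across the bins."""
    return bool(rates.is_monotonic_increasing or rates.is_monotonic_decreasing)


# The third bin's bad rate rises above the second's, so the trend is broken.
print(is_monotonic(bad_rate))
```

A binning that fails this check is the case the request is about: adjacent bins would need to be merged until the check passes.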

RunnerupWang commented 2 months ago

After a few hours of work, I have developed a function that merges the initial bins to ensure monotonicity; see the code and examples below.

I hope to get comments from industry peers. If the ESC team would consider optimizing this functionality and adding it to a later version, I would be very pleased.

`#!/usr/bin/env python3

# -*- coding: utf-8 -*-

""" @author: runnerup Wang """

import pandas as pd

def bin_monotonic(table, feature, direction):
    """
    Merge adjacent groups to ensure monotonicity.

    Parameters:

# Ex1: Construct a simple dataset to test the functionality

table = pd.DataFrame({
    'A': list(range(11)),
    'total': [2437, 20720, 16813, 12679, 5647, 8232, 5445, 5276, 5432, 3514, 4681],
    'bad': [41, 442, 366, 265, 106, 152, 106, 76, 76, 43, 44],
})
ex1_dict, ex1_table = bin_monotonic(table, 'A', -0.05)

# Ex2:

import pandas as pd
import numpy as np
import toad

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

data = pd.read_csv('/test_data.csv')
print('Shape:', data.shape)
data.head(10)
train = data[:300]
OOT = data[300:500]

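# train_selected and to_drop are not defined in this snippet; they are assumed
# to come from an earlier feature-selection step on train.
c = toad.transform.Combiner()
c.fit(train_selected.drop(to_drop, axis=1), y='target', method='quantile')
bin_ori = c.export()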

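# Adjust the precision of the split points

# Build a new dict so that bin_ori is not modified through a shared reference
bin_adj = {}
for k, v in bin_ori.items():
    v = [round(i, 2) for i in v]
    v = list(dict.fromkeys(v))  # drop duplicates created by rounding, keeping order
    bin_adj[k] = v

c.update(bin_adj)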

# Visualize the binning plot

from toad.plot import bin_plot

col = 'A'
bin_plot(c.transform(train_selected[[col, 'target']], labels=True), x=col, target='target')

# Merge bins

from toad.stats import IV, feature_bin_stats
from scipy.stats import spearmanr

df_temp = c.transform(train_selected[[col, 'target']], labels=False)
corr = spearmanr(df_temp[col], df_temp['target'])[0]
table = feature_bin_stats(df_temp, col, 'target')
ex2_dict, ex2_table = bin_monotonic(table, 'A', corr)

# Position list for the split points

# sorted() keeps the positions in ascending order; a bare set has no guaranteed order
pos_list = sorted(set(ex2_dict.values()))

# Find the corresponding split points

split_list = bin_adj[col]
split_list_merge = [split_list[i] for i in pos_list if i < len(split_list)]

# Update the rule

rule = {col: split_list_merge}
c.update(rule)

bin_plot(c.transform(train_selected[[col, 'target']], labels=True), x=col, target='target')

`
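The body of `bin_monotonic` does not appear in the snippet above, so the following is only a minimal sketch of one way adjacent bins could be merged, not the function from this comment. It pools neighbouring bins whenever the bad rate breaks the requested trend (the pool-adjacent-violators idea). The name `merge_to_monotonic`, its `(mapping, merged)` return shape, and the meaning of the mapping are assumptions made for illustration; the counts are the ones from Ex1.

```python
import pandas as pd


def merge_to_monotonic(table, direction):
    """Pool adjacent bins until the bad rate is strictly monotonic.

    Hypothetical helper, not part of toad. `table` needs 'total' and 'bad'
    columns, one row per bin in bin order, with positive totals.
    A negative `direction` asks for a decreasing bad rate, otherwise increasing.
    """
    sign = -1 if direction < 0 else 1
    groups = []  # each group: [first_row, last_row, total, bad]
    for i, (total, bad) in enumerate(zip(table['total'], table['bad'])):
        groups.append([i, i, int(total), int(bad)])
        # Merge backwards while the newest group breaks the required trend.
        while len(groups) > 1:
            prev, cur = groups[-2], groups[-1]
            # Cross-multiplied comparison of the two bad rates (no division).
            if sign * (cur[3] * prev[2] - prev[3] * cur[2]) > 0:
                break
            groups.pop()
            prev[1] = cur[1]
            prev[2] += cur[2]
            prev[3] += cur[3]

    mapping = {}  # original row position -> position of its merged bin
    rows = []
    for pos, (first, last, total, bad) in enumerate(groups):
        for i in range(first, last + 1):
            mapping[i] = pos
        rows.append({'first': first, 'last': last, 'total': total,
                     'bad': bad, 'bad_rate': bad / total})
    return mapping, pd.DataFrame(rows)


# Counts taken from Ex1.
table = pd.DataFrame({
    'A': list(range(11)),
    'total': [2437, 20720, 16813, 12679, 5647, 8232, 5445, 5276, 5432, 3514, 4681],
    'bad': [41, 442, 366, 265, 106, 152, 106, 76, 76, 43, 44],
})
mapping, merged = merge_to_monotonic(table, -0.05)
print(merged)
```

Translating the merged groups back into split points depends on how the bins are indexed in the `Combiner` rules, so that step is left out of the sketch.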