Your task is to predict the probability that a credit card owner will default, based on his/her characteristics and payment history. This is a binary classification problem.
The features convey the following information:
X1: Amount of the given credit (New Taiwan dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
X2: Gender (1 = male; 2 = female).
X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
X4: Marital status (1 = married; 2 = single; 3 = others).
X5: Age (year).
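X2, X3 and X4 are nominal codes. The numbers label categories and carry no order, so several of the algorithms listed for this issue that work on raw numeric distances or weights (for example K-Nearest Neighbors, Logistic Regression and the Support Vector Classifier) could treat "2 = female" as larger than "1 = male". The following is a minimal one-hot encoding sketch in plain Python. It assumes each row is a dict keyed by feature name. The helper name `one_hot` and the label strings are illustrative and not part of the issue. Codes outside the documented ranges produce an all-zero indicator block, which is also an assumption.

```python
# Hypothetical helper: one-hot encode the nominal codes X2 (gender),
# X3 (education) and X4 (marital status) described above.
CATEGORY_CODES = {
    "X2": {1: "male", 2: "female"},
    "X3": {1: "graduate_school", 2: "university", 3: "high_school", 4: "others"},
    "X4": {1: "married", 2: "single", 3: "others"},
}


def one_hot(row: dict) -> dict:
    """Replace X2-X4 in a feature row with 0/1 indicator columns.

    Assumption: a code outside the documented ranges (or a missing value)
    yields all zeros for that feature's indicators.
    """
    # Keep every non-categorical feature unchanged.
    encoded = {k: v for k, v in row.items() if k not in CATEGORY_CODES}
    for feature, codes in CATEGORY_CODES.items():
        value = row.get(feature)
        for code, label in codes.items():
            encoded[f"{feature}_{label}"] = int(value == code)
    return encoded


if __name__ == "__main__":
    # A made-up row, for illustration only.
    sample = {"X1": 20000, "X2": 2, "X3": 2, "X4": 1, "X5": 24}
    print(one_hot(sample))
```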
X6-X11: History of past payment (from April to September, 2005).
Where
X6 = the repayment status in September, 2005;
X7 = the repayment status in August, 2005;
...
X11 = the repayment status in April, 2005.
And the measurement scale for the repayment status is:
-1 = pay duly;
1 = payment delay for one month;
2 = payment delay for two months;
...
8 = payment delay for eight months;
9 = payment delay for nine months and above.
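The repayment-status scale can be captured as a small lookup. The sketch below uses hypothetical function names. It converts a code into the wording of the scale and into a numeric months-of-delay value. It assumes that codes not listed in the scale count as no delay. The issue does not say how such values should be treated, so check this against the data before relying on it.

```python
def describe_repayment_status(code: int) -> str:
    """Translate a repayment-status code (X6-X11) into text, following the scale above."""
    if code == -1:
        return "pay duly"
    if code == 1:
        return "payment delay for one month"
    if 2 <= code <= 8:
        return f"payment delay for {code} months"
    if code == 9:
        return "payment delay for nine months and above"
    # The scale above defines no other value.
    return "undocumented code"


def months_delayed(code: int) -> int:
    """Hypothetical numeric feature: 0 for on-time payment, else the delay in months.

    Code 9 means "nine months and above", so 9 is only a lower bound.
    Assumption: undocumented codes are treated as 0 (no delay).
    """
    return code if 1 <= code <= 9 else 0
```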
X12-X17: Amount of bill statement (New Taiwan dollar).
Where
X12 = amount of bill statement in September, 2005;
X13 = amount of bill statement in August, 2005;
...
X17 = amount of bill statement in April, 2005.
X18-X23: Amount of previous payment (New Taiwan dollar).
Where
X18 = amount paid in September, 2005;
X19 = amount paid in August, 2005;
...
X23 = amount paid in April, 2005.
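The eighteen monthly features appear to follow one pattern: three groups of six, each starting at September 2005 and going back one month per feature. The following minimal sketch names a feature from its index. It assumes that the "..." entries above stand for the consecutive months between September and April. `describe_feature` is a hypothetical helper.

```python
# Months covered by X6-X23, most recent first. Assumption: the "..." in the
# description stands for the consecutive months between September and April.
MONTHS = ["September", "August", "July", "June", "May", "April"]
# First feature index of each six-feature group.
GROUPS = {6: "repayment status", 12: "bill statement", 18: "amount paid"}


def describe_feature(n: int) -> str:
    """Return e.g. 'bill statement, August 2005' for feature Xn (6 <= n <= 23)."""
    if not 6 <= n <= 23:
        raise ValueError(f"X{n} is not a monthly feature")
    # The group is the one whose first index is the largest not exceeding n.
    start = max(s for s in GROUPS if s <= n)
    return f"{GROUPS[start]}, {MONTHS[n - start]} 2005"
```

Such labels could, for instance, serve as readable column names or plot axis labels.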
The following convention must be adhered to when placing your solution files.
Plots:
For scatter plot:
/machineLearning/credit_default/plots/sp/<solution_file>
For heatmap:
/machineLearning/credit_default/plots/hm/<solution_file>
For lmplot:
/machineLearning/credit_default/plots/lp/<solution_file>
Algorithms:
For Support Vector Classifier:
/machineLearning/credit_default/algo/svc/<solution_file>
For Logistic Regression:
/machineLearning/credit_default/algo/lr/<solution_file>
For K-Nearest Neighbors:
/machineLearning/credit_default/algo/knn/<solution_file>
For Gaussian Naive Bayes:
/machineLearning/credit_default/algo/gnb/<solution_file>
For Decision Tree Classifier:
/machineLearning/credit_default/algo/dtc/<solution_file>
For Random Forest Classifier:
/machineLearning/credit_default/algo/rfc/<solution_file>
For Multi-layer Perceptron Classifier:
/machineLearning/credit_default/algo/mlp/<solution_file>
For XGBoost:
/machineLearning/credit_default/algo/xgb/<solution_file>
For LightGBM:
/machineLearning/credit_default/algo/lgbm/<solution_file>
Ensembles:
For 10-fold XGBoost:
/machineLearning/credit_default/ens/10-xgb/<solution_file>
For 10-fold LightGBM:
/machineLearning/credit_default/ens/10-lgbm/<solution_file>
For the non-weighted average of the 10-fold XGBoost and 10-fold LightGBM predictions:
/machineLearning/credit_default/ens/avg/<solution_file>
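The issue does not define "10-fold XGBoost" further. One common reading, sketched below in plain Python, works as follows. Split the training rows into 10 folds and train one model on each set of nine folds. Keep each model's predictions on its held-out fold (out-of-fold predictions, which can be scored against the labels), and average the 10 models' predictions on the test set. The "avg" ensemble is then the element-wise mean of the two models' fold-averaged test predictions. `fit_predict` is a hypothetical stand-in for an XGBoost or LightGBM training wrapper. For simplicity the folds are contiguous and unshuffled, whereas a real solution would usually shuffle the rows first, and often stratify by the label as well.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: trains on (X_train, y_train) and returns default
# probabilities for X_new. A real solution would wrap an XGBoost or LightGBM
# model here; that wrapper is not shown.
FitPredict = Callable[[List[list], List[int], List[list]], List[float]]


def kfold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Split row indices 0..n-1 into k contiguous, near-equal folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def kfold_predict(fit_predict: FitPredict, X: List[list], y: List[int],
                  X_test: List[list], k: int = 10) -> Tuple[List[float], List[float]]:
    """Train one model per fold; return (out-of-fold preds for X, fold-averaged preds for X_test)."""
    oof = [0.0] * len(X)
    test_sum = [0.0] * len(X_test)
    for held_out in kfold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # Use the same fold model to predict both the held-out rows and the test rows.
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in held_out] + list(X_test))
        for i, p in zip(held_out, preds[:len(held_out)]):
            oof[i] = p
        for j, p in enumerate(preds[len(held_out):]):
            test_sum[j] += p
    return oof, [s / k for s in test_sum]


def average_predictions(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Non-weighted mean of two prediction vectors (e.g. 10-fold XGBoost and 10-fold LightGBM)."""
    if len(a) != len(b):
        raise ValueError("prediction vectors must have the same length")
    return [(pa + pb) / 2 for pa, pb in zip(a, b)]
```

Because `fit_predict` is just a callable, the same `kfold_predict` function could serve both the 10-xgb and 10-lgbm tasks, and its two test outputs would feed `average_predictions`.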
Note
Please claim the issue first by commenting here before starting to work on it. Feel free to contact @amukh18 or @CinnamonRolls1 with any issues at any time.
Description
The dataset can be found here.
Details
Issue requirements / progress
All algorithms and ensembles must be scored using the RMSE, log loss and accuracy metrics. Each pull request must fulfill only one of the tasks below.
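All three metrics can be computed directly from the 0/1 labels and the predicted default probabilities. Below is a dependency-free sketch. The issue does not specify two details, so both are assumptions here: the 0.5 threshold that turns probabilities into hard labels for accuracy, and the clipping constant used in log loss. When applied to probabilities, RMSE is the square root of the Brier score. In practice, `sklearn.metrics.log_loss` and `sklearn.metrics.accuracy_score` compute the latter two.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[int], p: Sequence[float]) -> float:
    """Root-mean-squared error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y_true, p)) / len(y_true))


def log_loss(y_true: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for t, q in zip(y_true, p):
        q = min(max(q, eps), 1 - eps)
        total += t * math.log(q) + (1 - t) * math.log(1 - q)
    return -total / len(y_true)


def accuracy(y_true: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    """Share of rows whose thresholded prediction (p >= threshold means 1) matches the label."""
    return sum(int(q >= threshold) == t for t, q in zip(y_true, p)) / len(y_true)
```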
Plots:
Algorithms:
Cross-validations/Ensembles:
Resources
Here are a few resources that may be helpful: