WebClub-NITK / Hacktoberfest-2k19


Machine learning algorithms and data visualisation for credit cards default prediction #284

Closed amukh18 closed 4 years ago

amukh18 commented 4 years ago


Your task is to predict the probability that a credit card owner will default based on his/her characteristics and payment history. This is a classification problem.

The dataset can be found here.

The features convey the following information:

X1: Amount of the given credit (New Taiwan dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).

X4: Marital status (1 = married; 2 = single; 3 = others).

X5: Age (year).

X6-X11: History of past payment (from April to September, 2005), where X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; ... X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ... 8 = payment delay for eight months; 9 = payment delay for nine months and above.

X12-X17: Amount of bill statement (New Taiwan dollar), where X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; ... X17 = amount of bill statement in April, 2005.

X18-X23: Amount of previous payment (New Taiwan dollar), where X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; ... X23 = amount paid in April, 2005.
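The coded features described above can be turned into readable labels before plotting. The following is a minimal sketch using only the Python standard library. The function names, the output keys, and the sample record are hypothetical and not part of the dataset. The months for X8 to X10 are inferred from the stated April to September range. Any repayment code outside the documented scale is flagged instead of being guessed.

```python
# Code-to-label maps copied from the feature descriptions in this issue.
GENDER = {1: "male", 2: "female"}
EDUCATION = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARITAL_STATUS = {1: "married", 2: "single", 3: "others"}

# X6..X11 run backwards in time. July, June and May are inferred from the
# stated range (April to September, 2005); the issue only names X6, X7, X11.
REPAYMENT_MONTHS = {
    "X6": "September", "X7": "August", "X8": "July",
    "X9": "June", "X10": "May", "X11": "April",
}


def describe_repayment(status):
    """Translate a repayment status code using the documented scale."""
    if status == -1:
        return "pay duly"
    if 1 <= status <= 8:
        return f"payment delay of {status} month(s)"
    if status == 9:
        return "payment delay of nine months and above"
    # Anything else is not covered by the scale given in the issue.
    return f"undocumented code {status}"


def decode_record(record):
    """Return a readable view of features X1..X11 (hypothetical helper)."""
    decoded = {
        "credit_limit_ntd": record["X1"],
        "gender": GENDER.get(record["X2"], "unknown"),
        "education": EDUCATION.get(record["X3"], "unknown"),
        "marital_status": MARITAL_STATUS.get(record["X4"], "unknown"),
        "age": record["X5"],
    }
    for column, month in REPAYMENT_MONTHS.items():
        key = f"repayment_{month.lower()}_2005"
        decoded[key] = describe_repayment(record[column])
    return decoded


# Made-up record for illustration only; not taken from the dataset.
sample = {"X1": 20000, "X2": 2, "X3": 2, "X4": 1, "X5": 24,
          "X6": 2, "X7": -1, "X8": 1, "X9": 9, "X10": 0, "X11": -1}

for name, value in decode_record(sample).items():
    print(f"{name}: {value}")
```

In a notebook the same maps could be applied column by column with pandas instead of record by record.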


Issue requirements / progress

All algorithms and ensembles must be scored using the RMSE, log loss and accuracy metrics. Each pull request must fulfill only one of the tasks below.
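For reference, the three metrics can be sketched in plain Python as below. This is an illustration under assumptions, not a required implementation: it assumes RMSE and log loss are computed on the predicted default probabilities, and that accuracy uses a 0.5 threshold, neither of which is specified in this issue. The function names and the sample values are hypothetical. Scikit-learn's `sklearn.metrics` module offers `mean_squared_error`, `log_loss` and `accuracy_score`, which would normally be used instead.

```python
import math


def rmse(y_true, y_prob):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_prob)]
    return math.sqrt(sum(squared) / len(squared))


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of correct labels after thresholding (0.5 is an assumption)."""
    hits = sum(int(p >= threshold) == t for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Made-up labels and probabilities for illustration only.
y_true = [0, 1, 1, 0]
y_prob = [0.1, 0.9, 0.8, 0.3]

print("RMSE:    ", round(rmse(y_true, y_prob), 4))
print("Log loss:", round(log_loss(y_true, y_prob), 4))
print("Accuracy:", accuracy(y_true, y_prob))
```

Lower is better for RMSE and log loss, higher is better for accuracy, so reporting all three side by side makes it easier to compare submissions.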





List of resources that might be required / helpful

Here are a few resources that may help you:

  1. NumPy documentation: https://docs.scipy.org/doc/numpy-1.13.0/reference/index.html
  2. Scikit-learn documentation: https://scikit-learn.org/stable/documentation.html
  3. Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/
  4. Jupyter Notebook installation and tutorial : https://www.dataquest.io/blog/jupyter-notebook-tutorial/
  5. XGBoost documentation: https://xgboost.readthedocs.io/en/latest/
  6. LightGBM documentation: https://lightgbm.readthedocs.io/en/latest/
  7. Seaborn documentation
     a. Scatter-plot: https://seaborn.pydata.org/generated/seaborn.scatterplot.html
     b. Heat-map: http://seaborn.pydata.org/generated/seaborn.heatmap.html
     c. lmplot: https://seaborn.pydata.org/generated/seaborn.lmplot.html

Directory Structure

The following convention must be adhered to when placing your solution files.





Please claim the issue first by commenting here before starting to work on it. Feel free to contact @amukh18 or @CinnamonRolls1 with any issues at any time.