WebClub-NITK / Hacktoberfest-2k19


Machine learning algorithms and data visualisation for credit cards default prediction #284

Closed amukh18 closed 4 years ago

amukh18 commented 4 years ago

Description

Your task is to predict the probability that a credit card owner will default based on his/her characteristics and payment history. This is a classification problem.

The dataset can be found here.

The features convey the following information:

X1: Amount of the given credit (New Taiwan dollar); it includes both the individual consumer credit and his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).

X4: Marital status (1 = married; 2 = single; 3 = others).

X5: Age (year).

X6-X11: History of past payment (from April to September, 2005), where X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; ... X11 = the repayment status in April, 2005.

The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ... 8 = payment delay for eight months; 9 = payment delay for nine months and above.

X12-X17: Amount of bill statement (New Taiwan dollar), where X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; ... X17 = amount of bill statement in April, 2005.

X18-X23: Amount of previous payment (New Taiwan dollar), where X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; ... X23 = amount paid in April, 2005.
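The feature layout above can be sketched as a small lookup helper. This is only an illustration: the descriptive column names (`credit_limit`, `repay_status_sep`, and so on) are hypothetical and not part of the dataset, and the months between August and April are inferred from the "April to September" range. Repayment codes outside the documented scale are reported as undocumented rather than guessed.

```python
# Month order shared by the three monthly blocks: X6/X12/X18 = September,
# ..., X11/X17/X23 = April (intermediate months inferred from the description).
MONTHS = ["sep", "aug", "jul", "jun", "may", "apr"]


def build_column_names():
    """Map X1..X23 to descriptive names (the names themselves are hypothetical)."""
    names = {
        "X1": "credit_limit",
        "X2": "gender",
        "X3": "education",
        "X4": "marital_status",
        "X5": "age",
    }
    # Three blocks of six monthly columns, each ordered September -> April.
    blocks = (("repay_status", 6), ("bill_amount", 12), ("paid_amount", 18))
    for prefix, first in blocks:
        for offset, month in enumerate(MONTHS):
            names[f"X{first + offset}"] = f"{prefix}_{month}"
    return names


def describe_repayment_status(code):
    """Translate a repayment status code using the scale given in the issue."""
    if code == -1:
        return "paid duly"
    if 1 <= code <= 8:
        unit = "month" if code == 1 else "months"
        return f"payment delayed {code} {unit}"
    if code == 9:
        return "payment delayed nine months or more"
    # The issue text does not define any other values.
    return "undocumented code"
```

If the data is loaded into a pandas DataFrame whose columns are named X1 to X23 (an assumption about the file layout), something like `df.rename(columns=build_column_names())` would apply these names.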


Details


Issue requirements / progress

All algorithms and ensembles must be scored using the RMSE, log loss, and accuracy metrics. Each pull request must fulfill only one of the tasks below.
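As a rough sketch of what the three required scores measure, the helper below computes them from binary labels and predicted default probabilities using only the standard library. The function name `score_predictions`, the 0.5 decision threshold, and the choice to compute RMSE on the predicted probabilities (rather than on thresholded labels) are assumptions, since the issue does not specify them.

```python
import math


def score_predictions(y_true, y_prob, threshold=0.5, eps=1e-15):
    """Return RMSE, log loss and accuracy for 0/1 labels and predicted probabilities."""
    if len(y_true) == 0 or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    n = len(y_true)

    # RMSE between the 0/1 labels and the predicted probabilities (assumption).
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / n)

    # Binary log loss; probabilities are clipped so log(0) never occurs.
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    logloss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, clipped)
    ) / n

    # Accuracy after turning probabilities into class labels at the threshold.
    accuracy = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob)) / n

    return {"rmse": rmse, "logloss": logloss, "accuracy": accuracy}
```

In an actual submission, `mean_squared_error`, `log_loss`, and `accuracy_score` from `sklearn.metrics` cover the same quantities; the hand-written version is only meant to make the definitions explicit.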

Plots:

Algorithms:

Cross-validations/Ensembles:


Resources

Here are a few resources that may help you:

  1. NumPy documentation: https://docs.scipy.org/doc/numpy-1.13.0/reference/index.html
  2. Scikit-learn documentation: https://scikit-learn.org/stable/documentation.html
  3. Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/
  4. Jupyter Notebook installation and tutorial : https://www.dataquest.io/blog/jupyter-notebook-tutorial/
  5. XGBoost documentation: https://xgboost.readthedocs.io/en/latest/
  6. LightGBM documentation: https://lightgbm.readthedocs.io/en/latest/
  7. Seaborn documentation
     a. Scatter-plot: https://seaborn.pydata.org/generated/seaborn.scatterplot.html
     b. Heat-map: http://seaborn.pydata.org/generated/seaborn.heatmap.html
     c. lmplot: https://seaborn.pydata.org/generated/seaborn.lmplot.html

Directory Structure

The following convention must be adhered to when placing your solution files.

Plots:

Algorithms:

Ensembles:

Note

Please claim the issue first by commenting here before starting to work on it. Feel free to contact @amukh18 or @CinnamonRolls1 with any issues at any time.