Open liannah opened 2 years ago
Reviewer: @jessie14
Conflict of interest
Code of Conduct
General checks
Documentation
Code quality
Reproducibility
Analysis report
Estimated hours spent reviewing: 1.5 hours
Review Comments:
Attribution
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Thank you all, @mahm00d27, @jessie14, @ming0701, and @Kendy-Tan, for your feedback. We (@thayeylolu, @karanpreetkaur, @liannah, and @Arushi282) have made some of the changes you suggested:
[x] We have renamed the files to better describe our work. This can be found here
[x] We restructured our report to better highlight our findings. This can be found here
[x] We defined the context of default payment. This can be found here and here
[x] We added our names as the copyright holders and changed software to project in the license file. This can be found here
[x] We explained the final model’s performance better. This can be found here and here
[x] We discussed the limitations of random search for hyperparameter optimisation. This can be found here
Submitting authors: @liannah @Arushi282 @thayeylolu @karanpreetkaur
Repository: https://github.com/UBC-MDS/credit_default_prediction
Report link: https://htmlpreview.github.io/?https://github.com/UBC-MDS/credit_default_prediction/blob/main/doc/credit_default_prediction_report.html
Abstract/executive summary:
In this project, we built a classification model using logistic regression to predict whether credit account holders will make a default payment next month. The model was trained on features that describe each client’s bill and payment history over the last 6 months, as well as several other characteristics such as age, marital status, education, and gender. We are more interested in minimizing Type II errors (false negatives: predicting no default payment when in reality the client made a default payment the following month) than Type I errors (false positives: predicting default payment when in reality the client made no default payment), and we use f1 as our primary scoring metric. On the test data set, our model achieved an f1 score of ~0.53, with moderate recall and precision of ~0.48 and ~0.59 respectively. These scores are consistent with the scores on the train data set, so we can say that the model generalizes to unseen data. However, the scores are not high, and our model is error prone: it correctly identifies default payments roughly half of the time. Incorrectly identifying default or no default can cost the company a lot of money and reputation, so we recommend further study to improve this prediction model before it is put into production at credit companies. Possible directions for improvement include feature engineering and a bigger dataset collected from other countries (China, Canada, Japan).
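As a quick consistency check on the scores quoted above, f1 is the harmonic mean of precision and recall. The sketch below is only illustrative: the helper name `f1_from_precision_recall` is hypothetical, and the inputs are the rounded values from the abstract, not figures recomputed from the project's model.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the f1 score)."""
    if precision + recall == 0:
        # Convention used here: f1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded test-set values as quoted in the abstract (assumed, not recomputed).
precision = 0.59
recall = 0.48

f1 = f1_from_precision_recall(precision, recall)
print(f"f1 = {f1:.2f}")  # prints "f1 = 0.53"
```

The result agrees with the reported f1 of ~0.53. Because the harmonic mean is pulled toward the smaller of the two inputs, the lower recall (~0.48) limits f1 more than the higher precision (~0.59) lifts it.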
The data set used in this project was created by Yeh, I. C., and Lien, C. H. (Yeh and Lien 2009) and made publicly available for download in the UCI Machine Learning Repository (“default of credit card clients” 2016). The data can be found here. The dataset is based on Taiwan’s credit card client default cases from April to September. It has 30000 examples, each representing a particular client’s information. The dataset has 24 columns, with features such as gender, age, marital status, the last 6 months’ bills, and the last 6 months’ payments, plus the target column for default payment next month, labeled 1 (client will make a default) and 0 (client will not make a default).
Editor: @flor14
Reviewers: @Mahm00d27 @jessie14 @ming0701 @Kendy-Tan