UBC-MDS / data-analysis-review-2022


Submission: GROUP 20: Credit Card Default Predictor #4

Open mozhao0331 opened 1 year ago

mozhao0331 commented 1 year ago

Submitting authors: @mozhao0331 @kenuiuc @Althrun-sun @rkrishnan-arjun

Repository: https://github.com/UBC-MDS/credit_default_prediction_group_20
Report link: https://github.com/UBC-MDS/credit_default_prediction_group_20/blob/main/doc/credit_default_analysis_report.md
Abstract/executive summary: For this project we are trying to answer the question:

Given a credit card customer's payment history and demographic information such as gender, age, and education level, would the customer default on the next bill payment?

Answering this question is important because, with an effective predictive model, financial institutions can evaluate a customer's credit level and grant appropriate credit amount limits. This analysis would be crucial in credit score calculation and risk management.

Editor: @flor14 Reviewers: Li Sam, Ganacheva Elena, Feng Yurui, Wijngaarden Renzo

elenagan commented 1 year ago

Data analysis review checklist

Reviewer: @elenagan

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 30 minutes

Review Comments:

  1. I liked how you considered a wide variety of models. You might want to explain in a little more detail why you chose these particular models.
  2. The authors are listed clearly in the README, but it might be a good idea to include your affiliations as well.
  3. The introduction is clear and easy to understand, but there are some typos and unclear grammar in the more detailed Data and Results & Discussion sections that may cause confusion.
  4. The code was organized clearly into functions with tests, but it might be useful to define some of the functions outside of the main function so they can be reused elsewhere.
  5. It might not be the best idea to report scores for all the models you explored. You might want to focus on just the final model you selected.
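Point 4 above can be sketched briefly. The function and rule below are hypothetical (not from the project's actual code); the point is only that helpers defined at module level can be imported and tested independently of `main()`:

```python
# Hypothetical refactor: the helper lives at module level rather than
# nested inside main(), so other scripts and tests can import it.

def clip_ages(ages):
    """Clip implausible ages to an illustrative 18-100 range."""
    return [min(max(a, 18), 100) for a in ages]


def main():
    # main() now just orchestrates the reusable pieces.
    return clip_ages([15, 42, 130])


if __name__ == "__main__":
    print(main())
```

Another script (or a test file) could then do `from preprocess import clip_ages` without triggering the whole pipeline.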

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Yurui-Feng commented 1 year ago

Data analysis review checklist

Reviewer: @Yurui-Feng

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hr

Review Comments:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Hongjian-Sam-Li commented 1 year ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1hr

Review Comments:

  1. Your EDA report is well organized. It introduces the variables and the project's purpose very clearly through the introduction and the various plots.

  2. The scripts in the src directory cover the whole analysis process, from data download and EDA through model training to the model summary. All the results can be located easily in the results directory.

  3. The analysis pipeline is well organized, with various fitting methods, models, and validation scores. It would be even better if you added the pros and cons of each method (based on overfitting, CV scores, etc.) and explained why you used it by relating your research question to the characteristics of each model.

  4. In the analysis report, you clearly explained the main target of your analysis (lowering Type I and Type II errors) and gave a clear rationale for your choice of scoring metric, which suits the precision/recall trade-off inherent in this real-life question.

  5. One small suggestion: you might consider some feature engineering and feature selection to discover more potentially useful features and combinations, which could improve your models' overall scores.
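As a plain-Python illustration of the precision/recall trade-off mentioned in point 4 (the counts below are made up, not the project's results), an F-beta score with beta > 1 weights recall, i.e. missed defaulters (Type II errors), more heavily:

```python
# Illustrative computation of precision, recall, and F-beta from
# confusion-matrix counts. tp/fp/fn values are invented for the example.

def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    precision = tp / (tp + fp)       # fewer Type I errors -> higher precision
    recall = tp / (tp + fn)          # fewer Type II errors -> higher recall
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# beta=2 favors recall over precision.
p, r, f2 = precision_recall_fbeta(tp=80, fp=20, fn=40, beta=2.0)
```

With these counts, precision is 0.8 and recall is 2/3, so the F2 score sits closer to the (lower) recall, reflecting the penalty on missed defaulters.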

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

RenzoWijn commented 1 year ago

Data analysis review checklist

Reviewer: @Hawknum

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1hr

Review Comments:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

rkrishnan-arjun commented 1 year ago

Thank you for taking the time to review our project. Based on the feedback received, we've tried to improve the overall presentation of the report and the workflow used to generate it.

Some of the key pieces of feedback, and the commits that resolved them, are:

  1. Feedback from Aditi (TA): Some packages listed in the environment.yaml file aren't available on conda (checked with conda search): version numbers vary and some packages are simply not found, causing errors when trying to create the environment. Issue: https://github.com/UBC-MDS/credit_default_prediction_group_20/issues/39

Commits that fix the environment.yaml:
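For context, this is the shape of file the fix targets. The package names and version pins below are illustrative only, not the project's actual environment.yaml; the point is that each pin should be resolvable with `conda search <package>=<version>` before committing:

```yaml
# Illustrative environment file: every pin here should be verifiable
# with `conda search` against the listed channels.
name: credit_default
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas=1.4.3
  - scikit-learn=1.1.1
```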

  2. Feedback from Peer Review: For the usage section, you might want to put each step's command into an individual code cell for easy copy-pasting in case one step fails. Also, you might want to remove the square brackets around the optional argument (or include the argument without brackets).

Commit that removed the old code cells:

  3. Feedback from Peer Review: Your report includes the author but not the contributors (the other people in the group), and both the README and the report are missing affiliations. Adding these would increase transparency.

Commits that add contributors and affiliations in both the readme and the final report:

  4. Feedback from Florencia: Activate GitHub Pages for the report. Issue: https://github.ubc.ca/MDS-2022-23/DSCI_522_dsci-workflows_students/issues/5

Commits that specify a change in the final report:

  5. Feedback from Aditi (TA): Figure 1 is a little blurry when the RMD is rendered, and the y-axis label gets cut off in Figure 4. Issue: https://github.com/UBC-MDS/credit_default_prediction_group_20/issues/39

Commits that fixed this: