UBC-MDS / data-analysis-review-2023

Submission: GROUP 21: Identifying the Top Three Predictors of Term Deposit Subscriptions #5

Status: Open. zgarciaj opened this issue 12 months ago

zgarciaj commented 12 months ago

Submitting authors: @jy1909 @JohnShiuMK @zth96 @zgarciaj

Repository: https://github.com/UBC-MDS/group21_top-three-predictors-of-term-deposit-subscriptions

Report link: https://ubc-mds.github.io/group21_top-three-predictors-of-term-deposit-subscriptions/term_deposit_report.html

Abstract/executive summary: This report presents an analysis of the factors influencing client subscriptions to term deposits at a Portuguese banking institution. Utilizing a dataset comprising 45,211 client interactions with a target variable and 16 input features, we apply logistic regression and decision tree classifiers to identify the top three predictors of term deposit subscriptions. The data preprocessing involves handling missing values, encoding categorical variables, and standardizing numerical variables. Our exploratory data analysis leverages visualizations to understand feature distributions and correlations, while model evaluation focuses on precision and recall due to the dataset’s imbalance. Logistic regression is likely to prove slightly superior in precision to the decision tree classifier. The analysis identifies the outcome of previous campaigns, the month of contact, and the call duration as the most significant predictors. These findings offer valuable insights into the decision-making process of clients regarding term deposit subscriptions and suggest areas for future research.

Editor: @ttimbers

Reviewers: Ben Chen, Waleed Mahmood, Aishwarya Nadimpally, Nasim Ghazanfari Nasrabadi

phchen5 commented 12 months ago

Data analysis review checklist

Reviewer: @phchen5 Ben Chen

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2

Review Comments:

Overall, excellent work! The file organization is well-structured, making navigation effortless. The codebase is both readable and concise, contributing to its clarity. Additionally, the report is polished, presenting information in a clean and articulate manner. Still, here are a few suggestions that might further elevate your project:

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

Aishwarya120111 commented 12 months ago

Data analysis review checklist

Reviewer: @Aishwarya120111

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hr

Review Comments:

The report is impressively well-written and organized, which made it quite easy for me to understand and follow. There isn't much to improve, but a few minor adjustments could be made.

WaleedMahmood1 commented 12 months ago

Data analysis review checklist

Reviewer: @WaleedMahmood1

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2.5 hrs

Review Comments:

Overall: Good job on the project! It was a great read, and it is interesting to see how insights can be drawn from predictions of term deposit subscriptions. The project repository is very well structured, and it was easy to find what I was looking for. The report also provides plenty of background, which was enough for me to understand the methods being used and why they were chosen.

Constructive Feedback:

  1. I believe that matplotlib is not used in any of the scripts or analysis code, since you are using altair. It might be better to remove it from the environment so that people replicating your analysis do not install a library that is never used.
  2. There are .html and .ipynb files in both the src folder and the report folder. Referencing Tiffany’s example repository, the duplicates in the src folder are not necessary. Perhaps you were running the code there earlier and missed removing them. I am highlighting this so that no files are repeated and the file placement in your project repository is fully consistent.
  3. The report uses a lot of technical terminology. A suggestion is to explain each technical term where it first appears, or to reduce the use of technical terms in the final report.
  4. In the notebook src/term_deposit_report.ipynb, selecting “Restart Kernel and Run All Cells...” from the “Kernel” menu produces an error in the second code block, where the data is loaded: No such file or directory: '../data/bank-full.csv'. I believe this can be resolved by changing the path in the code to ../data/raw/bank-full.csv.
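For item 4, one way to make the failure easier to diagnose is to resolve the data path relative to the notebook's folder and check that the file exists before loading it. The sketch below is only illustrative: the helper name `resolve_data_path` is hypothetical, and the `../data/raw/bank-full.csv` default is the path suggested above, not something confirmed against the repository.

```python
from pathlib import Path


def resolve_data_path(notebook_dir, relative="../data/raw/bank-full.csv"):
    """Resolve the data file relative to the notebook folder.

    `resolve_data_path` is a hypothetical helper, and the default path is
    the reviewer's suggested location (an assumption about the repo layout).
    """
    path = (Path(notebook_dir) / relative).resolve()
    if not path.is_file():
        # Fail early with the fully resolved location, which is easier to
        # debug than a bare relative path in the error message.
        raise FileNotFoundError(f"Expected data file at {path}")
    return path
```

In the notebook this could be called as `resolve_data_path(Path.cwd())`, assuming the kernel is started from the src folder, and the returned path passed on to whatever function reads the CSV.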

Once again, these are quite minor issues. Great job on the project!

nassimgha commented 12 months ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

Review Comments:

The analysis provided is generally well-structured and informative. Using both logistic regression and a decision tree classifier for model development allows for a comparison between linear and non-linear modeling techniques.

Model Evaluation: The focus on accuracy, precision, and recall, with a particular emphasis on precision, aligns with the study's objective of minimizing Type I errors (false positives). However, here are a few points where improvements or clarifications could be made:

Variable Transformation Description and Metrics Description: When describing the preprocessing phase, it would be helpful to provide more context or reasoning behind the choice of transformations. Also, when describing the scoring metrics, it would help to define each one and explain how they differ, for readers who are not familiar with these terms.
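As an illustration of the kind of definition that could accompany the metrics, here is a minimal sketch in plain Python. The labels are made-up toy data, not results from the report, and the function name `precision_recall` is only illustrative. Precision is the share of predicted subscribers who actually subscribed, so it falls as false positives rise; recall is the share of actual subscribers that the model found, so it falls as false negatives rise.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    Returns 0.0 for a metric whose denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (made up): 1 = subscribed, 0 = did not subscribe.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]  # a cautious model: one positive call
precision, recall = precision_recall(y_true, y_pred)
```

In this toy case the single positive prediction is correct, so precision is 1.0, but two of the three actual subscribers are missed, so recall is 1/3. A short example along these lines would show readers why a model tuned for precision can still overlook many potential subscribers.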

Limitations Section: While the limitations section is comprehensive, it might be beneficial to provide potential solutions or considerations for addressing some of the identified limitations.
