Open zgarciaj opened 12 months ago
Overall, excellent work! The file organization is well-structured, making navigation effortless. The codebase is both readable and concise, contributing to its clarity. Additionally, the report is polished, presenting information in a clean and articulate manner. Still, here are a few suggestions that might further elevate your project:
Regarding the two `.ipynb` notebooks and their corresponding `.html` files, explaining the distinction between them in the `README.md` would help. It might be beneficial to state their purposes or which one represents the final deliverable. Additionally, reconsider rendering `term_deposit_full_analysis.ipynb` into an `.html` file if its contents are already encompassed in the other HTML file. Moreover, there seems to be an error within the `term_deposit_full_analysis.html` file under `docs/` that might need attention.
Datasets such as `bank-additional-full.csv` and `bank-additional.csv` could benefit from documentation outlining their contents and how they differ. This would help users understand the differences between these datasets.
In the report, delving a bit deeper into the rationale behind choosing logistic regression and decision trees—whether it's due to their interpretability, simplicity, or other factors—would provide valuable insight.
Discussing the limitations of using Logistic Regression to analyze feature importance in the report would be beneficial. Factors like its assumptions of linearity and feature independence might be worth mentioning.
Improving the visualization of the `Job Type` bar graph by sorting the bars would enhance its aesthetic appeal and make it more intuitive for readers.
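As a minimal sketch of the sorting suggested above, the categories can be ordered by frequency before plotting. The job labels below are made-up sample values, not the project's data, and `sorted_job_order` is a hypothetical helper name. The resulting list could then be handed to the plotting library's sort option (in Altair, the `sort` argument of `alt.X` accepts an explicit list of category names, or `'-y'` to sort by bar height).

```python
from collections import Counter


def sorted_job_order(jobs):
    """Return job categories ordered from most to least frequent."""
    counts = Counter(jobs)
    # Sort by descending count; break ties alphabetically for a stable order.
    return [job for job, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Made-up sample values for illustration only.
sample_jobs = ["admin.", "technician", "admin.", "services", "admin.", "technician"]
job_order = sorted_job_order(sample_jobs)
print(job_order)
```

Computing the order explicitly also makes it easy to reuse the same ordering across several charts.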
Renaming `Unnamed: 0` to a more descriptive name and excluding it from the heatmap might prevent it from overshadowing other correlations, particularly with `pdays` and `previous`.
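An `Unnamed: 0` column usually appears when a DataFrame's index was written to the CSV and then read back as data; in pandas, `pd.read_csv(path, index_col=0)` or `df.drop(columns="Unnamed: 0")` are the usual remedies. Whether that is what happened in this project is an assumption. The library-free sketch below shows the same idea on a made-up CSV snippet: the leading column has an empty header and is dropped before any correlations are computed. `drop_index_column` is a hypothetical helper.

```python
import csv
import io


def drop_index_column(csv_text):
    """Parse CSV text and drop a leading column whose header is empty."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    if header and header[0] == "":
        # The empty header marks a saved row index, not a real feature.
        header = header[1:]
        body = [row[1:] for row in body]
    return header, body


# Made-up snippet for illustration; values are not from the project's data.
sample = ",pdays,previous\n0,999,0\n1,6,2\n"
header, body = drop_index_column(sample)
print(header)
print(body)
```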
Lastly, double-checking the references, particularly the first one, for missing DOI information would ensure consistency and completeness in the reference section.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
The report is impressively well-written and organized, which made it quite easy for me to understand and follow. There isn't much to improve, but a few minor adjustments could be made.
`.ipynb` and `.html`: I think having the final report in a single folder will suffice. If your use case requires multiple notebooks, it would be clearer to describe the files.
`.html` report: table formatting would improve the look of the page. The `Unnamed: 0` column can be dropped or renamed to something meaningful.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Overall: Good job on the project! It was a great read, and it is very interesting how insights can be drawn from predictions on subscriptions to term deposits. The project repository is very well structured, and it was easy to navigate to find what I was looking for. In addition, there is an abundance of information, which gave me enough background to understand why each method is being used.
Constructive Feedback:
`matplotlib` is not being used in any of the scripts or analysis code, as you are using altair. It might be better to remove it so that people replicating your analysis do not install libraries that are never used.
`.html` and `.ipynb` files are placed in both the `src` folder and the `report` folder. Referencing Tiffany’s example repository, having duplicates in the `src` folder is not necessary. Perhaps you were running the code there earlier and missed removing them. Just highlighting this so that your files are not repeated and the file placement in your project repository is spot on.
Running `src/term_deposit_report.ipynb` with “Restart Kernel and Run All Cells...” from the “Kernel” menu: in the second code block, where the data is loaded, the command fails with `No such file or directory: '../data/bank-full.csv'`. I believe this can be resolved by changing the path in the code to `../data/raw/bank-full.csv`.
Once again, these are quite minor issues. Great job on the project!
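The path fix mentioned in this review can be sketched as follows. This is a minimal sketch that assumes the notebook runs from `src/` and that the raw data lives under `data/raw/`, as the suggested path implies; `resolve_data_path` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def resolve_data_path(candidates):
    """Return the first existing file among candidates, or raise a clear error."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    raise FileNotFoundError(f"None of these data files exist: {list(candidates)}")


# Paths taken from the review comment, relative to the notebook's folder.
candidates = ["../data/raw/bank-full.csv", "../data/bank-full.csv"]
try:
    data_path = resolve_data_path(candidates)
    print(f"Loading data from {data_path}")
except FileNotFoundError as err:
    # Fail early with a readable message instead of a traceback mid-notebook.
    print(err)
```

Checking the path before loading makes a misplaced file obvious at the top of the notebook rather than partway through a full run.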
This was derived from the JOSE review checklist and the ROpenSci review checklist.
The analysis provided is generally well-structured and informative. The use of logistic regression and decision tree classifier for model development allows for a comparison between linear and non-linear modeling techniques. Model Evaluation: The focus on accuracy, precision, and recall metrics, with a particular emphasis on precision, aligns with the study's objective to minimize Type 1 errors. However, here are a few points where improvements or clarifications could be made:
Variable Transformation Description and Metrics Description: When describing the preprocessing phase, it would be helpful to provide more context or reasoning behind the choice of transformations. Also, when describing the scoring metrics, it would help to explain their definitions and how they differ, for readers who are not familiar with these terms.
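For readers unfamiliar with the scoring metrics, their definitions can be sketched directly from confusion-matrix counts. The labels below are made-up, with 1 standing for a subscription and 0 for no subscription; `classification_metrics` is a hypothetical function name, not something from the project.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives (Type I errors)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives (Type II errors)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted or labeled positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision drops with every false positive, which is why it is the natural metric when the goal is to limit Type I errors; recall drops with every missed subscriber instead.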
Limitations Section: While the limitations section is comprehensive, it might be beneficial to provide potential solutions or considerations for addressing some of the identified limitations.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Submitting authors: @jy1909 @JohnShiuMK @zth96 @zgarciaj
Repository: https://github.com/UBC-MDS/group21_top-three-predictors-of-term-deposit-subscriptions
Report link: https://ubc-mds.github.io/group21_top-three-predictors-of-term-deposit-subscriptions/term_deposit_report.html
Abstract/executive summary: This report presents an analysis of the factors influencing client subscriptions to term deposits at a Portuguese banking institution. Utilizing a dataset comprising 45,211 client interactions with a target variable and 16 input features, we apply logistic regression and decision tree classifiers to identify the top three predictors of term deposit subscriptions. The data preprocessing involves handling missing values, encoding categorical variables, and standardizing numerical variables. Our exploratory data analysis leverages visualizations to understand feature distributions and correlations, while model evaluation focuses on precision and recall due to the dataset’s imbalance. Logistic regression is likely to prove slightly superior in precision to the decision tree classifier. The analysis identifies the outcome of previous campaigns, the month of contact, and the call duration as the most significant predictors. These findings offer valuable insights into the decision-making process of clients regarding term deposit subscriptions and suggest areas for future research.
Editor: @ttimbers
Reviewer: Ben Chen, Waleed Mahmood, Aishwarya Nadimpally, Nasim Ghazanfari Nasrabadi