Hello!
Group 8 members, the project overall looks very good to me, and I especially like the way you rendered the GitHub page; it is nice and neat. Good job! To be extra critical, I would like to share some findings that might help improve your project as a whole. Please read the points below and let me know if you have any follow-ups:
EDA: The EDA section looks nice and effectively communicates the question being answered, but to show the relationships more effectively I would recommend adding a correlation matrix to better illustrate how the features correlate with one another (a rough sketch is below).
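If helpful, here is a rough, hypothetical sketch of the kind of correlation matrix I have in mind. The CSV path, output location, and use of pandas/seaborn are my own assumptions for illustration, not your project's actual file layout or plotting library:

```python
# Hypothetical sketch of a correlation matrix over the numeric features.
# The file paths below are placeholders, not the repo's real paths.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

bank_df = pd.read_csv("data/processed/bank_train.csv")  # assumed path

# Pairwise correlations among numeric columns only.
corr = bank_df.corr(numeric_only=True)

# Annotated heatmap so readers can scan the relationships at a glance.
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation matrix of numeric features")
plt.tight_layout()
plt.savefig("results/figures/correlation_matrix.png")
```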
REPORT: The final report is very descriptive and fully covers the content and the questions asked. I would suggest using more consistent, automated tooling, such as the glue mechanism we were taught in class, to inject computed values into the report (a minimal sketch is below).
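If the tool meant here is myst-nb's glue, a minimal sketch could look like this (the variable name and value are placeholders, and I'm assuming the report is rendered with Jupyter Book / myst-nb):

```python
# Minimal sketch of gluing a computed value into the report with myst-nb.
# "best_recall" is a placeholder name; in practice the value would come from
# the model results rather than being hard-coded.
from myst_nb import glue

best_recall = 0.82
glue("best_recall", best_recall)
```

The glued value can then be referenced in the report's markdown with a role such as {glue:text}`best_recall`, so the numbers in the prose stay in sync with the analysis.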
TEST: I also found that there is an error when I try to run pytest tests/* to test the functions.
SCRIPT: The issue when running the Python command python scripts/optimization.py ... is happening for me too.
Other than that, it looks pretty decent to me; again, the topic is very valuable and the analysis is substantial enough to answer the questions asked. Well done overall!
Cheers, Angela Chen
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Hello! Your analysis was engaging to review. I found the report to be thorough and clear: it provided enough information to understand the analysis, along with clear justifications for your methodologies. In terms of your repo, I found the organization to be really good, especially the data directory. Most of your scripts were also detailed, with clear documentation and useful comments, which was helpful for understanding what each script does. Overall, the README is well structured, with instructions that can be followed.
Here are my suggestions/areas of improvement. Note: some of the points are minor and more on the optional side, but I thought I'd include them in case you find them beneficial.
SUGGESTIONS/IMPROVEMENTS:
- optimization.py: it's a bit unclear if that was intended; I'm not sure if both are being used.
- docker pull ...
- cd work: consider adding this for clarity of where the root of the directory is (when using the container).
- # Optimization and Accuracy/Recall Scores: this step was throwing an error. I tried it both in the container and in the virtual environment, and both gave errors. Perhaps see if the other reviewers were able to get this working, in case it's on my end.

This was derived from the JOSE review checklist and the ROpenSci review checklist.
[ ] Installation instructions: Is there a clearly stated list of dependencies? Note: If the installation guide is to use Docker, dependencies should include Docker and the relevant packages in the Dockerfile instead of pointing towards environment.yml
[x] Example usage: Do the authors include examples of how to use the software to reproduce the data analysis?
[x] Functionality documentation: Is the core functionality of the data analysis software documented to a satisfactory level?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support Note: there is a document "Contributing.md." It makes clear that a third party who wants to contribute can create a pull request, which requires two existing team members to merge. A suggestion would be to also list the names of the original team members in this document.
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
The analysis has an excellent explanation of the background, and the logical flow of the whole report is sound. I really appreciate the details explaining the model selection and why recall is prioritized in this business context. In general, the report is well written. A small note on the EDA portion: for the 'previous' feature, there seems to be only one bar, and the reader might not be able to infer much from this chart.
On the model selection and optimization, the logic is reasonable. Based on the scores shown, I can follow and understand why svc_bal is the final model chosen. Not sure if I missed this somewhere, but it would be good to know the distribution of the target variable in the training dataset (a quick sketch of how this could be reported is below).
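As a rough, hypothetical example of what I mean (the path and the target column name "y" are assumptions on my part), reporting the class balance could be as simple as:

```python
# Quick check of the target's class balance in the training split.
# The CSV path and the column name "y" are assumptions for illustration.
import pandas as pd

train_df = pd.read_csv("data/processed/bank_train.csv")

print(train_df["y"].value_counts())                # absolute counts per class
print(train_df["y"].value_counts(normalize=True))  # proportions per class
```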
The code is well written. A small suggestion would be to automate the code in model_selection so that it chooses the best model by itself and saves it as 'model_pipeline.pickle'. Alternatively, it could emit a click.echo warning if svc_bal turns out not to be the best model on recall (see the sketch below).
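A hypothetical sketch of that idea follows; the dictionary names, output path, and helper function are all made up for illustration and would need to be adapted to the actual model_selection script:

```python
# Sketch: pick the pipeline with the best cross-validated recall, warn via
# click.echo if it is not svc_bal, and save it as model_pipeline.pickle.
# All names here are hypothetical, not taken from the project's code.
import pickle
import click

def save_best_model(cv_recall, fitted_pipelines,
                    out_path="results/models/model_pipeline.pickle"):
    """cv_recall: dict mapping model name -> mean CV recall.
    fitted_pipelines: dict mapping model name -> fitted pipeline."""
    best_name = max(cv_recall, key=cv_recall.get)

    if best_name != "svc_bal":
        click.echo(
            f"Note: '{best_name}' beat 'svc_bal' on recall "
            f"({cv_recall[best_name]:.3f}); saving it instead."
        )

    with open(out_path, "wb") as f:
        pickle.dump(fitted_pipelines[best_name], f)

    return best_name
```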
On the software side, the Docker image can be pulled as intended following the instructions.
However, there is an error when I try to run pytest tests/* to test the functions. This is probably something that could be reviewed further.
And similar to Nicole's review above, there is an issue when running the Python command python scripts/optimization.py ... It would probably be good to check the whitespace and also use "\" to separate the lines.
It has been a great read! Thank you!
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Dear Team, I wanted to take a moment to express my sincere appreciation for the outstanding work that Group 8 has done on this project. I must commend the team for the impeccable execution. Great job, Group 8! Your hard work and collaboration have certainly paid off, and it's a pleasure to acknowledge your efforts. While the project is impressive, I'd like to offer some constructive feedback that might help elevate it even further. Please review the following points and feel free to discuss any questions or clarifications:
In conclusion, excellent work! I eagerly anticipate engaging in discussions and implementing the provided suggestions to further enhance the project's quality.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Submitting authors: @Rachel0619 @rafecchang @AnuBanga @killerninja8 Sid Grover
Repository: https://github.com/UBC-MDS/dsci_522_group_8_bank_marketing_project
Report link: https://ubc-mds.github.io/dsci_522_group_8_bank_marketing_project/bank_analysis.html
Abstract/executive summary: Here we build a balanced SVC model to try to predict whether a new client will subscribe to a term deposit. We tested five different classification models, including a dummy classifier, unbalanced/balanced logistic regression, and unbalanced/balanced SVC, and chose the balanced SVC as the optimal model based on how the models scored on the test data; it has the highest test recall score of 0.82, which indicates that it makes the fewest false negative predictions among all five models.
The balanced support vector machine model considers 13 different numerical/categorical features of customers. After hyperparameter optimization, the model's test recall increased from 0.82 to 0.875. The results were somewhat expected, given SVC's known efficacy in classification tasks, particularly when there is a clear margin of separation. The high recall score of 0.875 indicates that the model is particularly adept at identifying clients likely to subscribe, which was the primary goal. It's noteworthy that such a high recall was achieved, as it suggests the model is highly sensitive to true positive cases.
Editor: @Rachel0619 Reviewers: Angela Chen, Oak Chongfeungprinya, Iris Luo, Nicole Bidwell