UBC-MDS / data-analysis-review-2023


Submission: GROUP_8: Bank Marketing Analysis #4

Open Rachel0619 opened 11 months ago

Rachel0619 commented 11 months ago

Submitting authors: @Rachel0619 @rafecchang @AnuBanga @killerninja8 Sid Grover

Repository: https://github.com/UBC-MDS/dsci_522_group_8_bank_marketing_project Report link: https://ubc-mds.github.io/dsci_522_group_8_bank_marketing_project/bank_analysis.html Abstract/executive summary: Here we build a balanced support vector classifier (SVC) to predict whether a new client will subscribe to a term deposit. We tested five classification models (a dummy classifier, unbalanced and balanced logistic regression, and unbalanced and balanced SVC) and chose the balanced SVC based on how the models scored on the test data; it achieved the highest test recall, 0.82, indicating that it makes the fewest false negative predictions of the five models.

The balanced support vector machine model considers 13 different numerical and categorical features of customers. After hyperparameter optimization, the model's test recall increased from 0.82 to 0.875. The results were somewhat expected, given SVC's known efficacy in classification tasks, particularly when there is a clear margin of separation. The high recall of 0.875 indicates that the model is particularly adept at identifying clients likely to subscribe, which was the primary goal; achieving such a high recall suggests the model is highly sensitive to true positive cases.
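As a rough sketch of the approach described above (using toy data in place of the actual bank marketing features; the hyperparameter ranges and search settings are illustrative, not the project's actual configuration):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Toy imbalanced data standing in for the bank marketing features.
X, y = make_classification(n_samples=300, weights=[0.85], random_state=522)

# class_weight="balanced" upweights errors on the rare "subscribed" class,
# and the search optimizes recall, matching the metric the report prioritizes.
search = RandomizedSearchCV(
    SVC(class_weight="balanced"),
    param_distributions={"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)},
    n_iter=5,
    scoring="recall",
    random_state=522,
)
search.fit(X, y)
print(round(search.best_score_, 3))  # best mean cross-validated recall
```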

Editor: @Rachel0619 Reviewers: Angela Chen, Oak Chongfeungprinya, Iris Luo, Nicole Bidwell

angelachenmo commented 11 months ago

Data analysis review checklist

Reviewer: angelachenmo

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:

Hello!

Group 8 members, the project overall looks very good to me, and I especially like the way you rendered the GitHub page, which is nice and neat. Good job! To be extra critical, I would like to share a few findings that might help improve your project as a whole. Please read the points below and let me know if you have any follow-ups:

  1. EDA: The EDA section looks nice and effectively communicates the question to be answered, but to illustrate the relationships among features more effectively, I would recommend adding a correlation matrix.

  2. REPORT: The final report is very descriptive and fully covers the content and questions asked. I would suggest using more consistent, automated tools like glue, which we were taught in class, to inject result values into the report.

  3. TEST: I also encountered an error when running pytest tests/* to test the functions.

  4. SCRIPT: The issue when running the python scripts/optimization.py ... command is happening for me too.
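On point 1, the correlation matrix is a one-liner in pandas. A minimal sketch, using toy columns in place of the real bank marketing features (the actual column names will differ):

```python
import pandas as pd

# Toy stand-in for the bank marketing data.
df = pd.DataFrame({
    "age":      [30, 42, 55, 23, 61],
    "balance":  [1200, 300, 4500, 50, 800],
    "duration": [120, 300, 90, 45, 600],
})

# Pairwise Pearson correlations among the numeric features; this table
# (or a heatmap rendered from it) is what the suggestion refers to.
corr = df.corr(numeric_only=True)
print(corr.round(2))
```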

Other than that, it looks pretty solid to me. Again, the topic is very valuable and the analysis is substantial enough to answer the questions asked. Well done overall!

Cheers, Angela Chen

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

nicolebid commented 11 months ago

Data analysis review checklist

Reviewer: nicolebid

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5

Review Comments:

Hello! Your analysis was engaging to review. I found the report thorough and clear: it provided enough information to understand the analysis, along with clear justifications for your methodology. In terms of your repo, I found the organization really good, especially the data directory. Most of your scripts were also well detailed, with clear documentation and useful comments, which helped in understanding what each script does. Overall, the README is well structured, with instructions that can be followed.

Here are my suggestions/areas for improvement. Note: some of the points are minor and more on the optional side, but I thought I'd include them in case you find them beneficial.

SUGGESTIONS/IMPROVEMENTS:


sivakornchong commented 11 months ago

Data analysis review checklist

Reviewer: @sivakornchong

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1.5 hours

Review Comments:


The analysis has an excellent explanation of the background, and the logical flow of the whole report is sound. I really appreciate the detail explaining model selection and why recall is prioritized in this business context. In general, the report is well written. A small note on the EDA portion: for the 'previous' feature there seems to be only one bar, so the reader may not be able to infer much from this chart.

On model selection and optimization, the logic is reasonable. Based on the scores shown, I can follow and understand why svc_bal is the final model chosen. I may have missed this somewhere, but it would be good to know the distribution of the target variable in the training dataset.
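Reporting that distribution is a one-liner. A sketch on a hypothetical target series (in the bank data the target is whether the client subscribed, "yes"/"no"; the 88/12 split here is made up):

```python
import pandas as pd

# Hypothetical training target; real counts come from the train split.
y_train = pd.Series(["no"] * 88 + ["yes"] * 12, name="y")

# Relative frequency of each class shows how imbalanced the training set is,
# which also motivates the class_weight="balanced" models.
dist = y_train.value_counts(normalize=True)
print(dist)
```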

The code is well written. A small suggestion would be to automate model_selection so it chooses the best model itself and returns it as 'model_pipeline.pickle'. Alternatively, it could emit a click.echo warning if svc_bal turns out not to have the best recall.
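The automation idea above could look something like this. Everything here is a sketch on toy data: the candidate list, file name, and scoring setup stand in for whatever model_selection actually uses.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy imbalanced data standing in for the preprocessed training set.
X, y = make_classification(n_samples=200, weights=[0.8], random_state=522)

# Candidate models (a subset of the report's five, for brevity).
models = {
    "dummy": DummyClassifier(),
    "logreg_bal": LogisticRegression(class_weight="balanced", max_iter=1000),
    "svc_bal": SVC(class_weight="balanced"),
}

# Pick the model with the best mean cross-validated recall...
best_name = max(
    models,
    key=lambda name: cross_val_score(models[name], X, y, scoring="recall").mean(),
)

# ...and pickle only the winner, so downstream scripts never need editing
# when a different model comes out on top.
with open("model_pipeline.pickle", "wb") as f:
    pickle.dump(models[best_name], f)
print(best_name)
```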

On the software side, the Docker image can be downloaded as intended per the instructions.

However, there is an error when I try to run pytest tests/* to test the functions. This is probably something that could be reviewed further.

(screenshot of the pytest error)

And similar to Nicole's review above, there is an issue when running the python scripts/optimization.py ... command. It would be good to check the whitespace and use "\" to separate the lines.
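For reference, the line-continuation rule looks like this. The command and flags below are placeholders, not the script's actual interface; the point is only the "\" behaviour:

```shell
# A trailing "\" continues a command onto the next line. It must be the very
# last character on the line: a space after the backslash silently breaks
# the continuation, which matches the whitespace symptom described above.
script="python scripts/optimization.py"   # placeholder for the real invocation
echo $script \
    --flag-a 1 \
    --flag-b 2
```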

It has been a great read! Thank you!


iris0614 commented 11 months ago

Data analysis review checklist

Reviewer: @iris0614

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 3 hours

Review Comments:

Dear Team, I want to express my sincere appreciation for the outstanding work Group 8 has done on this project; the execution is impeccable. Great job, Group 8! Your hard work and collaboration have certainly paid off, and it's a pleasure to acknowledge your efforts. While the project is impressive, I'd like to offer some constructive feedback that might elevate it even further. Please review the following points and feel free to discuss any questions or clarifications:

  1. EDA Section: The Exploratory Data Analysis (EDA) section effectively communicates the questions to be answered. To enhance its effectiveness, please consider incorporating a correlation matrix. This visual representation can provide a clearer illustration of the correlation relationships within the data.
  2. Report Enhancement: The final report is comprehensive and descriptive, covering all relevant content and questions. To streamline and automate certain aspects, consider using more consistent and automatic tools like glue, as taught in class. This can contribute to a more polished and standardized presentation.
  3. Test Functionality: I encountered an error when attempting to run pytest tests/*. It would be beneficial to investigate and address this issue to ensure the reliability of the testing process. Let's collaborate to identify and rectify the problem.
  4. Script Execution: Similar to the testing concern, there appears to be an issue when running the python scripts/optimization.py command.

In summary, the project is exceptionally well crafted, and the chosen topic demonstrates significant value. The analysis is particularly noteworthy in effectively addressing the posed questions.
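On point 2, the glue pattern is worth a quick illustration. This sketch assumes the myst-nb flavour of glue (used with Jupyter Book); a no-op stand-in is defined in case the package is unavailable, and the value 0.875 stands in for a number computed by the analysis:

```python
try:
    from myst_nb import glue
except ImportError:
    # Fallback for illustration when myst-nb is not installed.
    def glue(name, variable, display=True):
        print(f"glued {name!r} = {variable}")

# In the real pipeline this value is computed, not typed by hand.
test_recall = 0.875
glue("test_recall", test_recall, display=False)
```

In the report source, a role such as `` {glue:text}`test_recall` `` then renders the stored value, so the numbers in the prose can never drift out of sync with the results.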

In conclusion, excellent work! I eagerly anticipate engaging in discussions and implementing the provided suggestions to further enhance the project's quality.
