Open nassimgha opened 12 months ago
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Minor Issues with Scripts
# Split data (train-test), process data and save preprocessor
python scripts/split_and_process.py \
--raw_data='data/raw/bank-full.csv' \
--save_to='data/processed' \
--preprocessor_to='results/models' \
--seed=522
/opt/conda/lib/python3.11/site-packages/sklearn/neighbors/_classification.py:233: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
return self._fit(X, y)
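The warning above says the target was passed as a column vector of shape (n_samples, 1) rather than a 1d array. A minimal sketch of the fix the message itself suggests, using toy data and variable names that are hypothetical and not taken from split_and_process.py or any other script in the repository:

```python
import warnings

import numpy as np
from sklearn.exceptions import DataConversionWarning
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 6 samples, 2 features, binary target.
X = np.array(
    [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 1.1], [0.9, 1.2], [1.1, 0.8]]
)
# Shape (6, 1): this is the "column-vector y" the warning complains about.
y_column = np.array([[0], [0], [0], [1], [1], [1]])

# Flatten to shape (6,) as the warning message recommends.
y_flat = y_column.ravel()

# Turn the warning into an error so a regression would fail loudly.
with warnings.catch_warnings():
    warnings.simplefilter("error", DataConversionWarning)
    model = KNeighborsClassifier(n_neighbors=3).fit(X, y_flat)

predictions = model.predict(X)
```

If the target is read as a one-column pandas DataFrame, selecting the column as a Series or calling .to_numpy().ravel() on it should have the same effect.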
feat_imp.py had inconsistent input names: the README.md and documentation still use resampled_training_data as the input, while the actual script uses the name transformed_training_data, which caused an error when reproducing the analysis. This should be corrected.
docker compose exec jupyter-lab /bin/bash did not run for me, with the error bash: docker: command not found. However, the command jupyter-book build report still worked. I would suggest removing the docker compose command in light of this.
Overall, I think you did a great job with the report and I enjoyed reading your analysis. Below are some suggestions for how you could improve your report even further:
In summary, you have shown that you have conducted thorough research and discovered interesting findings.
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Great work! Your report is nicely formatted and professional (I like the color scheme you chose for the plots :D ), and your analysis methods look great. I was able to reproduce your analysis with minimal errors.
Small issues I bumped into/suggestions for improvement:
On feat_imp.py:
Error: No such option: --resampled_training_data Did you mean --transformed_training_data?
bash: --seed=522: command not found
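Besides updating the README, one way to keep old commands working after an option is renamed is to accept the old name as an alias. The sketch below uses argparse from the standard library purely for illustration. It is an assumption, not the parser feat_imp.py actually uses, and the build_parser helper and the file path are hypothetical:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that accepts both the new and the old option name."""
    parser = argparse.ArgumentParser(description="Feature importance (sketch)")
    parser.add_argument(
        "--transformed_training_data",  # current name used by the script
        "--resampled_training_data",    # older name still shown in the README
        dest="transformed_training_data",
        required=True,
        help="Path to the processed training data",
    )
    return parser


# Both spellings resolve to the same attribute.
new_style = build_parser().parse_args(
    ["--transformed_training_data", "data/processed/train.csv"]
)
old_style = build_parser().parse_args(
    ["--resampled_training_data", "data/processed/train.csv"]
)
```

The alias is only a stopgap; the documentation and the script should still end up using a single name.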
Submitting authors: <zhang-shizhe> <Marcony1> <celestezhao> <nassimgha>
Repository:
Report link:
Abstract/executive summary: In this analysis, we attempt to build a predictive model aimed at determining whether a client will subscribe to a term deposit, utilizing the data associated with direct marketing campaigns, specifically phone calls, in a Portuguese banking institution.
After exploring several models (logistic regression, KNN, decision tree, naive Bayes), we have selected the logistic regression model as our primary predictive tool. The final model performs fairly well when tested on an unseen dataset, achieving the highest AUC (Area Under the Curve) of 0.899. This exceptional AUC score underscores the model's capacity to effectively differentiate between positive and negative outcomes. Notably, certain factors such as last contact duration, last contact month of the year and the clients' types of jobs play a significant role in influencing the classification decision.
The dataset used in this project originates from the Bank Marketing dataset created by S. Moro, P. Rita and P. Cortez at Iscte - University Institute of Lisbon. This dataset is accessible through the UCI Machine Learning Repository and can be accessed here. Among the four available datasets, we have utilized bank-full.csv, which contains all examples and 17 inputs. Each row in the dataset represents an individual client's data, including personal details (e.g., age, occupation, loan status, etc.), information regarding their response to the marketing campaign (e.g., outcomes of the previous marketing campaign, number of contacts made during the current campaign, etc.), and the eventual subscription outcome for the term deposit.
Editor: @ttimbers
Reviewers: Beth Ou Yang, Michelle Hunn, Gretel Tan, Julia Everitt