UBC-MDS / Group_7_Project

https://ubc-mds.github.io/Group_7_Project/bank_marketing_prediction.html

Bank Marketing Prediction

A data analysis project for DSCI 522 (Data Science workflows); a course in the Master of Data Science program at the University of British Columbia.

About

In this project, we used customer information from a phone-call-based direct marketing campaign of a Portuguese banking institution to predict whether customers would subscribe to the product offered, a term deposit. We applied several classification models (k-NN, SVM, logistic regression and random forest) to our dataset to find the one that best fit our data. The random forest model performed best among the models tested, with an F-beta score (beta = 5) of 0.817 and an accuracy of 0.671 on the test data.

While this was the best-performing model of those tested, its accuracy still left much to be desired. This suggests that more data may be needed to accurately predict whether customers would subscribe to the term deposit. Future studies may also consider using more features, using a different set of features that is more relevant to whether customers will subscribe, or applying feature engineering to derive more predictive features.
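To make the F-beta metric quoted above concrete, here is a minimal sketch of the standard F-beta formula computed from precision and recall. The function name `fbeta` and the precision and recall values are illustrative assumptions, not figures from our report. With beta = 5, recall is weighted far more heavily than precision, so a model can score well on F-beta while its accuracy stays modest.

```python
def fbeta(precision: float, recall: float, beta: float = 5.0) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 1 gives the usual F1 score.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values, chosen only to show how beta = 5 favours recall.
low_precision_high_recall = fbeta(precision=0.2, recall=1.0, beta=5)
high_precision_low_recall = fbeta(precision=1.0, recall=0.2, beta=5)
print(low_precision_high_recall, high_precision_low_recall)
```

In this sketch the high-recall case scores much higher than the high-precision case, which is the behaviour one wants when missing a likely subscriber is costlier than making an unnecessary call.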

Report

The final report for the project may be viewed at https://ubc-mds.github.io/Group_7_Project/bank_marketing_prediction.html.

Data Description

In this project, we used a dataset on direct marketing campaigns conducted by a Portuguese banking institution, provided by Moro, S., Rita, P., and Cortez, P. (2012). The dataset was sourced from UC Irvine's Machine Learning Repository and can be accessed at https://archive.ics.uci.edu/dataset/222/bank+marketing. It comprises 16 features and 45,211 instances, and each row corresponds to an individual client of the Portuguese bank. The dataset creators' primary objective was to predict whether a client would subscribe to a term deposit, which is indicated by the 'y' column. We also used this column as our target variable.

The columns of this data are defined as below:

| Variable Name | Role | Type | Demographic | Description | Units | Missing Values |
|---|---|---|---|---|---|---|
| age | Feature | Integer | Age | | | no |
| job | Feature | Categorical | Occupation | type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown') | | no |
| marital | Feature | Categorical | Marital Status | marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed) | | no |
| education | Feature | Categorical | Education Level | (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown') | | no |
| default | Feature | Binary | | has credit in default? | | no |
| balance | Feature | Integer | | average yearly balance | euros | no |
| housing | Feature | Binary | | has housing loan? | | no |
| loan | Feature | Binary | | has personal loan? | | no |
| contact | Feature | Categorical | | contact communication type (categorical: 'cellular','telephone') | | yes |
| day_of_week | Feature | Date | | last contact day of the week | | no |
| month | Feature | Date | | last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec') | | no |
| duration | Feature | Integer | | last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model. | | no |
| campaign | Feature | Integer | | number of contacts performed during this campaign and for this client (numeric, includes last contact) | | no |
| pdays | Feature | Integer | | number of days that passed by after the client was last contacted from a previous campaign (numeric; -1 means client was not previously contacted) | | yes |
| previous | Feature | Integer | | number of contacts performed before this campaign and for this client | | no |
| poutcome | Feature | Categorical | | outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success') | | yes |
| y | Target | Binary | | has the client subscribed a term deposit? | | no |
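The notes on duration, pdays and y in the column definitions above translate naturally into a few preprocessing steps. The sketch below is an illustration only and not necessarily what our pipeline does: the helper name `prepare_rows`, the derived `previously_contacted` field, the two sample rows, and the assumption that the raw file is semicolon-delimited with 'yes'/'no' target values are all ours.

```python
import csv
import io

# Hypothetical two-row sample with made-up values, in the layout we assume
# for the raw file (semicolon-delimited with a header row).
SAMPLE = """age;job;balance;duration;pdays;y
58;management;2143;261;-1;no
33;technician;882;120;151;yes
"""


def prepare_rows(text, drop=("duration",)):
    """Parse delimited text, drop the listed columns, and encode y as 0/1."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for raw in reader:
        # duration is only known after the call, so drop it by default.
        row = {key: value for key, value in raw.items() if key not in drop}
        # Assumed encoding: 'yes' -> 1, anything else -> 0.
        row["y"] = 1 if raw["y"] == "yes" else 0
        # pdays == -1 means the client was not previously contacted.
        row["previously_contacted"] = int(raw["pdays"]) != -1
        rows.append(row)
    return rows


rows = prepare_rows(SAMPLE)
print(rows)
```

Passing `drop=()` keeps duration, which is useful only for the benchmark comparison that the duration note describes.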

Usage

Note:

For this setup to run smoothly, you should have Docker installed and running on your computer. Download the appropriate Docker software for your machine here, then proceed with the instructions below.

Python, Git and Container Setup

  1. Set up your Python environment, e.g., Miniconda with Python 3.11.

  2. Clone the repository using this command in your terminal:

git clone https://github.com/UBC-MDS/Group_7_Project.git

Running the analysis

  1. Navigate to the root of this project on your computer using the command line and enter the following command to reset the project to a clean state (i.e., remove all files generated by previous runs of the analysis):
docker-compose run --rm main /bin/bash -c "cd work/ && make clean"
  2. To run the analysis in its entirety, enter the following command in the terminal in the project root:
docker-compose run --rm main /bin/bash -c "cd work/ && make all"
  3. Our final rendered HTML report should be located at /docs/bank_marketing_prediction.html.
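If you prefer to run the clean and full-analysis commands above in one go, they can be wrapped in a small script. This is a sketch under assumptions rather than part of the project: the helper names `compose_make_command` and `run_pipeline` are hypothetical, and the script only builds and runs the same two docker-compose commands shown above.

```python
import subprocess


def compose_make_command(target):
    """Build the docker-compose command that runs `make <target>` in work/."""
    return [
        "docker-compose", "run", "--rm", "main",
        "/bin/bash", "-c", f"cd work/ && make {target}",
    ]


def run_pipeline(runner=subprocess.run):
    """Run `make clean`, then `make all`, stopping at the first failure."""
    for target in ("clean", "all"):
        # check=True makes subprocess.run raise if a command exits non-zero.
        runner(compose_make_command(target), check=True)
```

Calling `run_pipeline()` from the project root, with Docker running, should behave the same as entering the two commands by hand. The `runner` parameter exists only so the command construction can be checked without starting a container.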

Alternative way to run the analysis

  1. Using the command line, change into the root of this project's directory (using the cd command) and run the following command:
docker compose up
  2. You should see a URL following the pattern http://127.0.0.1:8888/lab?token= appear in the terminal window. For example, it might look like this: http://127.0.0.1:8888/lab?token=2d8c085f4e62c9270b1f39834d3fcbd63bc17cfc2a404fcb . Copy this URL from your terminal and paste it into your preferred browser. This should open Jupyter Lab in your browser, with the dependencies and documents used in our project. The relevant URL is highlighted in the screenshot below, for reference:

Docker Compose Up Screenshot (Example of URL)

  3. To run the analysis:

Note: Sometimes, you might see the error bash: make: command not found when trying to run this. If this happens, do not panic! Instead, open a new terminal on your computer (not in the Docker container), navigate to the root of this project's directory, and run docker pull riyashaju/group_7_project:latest. This should update your Docker image to the latest version, which includes the make package. Then try the steps above again.

  4. Then, use the command make all to re-run our analyses and regenerate our report. Our final rendered HTML report should be located at /docs/bank_marketing_prediction.html.
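The URL-copying step above can also be scripted if you capture the terminal output to a string. The sketch below assumes the URL always follows the pattern shown in the example (host 127.0.0.1, port 8888, hexadecimal token). The helper name `find_jupyter_url` and the sample text are ours.

```python
import re

# Pattern assumed from the example URL above: fixed host and port,
# followed by a lowercase hexadecimal token.
JUPYTER_URL = re.compile(r"http://127\.0\.0\.1:8888/lab\?token=[0-9a-f]+")


def find_jupyter_url(terminal_output):
    """Return the first Jupyter Lab URL found in the text, or None."""
    match = JUPYTER_URL.search(terminal_output)
    return match.group(0) if match else None


# Made-up surrounding text; only the URL itself comes from the example above.
sample_output = (
    "some terminal text "
    "http://127.0.0.1:8888/lab?token=2d8c085f4e62c9270b1f39834d3fcbd63bc17cfc2a404fcb"
    " more text"
)
print(find_jupyter_url(sample_output))
```

If your container is mapped to a different port, the pattern would need adjusting accordingly.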

Cleaning Up the Container

  1. When you are done viewing the report and exploring our project, go back to the terminal window which is being used as the server to run the Docker container, and press the Control + C keys on your keyboard. This should stop the server and shut down all the kernels, and you should be able to type commands into the terminal again.

  2. Using the command line, type

docker compose down

This should remove and clean up the container.

Dependencies

Docker, a type of containerization software, was used to contain and run the dependencies required for this project. Detailed instructions on how to use the project's Docker image are in the Usage section above. The image is based on the quay.io/jupyter/minimal-notebook:2023-11-19 image.

The project's Docker image contains dependencies installed using both conda and pip.

More details on the exact dependencies contained in our Docker image can be found in our Dockerfile, located in the root of this project directory.

Running the tests

Tests are run using the pytest command in the root of the project.

License

The Bank Marketing Prediction materials here are licensed under the Creative Commons Zero v1.0 Universal license (CC0 1.0 Universal). The code is licensed under the MIT License. If re-using or re-mixing, please provide attribution and a link to this webpage.

References

If you find this project useful, please cite and refer to the original references: