In this code pattern, we will create and deploy a customer churn prediction model using IBM Cloud Pak for Data. The basis for our model is a data set that contains customer demographics and trading activity data. We will use a Jupyter notebook to visualize the data, build hypotheses for prediction, and then build, test, and save a prediction model. Finally, we will enable a web service and use the model from an app.
This code pattern has been updated to include images from the latest version of Cloud Pak for Data, v3.5.0.
The use case describes a stock trader company that can use churn prediction to target retention offers at at-risk customers. Once deployed, the model can be used for inference from an application using the REST API. A simple Python app is provided to demonstrate this.
When the reader has completed this code pattern, they will understand how to:

* Use a Jupyter notebook to visualize data and build hypotheses for prediction.
* Build, test, and save a churn prediction model.
* Deploy the model as a web service.
* Call the deployed model from an external application.
The instructions in this code pattern assume you are using IBM Cloud Pak for Data version 3.5.0.
Sign in to your IBM Cloud Pak for Data web client. All of the steps are performed using the web client unless stated otherwise.
Clone the [icp4d-customer-churn-classifier](https://github.com/IBM/icp4d-customer-churn-classifier) repo locally. In a terminal, run the following command:

```bash
git clone https://github.com/IBM/icp4d-customer-churn-classifier
```
To get started, open the **Projects** page and set up an analytics project to hold the assets that you want to work with, and then get data for your project.
1. Launch a browser and navigate to your Cloud Pak for Data deployment.
2. Go to the (☰) menu, expand **Projects** and click **All projects**.
3. Click **New project +**, select the **Analytics project** type, and click **Next**.
4. Select **Create an empty project**.
5. Give the project a name and click **Create**.
NOTE: You can optionally load the data into Db2 Warehouse. For instructions, see the Use Db2 Warehouse to store customer data section below.
In your project, on the **Assets** tab, click the **01/00** (find and add data) icon and then the **Load** tab. Either drag the `data/mergedcustomers.csv` file from the cloned repository to the window or navigate to it using **browse for files to upload**.
Before we create a machine learning model, we will have to set up a deployment space where we can save and deploy the model.
Follow the steps in this section to create a new deployment space.
1. Go to the (☰) menu and click **Deployments**.
2. Click **New deployment space +**.
3. Give the deployment space a name and click **Create**.
4. Click **View new space**.
1. In your project, on the **Assets** tab, either click the **Add to project +** button and choose **Notebook**, or, if the Notebooks section exists, click **New notebook +** to the right of *Notebooks*.
2. Choose the **Python 3.7** environment as the **Runtime**, then either drag the `notebooks/TradingCustomerChurnClassifierSparkML.ipynb` file from the cloned repository to the window or navigate to it using **Drag and drop files here or upload**.
3. Click **Create**.
:Important: Make sure that you stop the kernel of your notebook(s) when you are done, in order to conserve memory resources!
Now that you are in the notebook, add generated code to insert the data as a DataFrame, and then fix up the notebook's reference to the DataFrame.
Place your cursor at the last line of the following cell:

```python
# Use the find data 01/00 icon and under your remote data set
# use "Insert to code" and "Insert pandas DataFrame"
# here.
# Add asset from file system
```
Click the **find data 01/00** icon on the menu bar (the last icon). On the **Files** tab, find the data set that you added to the project, click **Insert to code**, and choose **pandas DataFrame**.
The inserted code will assign the DataFrame to a variable named `df1` or `df_data_1` (perhaps with a different sequence number). Find the code cell like the following block and edit the `#` so that it matches the variable name used in the inserted code.
```python
# After inserting the pandas DataFrame code above, change the following
# df_data_# to match the variable used in the above code. df_churn_pd is used
# later in the notebook.
df_churn_pd = df_data_#
```
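For reference, the generated cell will look roughly like the sketch below. The variable name, file name, and project access details are generated for your project, so treat this as illustrative only and use whatever the tool produces:

```python
# Illustrative sketch of an "Insert to code" cell in Cloud Pak for Data.
# Your generated cell will differ; keep the code the tool generates.
import pandas as pd
from project_lib import Project

project = Project.access()
my_file = project.get_file("mergedcustomers.csv")
my_file.seek(0)
df_data_1 = pd.read_csv(my_file)
df_data_1.head()
```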
The Watson Machine Learning client is required to save and deploy our customer churn predictive model, and should be available on your IBM Cloud Pak for Data platform. Find the cell containing the code below and insert the URL, username, and password for your IBM Cloud Pak for Data instance:
```python
from ibm_watson_machine_learning import APIClient

# get URL, username and password from your IBM Cloud Pak for Data administrator
wml_credentials = {
    "url": "https://X.X.X.X",
    "username": "*****",
    "password": "*****",
    "instance_id": "wml_local",
    "version": "3.5"
}

client = APIClient(wml_credentials)
print(client.version)
IBM Cloud Pak for Data uses the concept of deployment spaces, which is where models can be deployed. You can list all available spaces using the client's `client.spaces.list()` function.
Provide the name of the deployment space that you created in Step 3 above in the cell containing the following text.
```python
# Insert the name of your deployment space here:
DEPLOYMENT_SPACE_NAME = 'INSERT-YOUR-DEPLOYMENT-SPACE-NAME-HERE'
```
The next cell looks up the deployment space ID based on the name that you provided and prints it out. If you do not see a space ID in the output, verify that you created a deployment space and provided the correct name. Do not proceed until this cell runs successfully and returns the `space_id`.

Once you know the deployment space ID, update the next cell with it to set that space as the default deployment space. Further down the notebook, when you deploy the model, it will be deployed to this default space.
```python
# Now set the default space to the GUID for your deployment space.
# If this is successful, you will see a 'SUCCESS' message.
client.set.default_space('INSERT_SPACE_ID_HERE')
```
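If you are curious how a name-to-ID lookup like the notebook's can work, here is a minimal sketch using the client's spaces API. This is an illustration under the assumption of the standard `ibm_watson_machine_learning` spaces calls, not the notebook's exact cell:

```python
# Minimal sketch: resolve a deployment space ID by name.
# Assumes the ibm_watson_machine_learning spaces API; the notebook's
# actual lookup cell may be written differently.
space_id = next(
    (space['metadata']['id']
     for space in client.spaces.get_details()['resources']
     if space['entity']['name'] == DEPLOYMENT_SPACE_NAME),
    None)
print(space_id)
```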
Run the notebook using **Cell ▷ Run All**, or run the cells individually with the play button as shown here. During the execution of a cell in a Jupyter notebook, an asterisk (`[*]`) displays in the square brackets and changes to a sequence number when execution of that cell completes.
The mix of documentation, code, and output makes a Jupyter notebook largely self-explanatory. This also makes it a great environment to "show your work" when you have a hypothesis, do some analysis, and come up with a conclusion.
See the notebook with example output here.
IBM Cloud Pak for Data provides various options for analytics models such as testing, scoring, evaluating, and publishing.
We can start testing using the built-in tooling.
1. Go to the (☰) menu and click **Deployments**.
2. On the **Spaces** tab, choose the deployment space you set up previously by clicking on the name of the space.

NOTE: There may be more than one model listed in the Models section. This can happen if you have run the Jupyter notebook more than once. Although you could select any of the models listed on the page, the recommendation is to start with whichever available model uses a spark-mllib_2.4 runtime.
NOTE: Testing using JSON is enabled for this model because we specified the model's input fields during model creation in the notebook, as shown below.
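As a hedged sketch of what specifying an input schema at model-store time can look like with the Watson Machine Learning client (the metadata values, and the `model`, `pipeline`, and `train_data` names, are illustrative assumptions, not the notebook's exact code):

```python
# Hedged sketch: store a Spark MLlib model with an input data schema so the
# deployment's Test tab can render a form. The notebook's real cell differs;
# `model`, `pipeline`, and `train_data` are assumed to exist already.
model_props = {
    client.repository.ModelMetaNames.NAME: "Trading Customer Churn Model",
    client.repository.ModelMetaNames.TYPE: "mllib_2.4",
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID:
        client.software_specifications.get_uid_by_name("spark-mllib_2.4"),
    client.repository.ModelMetaNames.INPUT_DATA_SCHEMA: [{
        "id": "input_schema",
        "fields": [
            {"name": "ID", "type": "integer"},
            {"name": "GENDER", "type": "string"},
            {"name": "STATUS", "type": "string"},
            # ...one entry for each remaining input field
        ],
    }],
}
stored_model = client.repository.store_model(
    model=model,
    meta_props=model_props,
    training_data=train_data,
    pipeline=pipeline)
```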
To test the model by providing data using the form, click on the **Provide input using form** icon and enter the following input in the form fields:

* ID: 4
* GENDER: F
* STATUS: M
* CHILDREN: 2
* ESTINCOME: 52004
* HOMEOWNER: N
* AGE: 25
* TOTALDOLLARVALUETRADED: 5030
* TOTALUNITSTRADED: 23
* LARGESTSINGLETRANSACTION: 1257
* SMALLESTSINGLETRANSACTION: 125
* PERCENTCHANGECALCULATION: 3
* DAYSSINCELASTLOGIN: 2
* DAYSSINCELASTTRADE: 19
* NETREALIZEDGAINS_YTD: 0
* NETREALIZEDLOSSES_YTD: 251

Click the **Predict** button and the model will be called with the input data. The results will display in the Result window. Scroll down to the bottom (Line #110) to see either a "High", a "Low", or a "Medium" for Churn.
To test the model using JSON input, click on the **Provide input data as JSON** icon and paste the following data under Body:

```json
{
"input_data":[
{
"fields":[
"ID",
"GENDER",
"STATUS",
"CHILDREN",
"ESTINCOME",
"HOMEOWNER",
"AGE",
"TOTALDOLLARVALUETRADED",
"TOTALUNITSTRADED",
"LARGESTSINGLETRANSACTION",
"SMALLESTSINGLETRANSACTION",
"PERCENTCHANGECALCULATION",
"DAYSSINCELASTLOGIN",
"DAYSSINCELASTTRADE",
"NETREALIZEDGAINS_YTD",
"NETREALIZEDLOSSES_YTD"
],
"values":[
[
4,
"F",
"M",
2,
52004,
"N",
60,
5030,
23,
1257,
125,
3,
1,
1,
1000,
0
]
]
}
]
}
```

Click the **Predict** button and the model will be called with the input data. The results will display in the Result window. Scroll down to the bottom (Line #110) to see either a "High", a "Low", or a "Medium" for Churn.

Now that the model is deployed, we can also test it from external applications. One way to invoke the model API is using the cURL command.
NOTE: Windows users will need the cURL command. We recommend installing Git Bash for this, as it also provides other useful tools and lets you use the shell environment variables in the following steps. If you are not using Git Bash, you may need to change the `export` commands to `set` commands.
In a terminal window, get a token to access the API by running the following command, substituting your Cloud Pak for Data cluster `URL`, `username`, and `password`:

```bash
curl -k -X GET https://<cluster-url>/v1/preauth/validateAuth -u <username>:<password>
```
{"username":"snyk","role":"Admin","permissions":["access_catalog","administrator","manage_catalog","can_provision"],"sub":"snyk","iss":"KNOXSSO","aud":"DSX","uid":"1000331002","authenticator":"default","accessToken":"eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InNueWstYWRtaW4iLCJyb2xlIjoiQWRtaW4iLCJwZXJtaXNzaW9ucyI6WyJhZG1pbmlzdHJhdG9yIiwiY2FuX3Byb3Zpc2lvbiIsIm1hbmFnZV9jYXRhbG9nIiwibWFuYWdlX3F1YWxpdHkiLCJtYW5hZ2VfaW5mb3JtYXRpb25fYXNzZXRzIiwibWFuYWdlX2Rpc2NvdmVyeSIsIm1hbmFnZV9tZXRhZGF0YV9pbXBvcnQiLCJtYW5hZ2VfZ292ZXJuYW5jZV93b3JrZmxvdyIsIm1hbmFnZV9jYXRlZ29yaWVzIiwiYXV0aG9yX2dvdmVycmFuY2VfYXJ0aWZhY3RzIiwiYWNjZXNzX2NhdGFsb2ciLCJhY2Nlc3NfaW5mb3JtYXRpb25fYXNzZXRzIiwidmlld19xdWFsaXR5Iiwic2lnbl9pbl9vbmx5Il0sInN1YiI6InNueWstYWRtaW4iLCJpc3MiOiJLTk9YU1NPIiwiYXVkIjoiRFNYIiwidWlkIjoiMTAwMDMzMTAwMiIsImF1dGhlbnRpY2F0b3IiOiJkZWZhdWx0IiwiaWp0IjoxNTkyOTI3MjcxLCJleHAiOjE1OTI5NzA0MzV9.MExzML-45SAWhrAK6FQG5gKAYAseqdCpublw3-OpB5OsdKJ7whrqXonRpHE7N7afiwU0XNrylbWZYc8CXDP5oiTLF79zVX3LAWlgsf7_E2gwTQYGedTpmPOJgtk6YBSYIB7kHHMYSflfNSRzpF05JdRIacz7LNofsXAd94Xv9n1T-Rxio2TVQ4d91viN9kTZPTKGOluLYsRyMEtdN28yjn_cvjH_vg86IYUwVeQOSdI97GHLwmrGypT4WuiytXRoQiiNc-asFp4h1JwEYkU97ailr1unH8NAKZtwZ7-yy1BPDOLeaR5Sq6mYNIICyXHsnB_sAxRIL3lbBN87De4zAg","_messageCode_":"success","message":"success"}
WML_AUTH_TOKEN
. export WML_AUTH_TOKEN=<value-of-access-token>
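If you prefer to script this step, a minimal Python sketch of the same request is shown below, assuming the `requests` package; the placeholders match the cURL command above:

```python
# Minimal sketch: fetch the access token in Python instead of cURL.
# <cluster-url>, <username>, and <password> are placeholders, as above.
import requests

resp = requests.get(
    "https://<cluster-url>/v1/preauth/validateAuth",
    auth=("<username>", "<password>"),
    verify=False)  # like curl -k, for self-signed cluster certificates
token = resp.json()["accessToken"]
print(token)
```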
Get the URL to invoke the model from the API reference by copying the **Endpoint**, and export it to a variable called `URL`:

```bash
export URL=https://blahblahblah.com
```
Now run this curl command from a terminal window to invoke the model with the same payload that was used previously:
```bash
curl -k -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' --header "Authorization: Bearer $WML_AUTH_TOKEN" -d '{"input_data":[{"fields":["ID", "GENDER", "STATUS", "CHILDREN", "ESTINCOME", "HOMEOWNER", "AGE", "TOTALDOLLARVALUETRADED", "TOTALUNITSTRADED", "LARGESTSINGLETRANSACTION", "SMALLESTSINGLETRANSACTION", "PERCENTCHANGECALCULATION", "DAYSSINCELASTLOGIN", "DAYSSINCELASTTRADE", "NETREALIZEDGAINS_YTD", "NETREALIZEDLOSSES_YTD"],"values":[[4, "F", "M", 2, 52004, "N", 60, 5030, 23, 1257, 125, 3, 1, 1, 1000, 0]]}]}' $URL
```
A JSON string similar to the one below will be returned in the response, with a "High", a "Low", or a "Medium" at the end indicating this customer's risk of churn.
{"predictions":[{"fields":["ID","GENDER","STATUS","CHILDREN","ESTINCOME","HOMEOWNER","AGE","TOTALDOLLARVALUETRADED","TOTALUNITSTRADED","LARGESTSINGLETRANSACTION","SMALLESTSINGLETRANSACTION","PERCENTCHANGECALCULATION","DAYSSINCELASTLOGIN","DAYSSINCELASTTRADE","NETREALIZEDGAINS_YTD","NETREALIZEDLOSSES_YTD","GENDERIndex","GENDERclassVec","STATUSIndex","STATUSclassVec","HOMEOWNERIndex","HOMEOWNERclassVec","features","rawPrediction","probability","prediction","predictedLabel"],"values":[[4,"F","M",2,52004,"N",60,5030,23,1257,125,3,1,1,1000,0,0.0,[1,[0],[1.0]],0.0,[2,[0],[1.0]],0.0,[1,[0],[1.0]],[1.0,1.0,0.0,1.0,4.0,2.0,52004.0,60.0,5030.0,23.0,1257.0,125.0,3.0,1.0,1.0,1000.0,0.0],[2.9466019417475726,8.67282872405483,8.380569334197599],[0.14733009708737863,0.4336414362027415,0.4190284667098799],1.0,"Low"]]}]}
You can also access the online model deployment directly through the REST API. This allows you to use your model for inference in any of your apps. For this code pattern, we'll be using a Python Flask application to collect information, score it against the model, and show the results.
NOTE: This application requires Python 3.6 or later, so the instructions here assume Python 3.6+.
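For orientation, the core pattern such a Flask app can follow is sketched below: collect input, forward it to the scoring endpoint, and return the result. The route and helper names here are illustrative assumptions; the provided `stocktraderapp/StockTraderChurn.py` is the real implementation and differs in detail:

```python
# Illustrative sketch only -- the real app is stocktraderapp/StockTraderChurn.py.
import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/score", methods=["POST"])
def score():
    # Build the scoring payload from the posted JSON values.
    data = request.get_json()
    payload = {"input_data": [{
        "fields": list(data.keys()),
        "values": [list(data.values())],
    }]}
    # Forward to the deployed model; URL and TOKEN come from the .env settings.
    resp = requests.post(
        os.environ["URL"], json=payload,
        headers={"Authorization": "Bearer " + os.environ["TOKEN"]},
        verify=False)  # like curl -k, for self-signed cluster certificates
    return jsonify(resp.json())

if __name__ == "__main__":
    app.run(host=os.getenv("HOST", "0.0.0.0"), port=int(os.getenv("PORT", "5000")))
```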
The general recommendation for Python development is to use a virtual environment (`venv`). To install and initialize a virtual environment, use the `venv` module:
In a terminal, go to the `stocktraderapp` folder within the cloned repo directory.

```bash
git clone https://github.com/IBM/icp4d-customer-churn-classifier
cd icp4d-customer-churn-classifier/stocktraderapp
```
Initialize a virtual environment with `venv`.

```bash
# Create the virtual environment using Python.
# Note, it may be named python3 on your system.
python -m venv venv            # Python 3.X

# Source the virtual environment. Use one of the two commands depending on your OS.
source venv/bin/activate       # Mac or Linux
./venv/Scripts/activate        # Windows PowerShell
```
TIP: To terminate the virtual environment, use the `deactivate` command.
Finally, install the Python requirements.

```bash
pip install -r requirements.txt
```
It is best practice to store configurable information as environment variables instead of hard-coding it. To reference our model and supply an access token, we will pass these values in via a file that is read at startup; the key-value pairs in this file are loaded as environment variables.
Copy the `env.sample` file to `.env`.

```bash
cp env.sample .env
```
Edit the `.env` file to provide the `URL` and `TOKEN`:

* `URL` is your web service URL for scoring.
* `TOKEN` is your deployment access token.

```
# Copy this file to .env.
# Edit the .env file with the required settings before starting the app.
# Required: Provide your web service URL for scoring.
# E.g., URL=https://9.10.222.3:31843/dmodel/v1/project/pyscript/tag/score
URL=
# Required: Provide your web service deployment access token.
# E.g., TOKEN=abCdwFghIjKLMnO1PqRsTuV2wWX3YzaBCDE4.fgH1r2... (and so on, tokens are long).
TOKEN=
# Optional: You can override the server's host and port here.
HOST=0.0.0.0
PORT=5000
```
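As a sketch of how an app can pick these values up, here is one common approach using the `python-dotenv` package; this is an assumption for illustration, and the provided app's loading code may differ:

```python
# Minimal sketch: load .env key-value pairs as environment variables.
# Assumes the python-dotenv package; the provided app may do this differently.
import os
from dotenv import load_dotenv

load_dotenv()                        # reads .env from the current directory
url = os.environ["URL"]              # web service scoring URL (required)
token = os.environ["TOKEN"]          # deployment access token (required)
host = os.getenv("HOST", "0.0.0.0")  # optional overrides
port = int(os.getenv("PORT", "5000"))
```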
Start the Flask server by running the following command:

```bash
python StockTraderChurn.py
```
Use your browser to go to http://0.0.0.0:5000 and try it out.
TIP: Use `Ctrl+C` to stop the Flask server when you are done.
Enter some sample values into the form:
Click the **Submit** button and the churn prediction is returned.

Pressing **Reset** allows you to go back and enter new values.
This section provides an alternative to accessing a local csv file in your notebook. This requires that you have created a Db2 Warehouse database deployment in your IBM Cloud Pak for Data cluster or on IBM Cloud. With it, you can access the integrated database console to complete common tasks, such as loading data into the database.
You can follow the instructions provided in Steps 4, 5, and 6 of the Virtualizing Db2 Warehouse data with data virtualization tutorial to seed the Db2 Warehouse (using the `mergedcustomers.csv` file provided in this repo), obtain the connection details for your Db2 Warehouse, and use those details to add a connection to your IBM Cloud Pak for Data cluster.

IMPORTANT: For this code pattern, remember to seed the Db2 Warehouse with the `data/mergedcustomers.csv` file from your cloned repo and not the file mentioned in the Virtualizing Db2 Warehouse data with data virtualization tutorial.
1. From your project, click **Add to project +** and select **Connection**.
2. On the **From platform** tab, select the Db2 Warehouse connection that was added earlier.
3. Click **Test** to test the connection. Once you get a message that says **Connection test passed**, click **Add**.
4. Click **Add to project +** again and choose **Connected data**.
5. Click **Select source**, choose your Db2 Warehouse data, and click **Select**.
6. Provide a name for the asset and click **Create**.
Follow the remaining instructions above, starting from 3. Create a Space for Machine Learning Deployments. When adding the pandas DataFrame in your notebook, choose the asset name that you provided when adding the connected data to your project.
This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.