This repository will not be updated. The repository will be kept available in read-only mode.
In this code pattern, we will use IBM Watson Machine Learning and Watson Studio — which allows data scientists and analysts to quickly build and prototype models — to monitor deployments, and to learn over time as more data becomes available. Performance Monitoring and Continuous Learning enables machine learning models to re-train on new data supplied by the user or other data sources. All applications and analysis tools that depend on the model are automatically updated as Watson Studio handles the selection and deployment of the best model.
In this code pattern, we’ll solve a problem for the City of Chicago using the Model Builder to model building violations. We’ll predict which buildings are most likely to fail an inspection and rank them by that likelihood, saving time and resources for the city and its building inspectors. We’ll begin by building a model on publicly available data from 2017, starting in September. Then, we will introduce data from October, November, and December to simulate learning and model re-training over time.
When the reader has completed this Code Pattern, they will understand how to:
Clone the continuous-learning-with-watson-ml-and-db2 repository locally. In a terminal, run:
$ git clone https://github.com/IBM/continuous-learning-with-watson-ml-and-db2
$ cd continuous-learning-with-watson-ml-and-db2
If you do not already have an IBM Cloud account, sign up for IBM Cloud, then log in to your IBM Cloud account.
First you will need to create an Object Storage service if you don't already have one. From the catalog, search for object storage, select the Object Storage service, choose the lite plan and click create.
Go back to the catalog, search for Watson Studio, select it, choose the lite plan and click create.
Create a new project by clicking the New Project link, choose Complete, give it a name and click create.
From the IBM Cloud catalog, search for Db2 Warehouse on Cloud and create one using the appropriate plan.
Once the service is created, create new credentials by selecting the Service Credentials option in the left navigation panel. Make sure to save the credentials for upcoming steps.
From the Watson Studio project that you created earlier, go to + Add to Project and choose Connection.
Select Db2 Warehouse from the available options to connect to the Db2 Warehouse on Cloud database you created earlier.
Configure the connection based on the Db2 credentials you saved earlier.
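The saved credentials can also be used outside Watson Studio. As a minimal sketch, the Python below assembles a connection string in the style commonly used with the ibm_db driver. The key names (db, hostname, port, username, password) are assumptions about the layout of the Service Credentials JSON, and every value shown is a placeholder; check both against your own credentials.

```python
import json


def build_db2_dsn(credentials: dict) -> str:
    """Assemble an ibm_db-style connection string from a credentials dict.

    The key names used here are assumptions about the credentials JSON.
    """
    required = ("db", "hostname", "port", "username", "password")
    missing = [key for key in required if key not in credentials]
    if missing:
        raise KeyError(f"credentials are missing keys: {missing}")
    return (
        f"DATABASE={credentials['db']};"
        f"HOSTNAME={credentials['hostname']};"
        f"PORT={credentials['port']};"
        "PROTOCOL=TCPIP;"
        f"UID={credentials['username']};"
        f"PWD={credentials['password']};"
    )


# Placeholder credentials for illustration only; never commit real ones.
sample_credentials = json.loads(
    '{"db": "BLUDB", "hostname": "db2.example.com", "port": 50000,'
    ' "username": "user", "password": "secret"}'
)
dsn = build_db2_dsn(sample_credentials)
```

The resulting string could then be passed to a Db2 client of your choice; the connection itself is left out so the sketch runs without a database.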
From the IBM Db2 Warehouse service page, click Manage and then click Open to go to the IBM Db2 Warehouse on Cloud console.
Open the hamburger menu and select RUN SQL to open a SQL editor.
In the SQL editor, copy the SQL statement from the file and click the Run All option from the RUN drop-down list at the top right.
Similarly, copy the SQL statement from the file into the SQL editor and click the Run All option from the RUN drop-down list at the top right.
Note that the "_training" column should be lower case in the create statement and in the trigger.
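The lower-case requirement matters because Db2 folds unquoted identifiers to upper case, while identifiers wrapped in double quotes keep their case. The following Python is only an illustration of that rule; the helper names are hypothetical and not part of this code pattern.

```python
def fold_unquoted(name: str) -> str:
    """Mimic how an unquoted identifier is folded to upper case."""
    return name.upper()


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes so its case is preserved.

    Embedded double quotes are doubled, as in standard SQL.
    """
    return '"' + name.replace('"', '""') + '"'


column = "_training"
folded = fold_unquoted(column)     # what an unquoted reference resolves to
quoted = quote_identifier(column)  # what the create statement and trigger need
```

Here `folded` is `_TRAINING` and `quoted` is `"_training"`, which is why the quoted form has to appear consistently in both SQL statements.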
Next we will load the VIOLATIONS table from a CSV file. Click LOAD from the hamburger menu, which brings you to a page where you can upload a .csv file.
Browse to the CSV file in the project directory that you cloned earlier and click Next.
Choose the correct Schema, select the table VIOLATIONS and click Next.
Click Next on the next screen and click Begin Load to load the source data from the CSV file into the VIOLATIONS table.
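Before uploading, it can help to confirm that the CSV has the columns the model will need. The sketch below uses only the Python standard library. INSPECTION_STATUS is the column predicted later in this pattern; the ID column, the status values, and the in-memory sample are hypothetical stand-ins for the real file.

```python
import csv
import io


def check_csv(text: str, required_columns: list) -> tuple:
    """Return (missing required columns, number of data rows) for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in required_columns if col not in header]
    row_count = sum(1 for _ in reader)
    return missing, row_count


# Hypothetical two-row sample standing in for the real CSV file.
sample_csv = "ID,INSPECTION_STATUS\n1,PASSED\n2,FAILED\n"
missing, row_count = check_csv(sample_csv, ["INSPECTION_STATUS"])
```

For a real file, read its contents with `open(path, newline="")` and pass the text to `check_csv`.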
In Watson Studio, go to your project, select + Add to Project, and select the Connected assets option from the drop-down list.
Provide a name, and click Select source.
Choose the Db2 database and the table that you created in the previous step. Click Create.
In the next screen, click Create to create the connected asset, which will be used when creating the Watson Machine Learning model.
From the catalog in IBM Cloud, search for the keyword spark and choose the Apache Spark service.
Create the service using the lite plan.
Once created, we need to add this service to Watson Studio. Go to your Watson Studio project, select settings, and from the + Add Service drop-down list, select spark and add the existing Spark service that you have just created.
From the catalog in IBM Cloud, search for the keyword machine learning and choose the IBM Machine Learning service.
Create the service using the lite plan.
As with the Spark service in the previous step, add the Machine Learning service you just created to your Watson Studio project.
From the Assets tab of your Watson Studio project, select + New Watson Machine Learning Model.
Provide a name, choose the Machine Learning and Apache Spark instances that you added to your project, choose Model Builder for model type, choose Manual so that you can prepare your own data, and click Create.
Select the data asset that you created earlier from the options.
Once the data is loaded, choose INSPECTION_STATUS as the column to predict for new data and All for feature columns. We will be using Binary Classification. Add estimators by clicking the + Add Estimators link; in our case we will be using Logistic Regression and Decision Tree Classifier. You can select others as well, depending on which estimator algorithms you want to compare.
Once the training and evaluation are done, choose the estimator that performed best and then click Save.
Once the Watson Machine Learning model is saved, select the Evaluation tab. First we need to configure the performance monitoring.
Select the Configure Performance Monitoring link.
Add the Spark service from the drop-down list. It's the one that you added to your Watson Studio project.
Choose areaUnderPR (the performance metric of the model) and set the threshold to 0.8. This means that if the performance falls below 0.8, the model is re-trained using all the source data plus the new data, which is what makes the learning continuous.
Use 500 as the record count and click Save.
For Auto Retrain, select when model performance is below threshold.
For Auto Deploy, select when performance is better than previous model.
Add the connection by selecting Select Feedback Reference Data and selecting the Db2 connection that you previously created.
Once that's done, you can add data using + Feedback Data.
Once the feedback data is loaded, select New Evaluation to evaluate the uploaded feedback data. You can unzip the provided data, Chicago building inspection data by month 2017, in the repo and use the monthly inspection data as feedback data.
When the evaluation is completed, we can see where the new feedback data falls relative to the threshold value. The diagram below shows that the performance exceeds the threshold value, and hence the new version of the model is automatically deployed.
You can also see the list of evaluations that have been completed and see how the model has been continuously learning.
You can upload new feedback data repeatedly from the provided data, Chicago building inspection data by month 2017, so that the model continuously learns.
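The Auto Retrain and Auto Deploy settings described above amount to a small decision rule. The Python below is a sketch of that rule as this document describes it, not Watson Studio's actual implementation. The function name, the returned labels, and the metric values used in the example calls are all invented for illustration; only the 0.8 threshold comes from the configuration step.

```python
def decide(current_metric: float, retrained_metric: float,
           threshold: float = 0.8) -> str:
    """Decide what happens after a feedback evaluation.

    current_metric:   areaUnderPR of the deployed model on the feedback data
    retrained_metric: areaUnderPR of the re-trained candidate model
    """
    # Auto Retrain: only when performance is below the threshold.
    if current_metric >= threshold:
        return "keep"
    # Auto Deploy: only when the new model beats the previous one.
    if retrained_metric > current_metric:
        return "retrain_and_deploy"
    return "retrain_only"


# Invented metric values, one pair per simulated month of feedback.
outcomes = [
    decide(0.85, 0.90),  # above threshold, nothing to do
    decide(0.70, 0.82),  # below threshold, candidate is better
    decide(0.70, 0.65),  # below threshold, candidate is worse
]
```

With these inputs, `outcomes` is `["keep", "retrain_and_deploy", "retrain_only"]`.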
Go to the Deployments tab. Click + Add Deployment to add a new deployment, and choose Web Service as the deployment type.
In the Implementation tab you can see different examples of how to use the newly created API.
You can access and test the API programmatically, or use curl commands. You can also go to the Test tab and provide a new set of data to evaluate the inspection status.
The result of the evaluation is shown in a horizontal graph located on the right side of the page.
If the evaluation gives an error as shown below, you need to upgrade the Machine Learning service instance to the Standard plan.
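For the programmatic access mentioned above, a request could be prepared along the following lines using only the Python standard library. This is a sketch under assumptions: the scoring URL, the token, and the field names are placeholders, and the fields/values payload shape is an assumption. The Implementation tab shows the exact endpoint, authentication, and payload for your deployment.

```python
import json
import urllib.request


def build_scoring_request(scoring_url: str, token: str,
                          fields: list, rows: list) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for a scoring endpoint."""
    # Assumed payload shape: column names plus one list of values per row.
    payload = {"fields": list(fields), "values": [list(row) for row in rows]}
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder endpoint, token, and field names for illustration only.
request = build_scoring_request(
    "https://example.com/scoring", "TOKEN",
    ["FIELD_A", "FIELD_B"], [[1, "x"]],
)
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect before any call is made against the deployment.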
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.