IBM / continuous-learning-with-watson-ml-and-db2

Build models that learn over time with Watson Machine Learning, Watson Studio and IBM Db2 Warehouse on Cloud
https://developer.ibm.com/patterns/continuous-learning-with-wml-and-db2-warehouse-on-cloud/
Apache License 2.0

WARNING: This repository is no longer maintained

This repository will not be updated. The repository will be kept available in read-only mode.

Continuous Learning with Watson Machine Learning and IBM Db2 Warehouse on Cloud

In this code pattern, we will use IBM Watson Machine Learning and Watson Studio, which allows data scientists and analysts to quickly build and prototype models, to monitor deployments and to learn over time as more data becomes available. Performance Monitoring and Continuous Learning enable machine learning models to re-train on new data supplied by the user or other data sources. All applications and analysis tools that depend on the model are automatically updated as Watson Studio handles the selection and deployment of the best model.

In this code pattern, we'll solve a problem for the City of Chicago using the Model Builder to model building violations. We'll predict which buildings are most likely to fail an inspection and rank buildings by that likelihood, saving time and resources for the city and building inspectors. We'll begin by building a model on publicly available data from 2017, starting in September. Then, we will introduce data from October, November, and December to simulate learning and model re-training over time.

When the reader has completed this Code Pattern, they will understand how to:

Flow

Architecture

  1. Initial source data is loaded into an IBM Db2 Warehouse on Cloud database.
  2. The source data is then loaded, as a data asset, into Watson Studio.
  3. The Watson Machine Learning service uses the source data and computes an evaluation using Apache Spark-as-a-service to create a machine learning model, and saves the evaluation information back to the Db2 Warehouse on Cloud database.
  4. Apache Spark-as-a-service is used to compute the evaluation.
  5. Feedback data is uploaded to the feedback table in the Db2 Warehouse on Cloud database.
  6. Once the evaluation is done the Watson Machine Learning service creates a machine learning model.
  7. The model data is exposed through an API.
  8. Applications can use the API to evaluate new data and create a new model based on continuous learning.
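
The flow above can be read as an evaluate-and-retrain loop. The following is a minimal Python sketch of that idea only; it is not how Watson Machine Learning implements it. The function names, the `threshold` parameter, the one-feature rows and the toy "trainer" are all hypothetical and stand in for the real model, feedback table and evaluation metric.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, int]          # (feature, label); hypothetical toy schema
Model = Callable[[float], int]   # a model maps a feature to a 0/1 prediction


def accuracy(model: Model, rows: Sequence[Row]) -> float:
    """Fraction of rows whose label the model predicts correctly."""
    if not rows:
        raise ValueError("need at least one row to evaluate")
    return sum(model(x) == y for x, y in rows) / len(rows)


def train_threshold_model(rows: Sequence[Row]) -> Model:
    """Toy trainer: pick the cut-off on the feature that maximizes accuracy."""
    candidates = sorted({x for x, _ in rows})
    best_cut = max(
        candidates,
        key=lambda cut: accuracy(lambda x: int(x >= cut), rows),
    )
    return lambda x: int(x >= best_cut)


def continuous_learning_step(
    deployed: Model,
    training_rows: Sequence[Row],
    feedback_rows: Sequence[Row],
    threshold: float,
) -> Tuple[Model, bool]:
    """Evaluate the deployed model on feedback data and retrain if needed.

    Returns (model_to_deploy, retrained_flag).
    """
    current = accuracy(deployed, feedback_rows)
    if current >= threshold:
        return deployed, False
    combined: List[Row] = list(training_rows) + list(feedback_rows)
    candidate = train_threshold_model(combined)
    # Only replace the deployed model if the candidate is actually better.
    if accuracy(candidate, feedback_rows) > current:
        return candidate, True
    return deployed, False


# Toy data: the "September" rows train the first model, the "October"
# feedback rows have drifted so that lower feature values now fail too.
SEPTEMBER = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
OCTOBER = [(2.0, 1), (2.5, 1), (1.0, 0), (3.5, 1)]
```

Calling `continuous_learning_step(train_threshold_model(SEPTEMBER), SEPTEMBER, OCTOBER, threshold=0.8)` retrains on the combined rows because the first model no longer meets the threshold on the feedback data.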

Included components

Featured technologies

Watch the Video

Steps

  1. Clone the repo
  2. Create Watson Studio Project
  3. Create Db2 Warehouse on Cloud database and add the connection to Watson Studio
  4. Create and load data into Db2 Warehouse on Cloud database
  5. Add connected asset into Watson Studio
  6. Create Apache Spark as a service with IBM Cloud
  7. Create Watson Machine Learning with IBM Cloud
  8. Add new Watson Machine Learning Model to Watson Studio
  9. Add Feedback data and new evaluations to the continuously learning model
  10. Deploy the model to expose it through an API
  11. Test the model

1. Clone the repo

Clone the continuous-learning-with-watson-ml-and-db2 repository locally. In a terminal, run:

$ git clone https://github.com/IBM/continuous-learning-with-watson-ml-and-db2
$ cd continuous-learning-with-watson-ml-and-db2

2. Create Watson Studio Project

If you do not already have an IBM Cloud account, sign up for IBM Cloud and log in to your IBM Cloud account.

First you will need to create an Object Storage service if you don't already have one. From the catalog, search for object storage, select the Object Storage service, choose the Lite plan and click Create.

Go back to the catalog, search for Watson Studio, select it, choose the Lite plan and click Create.

Create a new project by clicking the New Project link, choose Complete, give it a name and click Create.

3. Create Db2 Warehouse on Cloud database and add the connection to Watson Studio

From the IBM Cloud catalog search for Db2 Warehouse on Cloud and create one using the appropriate plan.

Once the service is created, create new credentials by selecting the Service Credentials option in the left navigation panel. Make sure to save the credentials for upcoming steps.

From the Watson Studio project that you created earlier, go to + Add to Project and choose Connection.

Select Db2 Warehouse from the available options to connect to the Db2 Warehouse on Cloud database you created earlier.

Configure the connection based on the Db2 credentials you saved earlier.
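
If you also want to reach the same database from your own code, the saved credentials can be turned into a Db2 connection string. This is a minimal sketch under assumptions: the key names (`db`, `hostname`, `port`, `username`, `password`) are what the service credentials JSON is assumed to contain, so check them against your own credentials. The function `build_db2_dsn` is hypothetical and only builds the string; it does not open a connection.

```python
def build_db2_dsn(credentials: dict) -> str:
    """Build a keyword=value Db2 connection string from a credentials dict.

    The key names below are assumptions about the service credentials JSON.
    """
    required = ("db", "hostname", "port", "username", "password")
    missing = [key for key in required if key not in credentials]
    if missing:
        raise KeyError("credentials are missing: " + ", ".join(missing))
    return (
        f"DATABASE={credentials['db']};"
        f"HOSTNAME={credentials['hostname']};"
        f"PORT={credentials['port']};"
        "PROTOCOL=TCPIP;"
        f"UID={credentials['username']};"
        f"PWD={credentials['password']};"
    )


# Placeholder values only; substitute the credentials you saved earlier.
EXAMPLE_CREDENTIALS = {
    "db": "BLUDB",
    "hostname": "db2.example.com",
    "port": 50000,
    "username": "user",
    "password": "secret",
}
```

The same five values are the ones the Watson Studio connection form asks for, so a missing key here is a quick hint that the credentials were not copied completely.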

4. Create and load data into Db2 Warehouse on Cloud database

From the IBM Db2 Warehouse service page, click Manage and then click Open to go to the IBM Db2 Warehouse on Cloud console.

Open the hamburger menu and select RUN SQL to open up a SQL editor.

In the SQL editor, copy in the SQL statement from the violations.sql file and click the Run All option in the RUN dropdown list at the top right.

Similarly, copy the SQL statement from the violations_feedback.sql file into the SQL editor and click the Run All option in the RUN dropdown list at the top right.

Note that the "_training" column name must be lower case in both the CREATE statement and the trigger.
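
The reason case matters is that Db2 folds unquoted identifiers to upper case and preserves case only for identifiers written in double quotes. The small Python sketch below illustrates that rule; the helper names `quote_identifier` and `as_stored` are hypothetical and exist only for this illustration.

```python
def quote_identifier(name: str) -> str:
    """Write a name as a delimited (double-quoted) SQL identifier."""
    if not name:
        raise ValueError("identifier must not be empty")
    # An embedded double quote is escaped by doubling it.
    return '"' + name.replace('"', '""') + '"'


def as_stored(identifier_sql: str) -> str:
    """Return the name the database stores for an identifier as written in SQL.

    Delimited identifiers keep their case; ordinary ones are upper-cased.
    """
    is_delimited = (
        len(identifier_sql) >= 2
        and identifier_sql[0] == '"'
        and identifier_sql[-1] == '"'
    )
    if is_delimited:
        return identifier_sql[1:-1].replace('""', '"')
    return identifier_sql.upper()
```

For example, `as_stored("inspection_status")` gives `INSPECTION_STATUS`, while `as_stored(quote_identifier("_training"))` keeps `_training` in lower case.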

Next we will load the VIOLATIONS table from a CSV file. Click LOAD in the hamburger menu, which brings you to a page where you can upload a .csv file.

Browse to buildings_source_inspection_data_2017.csv in the project directory that you cloned earlier and click Next.

Choose the correct schema and the VIOLATIONS table, and click Next.

Click Next on the next screen and click Begin Load to load the source data from the CSV file to the VIOLATIONS table.
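
Before uploading, it can help to confirm that the CSV header contains the columns the table expects. The sketch below is one hedged way to do that with the Python standard library. `INSPECTION_STATUS` is the prediction column named later in this pattern; every other column name in the sample is hypothetical, and `missing_columns` is an illustrative helper, not part of the repository.

```python
import csv
import io
from typing import Iterable, List, TextIO


def missing_columns(csv_file: TextIO, required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header.

    The comparison ignores case and surrounding whitespace.
    """
    header = next(csv.reader(csv_file), [])
    present = {name.strip().upper() for name in header}
    return [name for name in required if name.upper() not in present]


# Hypothetical two-line sample standing in for the real CSV file.
SAMPLE = io.StringIO("ID,ADDRESS,INSPECTION_STATUS\n1,100 Main St,PASSED\n")
```

With a real file, pass `open("buildings_source_inspection_data_2017.csv", newline="")` instead of the in-memory sample.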

5. Add connected asset into Watson Studio

In Watson Studio, go to your project, select + Add to Project, and then select the Connected assets option from the dropdown list.

Provide a name, and click Select source.

Choose the Db2 database and the table that you created in the previous step. Click Create.

On the next screen, click Create to create the connected asset, which will be used when creating the Watson Machine Learning model.

6. Create Apache Spark as a service with IBM Cloud

From the catalog in IBM Cloud, search for the keyword spark and choose the Apache Spark service.

Create the service using the Lite plan.

Once created, we need to add this service to Watson Studio. Go to your Watson Studio project, select Settings and, from the + Add Service dropdown list, select Spark and add the existing Spark service that you have just created.

7. Create Watson Machine Learning with IBM Cloud

From the catalog in IBM Cloud, search for the keyword machine learning and choose the IBM Machine Learning service.

Create the service using the Lite plan.

As you did with the Spark service in Step 6, add the Machine Learning service you just created to your Watson Studio project.

8. Add new Watson Machine Learning Model to Watson Studio

From the Assets tab of your Watson Studio project, select + New Watson Machine Learning Model.

Provide a name, choose the Machine Learning and Apache Spark instances that you added to your project, choose Model Builder for the model type, choose Manual so that you can prepare your own data, and click Create.

Select the data asset that you created earlier from the options.

Once the data is loaded, choose INSPECTION_STATUS as the column to predict and All for the feature columns. We will be using Binary Classification. Add estimators by clicking the + Add Estimators link; in our case we will use Logistic Regression and Decision Tree Classifier. You can select others as well, depending on which estimator algorithm you want to use.

Once training and evaluation are done, choose the estimator that performed best and then click Save.
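
To make "performed best" concrete, the sketch below compares two estimators by area under the ROC curve, a common metric for binary classification. It is an assumption that this is the metric you compare on; use whichever metric the evaluation reports. The labels are simplified to 0/1, the scores are made up, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def area_under_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by comparing every
    positive/negative pair of scores (ties count as half a win)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def best_estimator(
    labels: Sequence[int], scores_by_estimator: Dict[str, Sequence[float]]
) -> str:
    """Name of the estimator whose scores give the highest area under ROC."""
    return max(
        scores_by_estimator,
        key=lambda name: area_under_roc(labels, scores_by_estimator[name]),
    )


# Made-up holdout labels (1 = failed inspection) and predicted scores.
LABELS = [1, 1, 0, 0]
SCORES = {
    "Logistic Regression": [0.9, 0.4, 0.5, 0.1],
    "Decision Tree Classifier": [0.8, 0.7, 0.3, 0.2],
}
```

The pairwise comparison is quadratic in the number of rows, which is fine for a small holdout set but not for large data.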

9. Add Feedback data and new evaluations to the continuously learning model

Once the Watson Machine Learning Model is saved, select the Evaluation tab. First we need to configure the performance monitoring.

10. Deploy the model to expose it through an API

11. Test the model

You can access and test the API programmatically, or use curl commands. You can also go to the Test tab and provide a new set of data to evaluate the inspection status.

The result of the evaluation is shown in a horizontal graph located on the right side of the page.
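
For programmatic access, a scoring call is typically an authenticated HTTP POST with a JSON body. The sketch below only builds such a request with the Python standard library and does not send it. The payload shape (`fields` and `values`), the bearer-token header, the URL and the field names are all assumptions for illustration; check your deployment's details for the actual endpoint and payload format.

```python
import json
import urllib.request
from typing import Sequence


def build_scoring_request(
    scoring_url: str,
    token: str,
    fields: Sequence[str],
    rows: Sequence[Sequence[object]],
) -> urllib.request.Request:
    """Build (but do not send) a POST request that scores the given rows."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs one value per field")
    # Assumed payload shape: column names plus a list of value rows.
    payload = {"fields": list(fields), "values": [list(row) for row in rows]}
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder URL, token and field names; none of these come from the repository.
EXAMPLE_REQUEST = build_scoring_request(
    "https://example.com/scoring",
    "TOKEN",
    ["VIOLATION_CODE", "VIOLATION_STATUS"],
    [["CN190019", "OPEN"]],
)
```

Sending the request would be a call to `urllib.request.urlopen(EXAMPLE_REQUEST)` once the placeholder values are replaced with real ones.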

Troubleshooting

If the evaluation fails with an error, you need to upgrade the Machine Learning service instance to the Standard plan.

Links

Learn more

License

This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.

Apache Software License (ASL) FAQ