DISCLAIMER: This application is used for demonstrative and illustrative purposes only and does not constitute an offering that has gone through regulatory review. It is not intended to serve as a medical application. There is no representation as to the accuracy of the output of this application and it is presented without warranty.
This application was built to demonstrate IBM's Watson Natural Language Classifier (NLC). The data set we will be using, ICD-10-GT-AA.csv, contains a subset of ICD-10 entries. ICD-10 is the 10th revision of the International Statistical Classification of Diseases and Related Health Problems. In short, it is a medical classification list by the World Health Organization (WHO) that contains codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or disease. Hospitals and insurance companies alike could save time and money by using Watson to tag records with the most accurate ICD-10 codes.
This application is a Python web application built on the Flask microframework and based on earlier work by Ryan Anderson. It uses the Watson Python SDK to create the classifier, list classifiers, and classify the input text. We also make use of the freely available ICD-10 API which, given an ICD-10 code, returns a name and description.
When the reader has completed this pattern, they will understand how to:
Clone the nlc-icd10-classifier repo locally. In a terminal, run:
git clone https://github.com/IBM/nlc-icd10-classifier
cd nlc-icd10-classifier
Create the following service:
Log in to IBM's Watson Studio. Once in, you'll land on the dashboard.
Create a new project by clicking + New project and choosing Data Science.
Enter a name for the project and click Create.
NOTE: By creating a project in Watson Studio, a free tier Object Storage service and Watson Machine Learning service will be created in your IBM Cloud account. Select the Free storage type to avoid fees.
Upon successful project creation, you are taken to a dashboard view of your project. Take note of the Assets and Settings tabs; we'll be using them to associate our project with any external assets (datasets and notebooks) and any IBM Cloud services.
The data used in this example is part of the ICD-10 data set; the cleaned version we'll use is available in the repo under data/ICD-10-GT-AA.csv. We'll now train an NLC model using this data.
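Before uploading, it can help to sanity-check the training file. The sketch below assumes the usual NLC training layout of one example per row, with the text in the first column and the class (here, an ICD-10 code) in the second; that layout is an assumption about data/ICD-10-GT-AA.csv, not something verified against it. The sample rows and the `summarize_training_rows` helper are illustrative only.

```python
import csv
import io
from collections import Counter

# Illustrative rows only; they are NOT copied from data/ICD-10-GT-AA.csv.
# Assumed layout: column 1 is the text, column 2 is the ICD-10 code (class).
SAMPLE_CSV = '''"Cholera, unspecified",A00.9
"Typhoid fever, unspecified",A01.00
"Gastrointestinal hemorrhage, unspecified",K92.2
'''


def summarize_training_rows(handle):
    """Count examples per class and collect row numbers lacking text or a class."""
    class_counts = Counter()
    bad_rows = []
    for line_no, row in enumerate(csv.reader(handle), start=1):
        # A usable row needs non-empty text and a non-empty class label.
        if len(row) < 2 or not row[0].strip() or not row[1].strip():
            bad_rows.append(line_no)
            continue
        class_counts[row[1].strip()] += 1
    return class_counts, bad_rows


counts, bad = summarize_training_rows(io.StringIO(SAMPLE_CSV))
print(f"{sum(counts.values())} examples across {len(counts)} classes; bad rows: {bad}")
```

To run the same check on the real file, pass an open file handle instead of the in-memory sample, for example one opened with `open("data/ICD-10-GT-AA.csv", newline="", encoding="utf-8")`.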
From the new project Overview panel, click + Add to project on the top right and choose the Natural Language Classifier asset type.
A new instance of the NLC tool will launch.
Add the data to your project by clicking the Browse button in the right-hand Upload to project section and browsing to the cloned repo. Choose the data/ICD-10-GT-AA.csv file.
Select the ICD-10-GT-AA.csv file you just uploaded and choose Add to model.
Click the Train model button to begin training. The model will take around an hour to train.
To check the status of the model, and to access it after it trains, go to the Models section of the Assets tab in your project. The model will show up when it is ready. Double-click it to see the Overview tab.
The first line of the Overview tab contains the Model ID. Remember this value, as we'll need it in the next step.
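Training status can also be checked from Python instead of the Watson Studio UI. The following is a minimal sketch, assuming a version of the `ibm-watson` package that still ships `NaturalLanguageClassifierV1` and an IAM API key; the helper names `is_ready` and `fetch_classifier_info` are hypothetical. The Model ID from the Overview tab is what gets passed as `classifier_id`.

```python
def is_ready(classifier_info):
    """Return True when a classifier description reports status 'Available'."""
    return classifier_info.get("status") == "Available"


def fetch_classifier_info(apikey, classifier_id, service_url=None):
    """Ask the NLC service to describe one classifier (needs network access)."""
    # Imported lazily so is_ready() works even without the SDK installed.
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    from ibm_watson import NaturalLanguageClassifierV1

    nlc = NaturalLanguageClassifierV1(authenticator=IAMAuthenticator(apikey))
    if service_url:
        nlc.set_service_url(service_url)
    return nlc.get_classifier(classifier_id).get_result()


# Offline illustration with a hand-written, response-shaped dict.
print(is_ready({"classifier_id": "<add_nlc_classifier_id>", "status": "Training"}))
```

With real credentials, `is_ready(fetch_classifier_info(apikey, classifier_id))` should become true once training has finished.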
Follow the steps below for deploying the application:
Click the Deploy to IBM Cloud button below.
From the IBM Cloud deployment page click the Deploy button.
From the Toolchains menu, click the Delivery Pipeline to watch while the app is deployed. Once deployed, the app can be viewed by clicking View app.
The app and service can be viewed in the IBM Cloud dashboard. The app will be named nlc-icd10-classifier, with a unique suffix.
We now need to add a few environment variables to the application's runtime so the right classifier service and model are used. Click on the application from the dashboard to view its settings.
While viewing the application, click the Runtime option on the menu and navigate to the Environment Variables section.
Update the CLASSIFIER_ID and NATURAL_LANGUAGE_CLASSIFIER_APIKEY variables with your Model ID from Step 4 and NLC API key from Step 2. Click Save.
After saving the environment variables, the app will restart. After the app restarts you can access it by clicking the Visit App URL button.
The general recommendation for Python development is to use a virtual environment (venv). To install and initialize a virtual environment, use the venv module on Python 3 (for Python 2.7, install the virtualenv library):
Create the virtual environment using Python. Use one of the two commands depending on your Python version.
Note: it may be named python3 on your system.
python -m venv mytestenv # Python 3.X
virtualenv mytestenv # Python 2.X
Now source the virtual environment. Use one of the two commands depending on your OS.
source mytestenv/bin/activate # Mac or Linux
./mytestenv/Scripts/activate # Windows PowerShell
TIP: To terminate the virtual environment, use the deactivate command.
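To confirm that the environment is actually active before installing dependencies, a quick check from Python can help. This is a small sketch for Python 3 environments created with the venv module; the function name `in_virtualenv` is hypothetical, and the check does not cover the older Python 2 virtualenv layout.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
```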
Rename the env.example file to .env:
mv env.example .env
Update the .env file with the NLC credentials for either username/password or API key:
# Replace the credentials here with your own using either USERNAME/PASSWORD or IAM_APIKEY
# Comment out the unset environment variables
# Rename this file to .env before running app.py.
CLASSIFIER_ID=<add_nlc_classifier_id>
NATURAL_LANGUAGE_CLASSIFIER_APIKEY=<add_nlc_apikey>
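A common mistake is to leave the placeholder values in place. The sketch below shows one way to catch that, assuming the file contains plain KEY=VALUE lines as in the listing above. It is not how app.py loads its settings; `parse_env_text` and `unset_keys` are hypothetical helpers written for this illustration.

```python
REQUIRED_KEYS = ("CLASSIFIER_ID", "NATURAL_LANGUAGE_CLASSIFIER_APIKEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unset_keys(values):
    """List required keys that are missing, empty, or still a <placeholder>."""
    missing = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing


# The untouched template still has both placeholders, so both keys are reported.
example = parse_env_text(
    "# sample contents\n"
    "CLASSIFIER_ID=<add_nlc_classifier_id>\n"
    "NATURAL_LANGUAGE_CLASSIFIER_APIKEY=<add_nlc_apikey>\n"
)
print(unset_keys(example))
```

To check a real file, read it first, for example `unset_keys(parse_env_text(open(".env").read()))`; an empty list means both values have been filled in.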
Install the app dependencies by running:
pip install -r requirements.txt
Start the app by running:
python app.py
Open a browser and point to localhost:5000.
The user enters text into the Text to classify: text box, and the Watson NLC classifier returns ICD-10 classifications with confidence scores.
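The same classification can be requested directly from Python. The sketch below is not the code in app.py; it assumes a version of the `ibm-watson` package that still ships `NaturalLanguageClassifierV1`, and it assumes the response contains a `classes` list whose entries carry `class_name` and `confidence`. The helper names and the sample scores are made up for illustration.

```python
def top_classes(result, limit=3):
    """Return (class_name, confidence) pairs, highest confidence first."""
    classes = sorted(
        result.get("classes", []), key=lambda c: c["confidence"], reverse=True
    )
    return [(c["class_name"], c["confidence"]) for c in classes[:limit]]


def classify_text(apikey, classifier_id, text, service_url=None):
    """Send text to a trained classifier (needs network access and credentials)."""
    # Imported lazily so top_classes() works even without the SDK installed.
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    from ibm_watson import NaturalLanguageClassifierV1

    nlc = NaturalLanguageClassifierV1(authenticator=IAMAuthenticator(apikey))
    if service_url:
        nlc.set_service_url(service_url)
    return nlc.classify(classifier_id, text).get_result()


# Hand-written, response-shaped dict; the scores are illustrative, not real output.
sample = {
    "classes": [
        {"class_name": "K92.1", "confidence": 0.06},
        {"class_name": "K92.2", "confidence": 0.91},
    ]
}
for code, confidence in top_classes(sample):
    print(f"{code}: {confidence:.2f}")
```

With real credentials, `top_classes(classify_text(apikey, classifier_id, "Gastrointestinal hemorrhage"))` would return the highest-scoring codes from your own trained model.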
Classification of Gastrointestinal hemorrhage:
This code pattern is licensed under the Apache Software License, Version 2. Separate third party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 (DCO) and the Apache Software License, Version 2.