closedloop-ai / cv19index

COVID-19 Vulnerability Index
http://cv19index.com

Join us for our webinar on the CV19 Index on Wednesday, April 8th, 2020 from 2:00 – 3:00pm CDT.

With the 1.1.0 release, the CV19 Index can now make predictions for any adult. It is no longer restricted to Medicare populations.

The COVID-19 Vulnerability Index (CV19 Index)


Install | Data Prep | Running The Model | Interpreting Results | Model Performance | Contributing | Release Notes

The COVID-19 Vulnerability Index (CV19 Index) is a predictive model that identifies people who are likely to have a heightened vulnerability to severe complications from COVID-19 (commonly referred to as “The Coronavirus”). The CV19 Index is intended to help hospitals, federal / state / local public health agencies and other healthcare organizations in their work to identify, plan for, respond to, and reduce the impact of COVID-19 in their communities.

Full information on the CV19 Index, including links to a full FAQ, user forums, and information about upcoming webinars, is available at http://cv19index.com

Data Requirements

This repository provides information for those interested in running the COVID-19 Vulnerability Index on their own data. We provide the index as a pretrained model implemented in Python. We provide the source code, models, and example usage of the CV19 Index.

The CV19 Index utilizes only a few fields which can be extracted from administrative claims or electronic medical records. The data requirements have intentionally been kept very limited in order to facilitate rapid implementation while still providing good predictive power. ClosedLoop is also offering a free, hosted version of the CV19 Index that uses additional data and provides better accuracy. For more information, see http://cv19index.com

Install

The CV19 Index can be installed from PyPI:

pip install cv19index

Note for Windows users: Some Microsoft Windows users have gotten errors from pip when installing the SHAP and XGBoost dependencies. For these users we have provided prebuilt wheel files. To use them, download the wheel for SHAP and/or XGBoost to your machine. Then, from the directory where you downloaded the files, run:

pip install xgboost-1.0.2-py3-none-win_amd64.whl
pip install shap-0.35.0-cp37-cp37m-win_amd64.whl

These wheel files are for Python 3.7. If you have a different Python version and would like prebuilt binaries, try https://www.lfd.uci.edu/~gohlke/pythonlibs/ . If you still have trouble, please create a GitHub issue.

Data Prep

The CV19 Index requires 2 data files: a demographics file and a claims file. They can be comma-separated value (CSV) or Excel files. The first row is a header row and the remaining rows contain the data. In each file, only certain columns are used; any extra columns are ignored.

The model requires at least 6 months of claims history, so only those members with at least 6 months of prior history should be included. It is not necessary that they have any claims during this period.
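As a rough illustration of the history requirement, the sketch below filters a member list down to those with at least 6 whole months between a coverage start date and the prediction date. The `coverage_start` field and both helper functions are assumptions made for this example; they are not part of the cv19index package or its input format, and your own eligibility data may define history differently.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end (0 if end is earlier)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return max(months, 0)


def has_enough_history(coverage_start: date, as_of: date, min_months: int = 6) -> bool:
    """True if the member has at least min_months of history before as_of."""
    return months_between(coverage_start, as_of) >= min_months


# Hypothetical eligibility records; "coverage_start" is an assumed field name.
members = [
    {"personId": "a1", "coverage_start": date(2018, 1, 15)},
    {"personId": "b2", "coverage_start": date(2018, 9, 1)},
]
as_of = date(2018, 12, 31)

eligible = [
    m["personId"]
    for m in members
    if has_enough_history(m["coverage_start"], as_of)
]
print(eligible)
```

Only members that pass this kind of check would then be written to the demographics file.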

Sample input files, demographics.xlsx and claims.xlsx, are in the examples directory.

Demographics File

The demographics file should contain one row for each person on whom you want to run a prediction.

There are 3 required fields in the demographics file:

Claims File

The claims file contains a summary of medical claims for each patient. There can be multiple rows for each patient, one per claim. Both inpatient and outpatient claims should be included in the one file. If a patient has no claims, that patient should have no corresponding rows in this file.

There are 6 required fields and several optional fields in the claims file:

Note, if a patient first goes to the emergency room and then is later admitted, both the erVisit and inpatient flags should be set to true.

If you need to enter more than 15 diagnosis codes for a claim, you can repeat the row, set the erVisit and inpatient flags to false, and then add in the additional diagnosis codes on the new row.
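The row-splitting rule above can be sketched as follows. This is a minimal illustration, not code from the cv19index package: `split_claim` is a hypothetical helper, and the `dx_codes` key stands in for whatever diagnosis columns your claims file actually uses. Only the `erVisit` and `inpatient` names and the limit of 15 come from this README.

```python
MAX_DX_PER_ROW = 15  # per-row diagnosis code limit stated in this README


def split_claim(claim: dict, dx_codes: list) -> list:
    """Spread one claim's diagnosis codes over as many rows as needed.

    The first row keeps the claim's erVisit/inpatient flags. Overflow rows
    repeat the claim but set both flags to False so the visit is counted once.
    """
    rows = []
    # max(..., 1) ensures a claim with no codes still yields one row.
    for start in range(0, max(len(dx_codes), 1), MAX_DX_PER_ROW):
        row = dict(claim)  # copy so the caller's dict is not modified
        if start > 0:
            row["erVisit"] = False
            row["inpatient"] = False
        row["dx_codes"] = dx_codes[start:start + MAX_DX_PER_ROW]  # assumed key
        rows.append(row)
    return rows


# A patient seen in the ER and then admitted: both flags are True.
claim = {"personId": "p1", "erVisit": True, "inpatient": True}
codes = [f"CODE{i}" for i in range(20)]  # placeholder codes, not real ICD-10

rows = split_claim(claim, codes)
for r in rows:
    print(r["erVisit"], r["inpatient"], len(r["dx_codes"]))
```

With 20 codes the claim becomes two rows: the first carries 15 codes and the original flags, the second carries the remaining 5 codes with both flags set to false.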

Running the model

Installing the CV19 Index from PyPI creates a cv19index executable that you can run. The following command, run from the root directory of the GitHub checkout, generates predictions on the example data and writes the results to examples/predictions.csv.

Note: The -a 2018-12-31 is only needed because the example data is from 2018. If you are using current data you can omit this argument.

cv19index -a 2018-12-31 examples/demographics.csv examples/claims.csv examples/predictions.csv

We also provide a run_cv19index.py script that you can use to generate predictions from Python directly:

python run_cv19index.py -a 2018-12-31 examples/demographics.csv examples/claims.csv examples/predictions.csv

Help is available which provides full details on all of the available options:

python run_cv19index.py -h
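If you want to launch the command-line tool from another Python program, one option is to assemble the same arguments shown above and hand them to subprocess. The helper names below are hypothetical and only wrap the documented invocation (the optional -a date followed by the demographics, claims, and output paths); they are not an API provided by the package.

```python
import subprocess


def build_cv19index_command(demographics, claims, output, as_of_date=None):
    """Return the argument list for the cv19index executable."""
    cmd = ["cv19index"]
    if as_of_date is not None:
        cmd += ["-a", as_of_date]  # only needed when the data is not current
    cmd += [demographics, claims, output]
    return cmd


def run_cv19index(demographics, claims, output, as_of_date=None):
    """Run the tool; raises CalledProcessError on a non-zero exit status."""
    cmd = build_cv19index_command(demographics, claims, output, as_of_date)
    subprocess.run(cmd, check=True)


cmd = build_cv19index_command(
    "examples/demographics.csv",
    "examples/claims.csv",
    "examples/predictions.csv",
    as_of_date="2018-12-31",
)
print(" ".join(cmd))
```

Calling `run_cv19index(...)` with the same arguments executes the command, provided the package is installed and the example files exist.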

Interpreting the results

The output file created by the CV19 Index contains the predictions along with explanations of the factors that influenced those predictions.

If you simply want a list of the most vulnerable people, sort the file based on descending prediction. This will give you the population sorted by vulnerability, with the most vulnerable person first.
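A minimal sketch of that sort, using only the standard library. It assumes the output is a CSV with personId and prediction columns, as in the example command's predictions.csv; `rank_by_vulnerability` is a hypothetical helper, and the in-memory sample stands in for the real file.

```python
import csv
import io


def rank_by_vulnerability(csv_file, top_n=None):
    """Read prediction rows and return them sorted by descending prediction."""
    rows = list(csv.DictReader(csv_file))
    rows.sort(key=lambda r: float(r["prediction"]), reverse=True)
    return rows if top_n is None else rows[:top_n]


# Small in-memory stand-in for a predictions file.
sample = io.StringIO(
    "personId,prediction,risk_score\n"
    "d45d10ed2ec861c4,0.008979,98\n"
    "772775338f7ee353,0.017149,100\n"
)

ranked = rank_by_vulnerability(sample)
print([r["personId"] for r in ranked])
```

To rank a real output file, open it with `open("examples/predictions.csv", newline="")` and pass the file object in place of `sample`; `top_n` limits the result to the most vulnerable people.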

If you'd like to do more analysis, the predictions file also contains other information, including explanations of which factors most influenced the risk, both positively and negatively.

Here is a sample of the predictions output:

| personId | prediction | risk_score | pos_factors_1 | pos_patient_values_1 | pos_shap_scores_1 | ... |
| --- | --- | --- | --- | --- | --- | --- |
| 772775338f7ee353 | 0.017149 | 100 | Diagnosis of Pneumonia | True | 0.358 | |
| d45d10ed2ec861c4 | 0.008979 | 98 | Diagnosis of Pneumonia | True | 0.264 | |

In addition to the personId, the output contains:

Model Performance

There are 3 versions of the CV19 Index, each a different predictive model. The models represent different tradeoffs between ease of implementation and overall accuracy. A full description of how these models were created is available in the accompanying medRxiv paper, "Building a COVID-19 Vulnerability Index" (http://cv19index.com).

The 3 models are:

We evaluate the models using a full train/test split, testing on 369,865 individuals. We express model performance using standard ROC curves, as well as the following metrics:

| Model | ROC AUC | Sensitivity at 3% Alert Rate | Sensitivity at 5% Alert Rate |
| --- | --- | --- | --- |
| Logistic Regression | .731 | .214 | .314 |
| XGBoost, Diagnosis History + Age | .810 | .234 | .324 |
| XGBoost, Full Features | .810 | .251 | .336 |
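For readers unfamiliar with the alert-rate metric, the sketch below shows one common way to compute it: flag the top-scoring fraction of the population (the alert rate) and report the share of all true positives that fall inside the flagged group. This reading of the metric is an assumption; the exact evaluation procedure is described in the accompanying paper, and the data here is synthetic.

```python
def sensitivity_at_alert_rate(scores, labels, alert_rate):
    """Fraction of positives captured when the top alert_rate of scores is flagged.

    scores: predicted risk per person; labels: 1 for an actual event, else 0.
    """
    n_alerts = max(1, round(alert_rate * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    flagged_positives = sum(label for _, label in ranked[:n_alerts])
    total_positives = sum(labels)
    if total_positives == 0:
        return float("nan")  # sensitivity is undefined without positives
    return flagged_positives / total_positives


# Synthetic population of 100 people with strictly increasing scores.
scores = [i / 100 for i in range(100)]
labels = [0] * 100
for idx in (99, 98, 96, 50):  # four people who actually had the outcome
    labels[idx] = 1

print(sensitivity_at_alert_rate(scores, labels, 0.03))
print(sensitivity_at_alert_rate(scores, labels, 0.05))
```

In this toy data the top 3% contains two of the four positives and the top 5% contains three, so widening the alert rate trades more alerts for higher sensitivity.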

Contributing to the CV19 Index

We are not allowed to share the data used to train the models with our collaborators, but there are tons of ways you can help. If you are interested in participating, just pick up one of the issues marked with the GitHub "help wanted" tag or contact us at covid19-info@closedloop.ai

A few examples are: