enigma-io / smoke-signals-model

The Machine Learning Algorithms that power Smoke Signals.
http://labs.enigma.io/smoke-signals

smoke-signals-model

This repository contains code and documentation for generating scores that indicate whether the residents of a census block group are at high risk of not having smoke alarms. You can read an overview of the analysis here. The analysis is made possible by mapping common variables between the American Housing Survey (AHS) and the American Community Survey (ACS). You can find details on how these mappings are done in this repository.

Getting Started

Installation

First clone the repository and navigate to the project's root directory:

git clone https://github.com/enigma-io/smoke-alarm-risk.git
cd smoke-alarm-risk

This project is written in R and depends on several R packages, including bigrf.

You can install these packages by running the following command in the project's root directory:

$ make init

Get the data

This project also requires six CSV files (two of which, the ACS and AHS files, are generated by this project). You can download these files by running the following command:

$ make fetch_data

WARNING: This may take a while; the ACS file alone is about 2 GB.
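Because the download is large, a quick free-space check beforehand can save a failed run. The sketch below is not part of this repository: the `has_free_space` helper and the ~3 GB threshold are assumptions (a margin over the ~2 GB ACS file, not a measured total for all the data files). It relies on POSIX `df -Pk`, whose fourth column is the available space in 1K blocks.

```shell
# Hypothetical helper (not part of this repository): succeeds only if the
# filesystem holding DIR has at least NEEDED_KB kilobytes available.
has_free_space() {
  dir="$1"
  needed_kb="$2"
  # With -P (POSIX format) and -k (1K blocks), line 2 column 4 is "Available".
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge "$needed_kb" ]
}

# Example: look for roughly 3 GB free (an assumed margin) before fetching.
mkdir -p data
if has_free_space data 3000000; then
  echo "enough free space for make fetch_data"
else
  echo "less than ~3 GB free for data/; free some space first" >&2
fi
```

Checking the filesystem that holds data/ (rather than the current directory) matters if data/ is a symlink or mount point on another disk.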

Once this is finished, you should see five files in data/.

Once you have these files, you should be all set to generate risk scores.
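To confirm the download, you can list what landed in data/. This is a hypothetical helper, not a project script: it assumes the files carry a `.csv` extension and sit directly in data/, and it fails if fewer CSV files than expected are present.

```shell
# Hypothetical helper (not part of this repository): print each CSV file in DIR
# with its size, then fail if fewer than EXPECTED CSV files were found.
check_data_dir() {
  dir="$1"
  expected="$2"
  count=0
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue   # the glob stays literal when nothing matches
    du -h "$f"
    count=$((count + 1))
  done
  echo "found $count CSV file(s) in $dir"
  [ "$count" -ge "$expected" ]
}

check_data_dir data 5 || echo "some data files seem to be missing; re-run make fetch_data" >&2
```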

Generate the risk scores

First, open index.Rmd and change the following line so that it points to your working directory (the project's root directory):

WD <- '/path/to/this/directory'
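If you prefer not to edit the file by hand, a `sed` substitution can make the same change. This is a sketch, not a project script: the `set_wd` helper is hypothetical, it assumes the file is index.Rmd (the R Markdown file that make model executes) and that the line starts with `WD <-` exactly as shown above, and it assumes your path contains no `|`, `&`, backslash, or single-quote characters (those would need escaping).

```shell
# Hypothetical helper (not part of this repository): rewrite the line starting
# with "WD <-" in FILE so that it points at DIR, keeping a FILE.bak copy.
set_wd() {
  file="$1"
  dir="$2"
  # -i.bak works with both GNU and BSD sed; "|" is the delimiter so the
  # slashes in DIR need no escaping.
  sed -i.bak "s|^WD <- .*|WD <- '$dir'|" "$file"
}

# Run from the project's root directory:
if [ -f index.Rmd ]; then
  set_wd index.Rmd "$(pwd)"
fi
```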

Execute the model using this command:

$ make model

Under the hood, this command executes index.Rmd, an R Markdown file that documents each step of our process and generates plots that visualize our results. You can view the finalized output of the modeling process by running:

$ make view

If you then open a web browser and navigate to http://localhost:8000/, you should see the report on the modeling process.

Get the output

When the modeling script has finished executing, the risk score for each block group is written to data/smoke-alarm-risk-scores.csv. The file also includes the total population and the at-risk population (residents younger than 5 or older than 65) for each block group.
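For a quick look at the results from the shell, something like the following prints the first rows and counts the block groups. It is a hypothetical helper rather than part of the project, and it assumes the file has exactly one header line, one row per block group, and no quoted fields containing line breaks; it does not assume any particular column names.

```shell
# Hypothetical helper (not part of this repository): show the header plus the
# first three rows of the scores file, then count the data rows.
summarize_scores() {
  file="${1:-data/smoke-alarm-risk-scores.csv}"
  head -n 4 "$file"
  # awk counts the last line even if it lacks a trailing newline;
  # subtract 1 for the header.
  rows=$(awk 'END { print NR - 1 }' "$file")
  echo "block groups: $rows"
}

if [ -f data/smoke-alarm-risk-scores.csv ]; then
  summarize_scores data/smoke-alarm-risk-scores.csv
fi
```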

Known Issues

bigrf appears to have a memory leak when run inside RStudio. You can avoid it by running the model with the make model command instead. See: https://github.com/aloysius-lim/bigrf/issues/16.