This repository contains code and documentation for generating scores that indicate how likely the residents of a census block group are to lack smoke alarms. You can read an overview of the analysis here. This analysis is made possible by mapping common variables in the American Housing Survey (AHS) and the American Community Survey (ACS). You can see details on how these mappings are done in this repository.
First clone the repository and navigate to the project's root directory:
git clone https://github.com/enigma-io/smoke-alarm-risk.git
cd smoke-alarm-risk
This project is written in R and depends on the following packages:
bit64
plyr
ggplot2
data.table
knitr
reshape2
scales
bigrf
pROC
You can install these packages by running the following command in the project's root directory:
$ make init
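To confirm that the install step worked, a quick check along the following lines can be run in an R session. This is a minimal sketch, not part of the project's Makefile: the variable names `required` and `missing_pkgs` are illustrative, and the package list is copied from the list above.

```r
# Packages listed in this README as dependencies.
required <- c("bit64", "plyr", "ggplot2", "data.table", "knitr",
              "reshape2", "scales", "bigrf", "pROC")

# Names of all packages installed in the current R library paths.
installed <- rownames(installed.packages())

# Any required package that is not installed.
missing_pkgs <- setdiff(required, installed)

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, re-running `make init` and reading its output is the first thing to try.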
This project also requires six CSV files, two of which (the ACS and AHS exports) are generated by this project. You can grab these files from the web by running the following command:
$ make fetch_data
WARNING: This may take a while. The ACS file is ~ 2 GB.
Once this is finished, you should see six files in data/:
- acs-bg-at-risk-population.csv: percent of population under the age of 5 and over the age of 65 per block group.
- acs-bg-population.csv: total population per block group.
- acs-bg-pop-density.csv: population density per block group.
- msa80-bg.csv: a lookup of 1980 MSA IDs to 2010 block group IDs.
- acs.csv: an export of the ACS with variables mapped to the AHS (see this repo).
- ahs.csv: an export of the AHS with variables mapped to the ACS (see this repo).
Once you have these files, you should be all set to generate risk scores.
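Before running the model, it can help to verify that every input file listed above actually landed in data/. The following is a minimal sketch in R, assuming it is run from the project's root directory; `data_dir`, `expected`, and `present` are illustrative names that do not come from the project's code.

```r
# Directory where `make fetch_data` is expected to place the files
# (assumes the current working directory is the project root).
data_dir <- "data"

# File names as listed in this README.
expected <- c("acs-bg-at-risk-population.csv",
              "acs-bg-population.csv",
              "acs-bg-pop-density.csv",
              "msa80-bg.csv",
              "acs.csv",
              "ahs.csv")

# TRUE/FALSE per file, depending on whether it exists on disk.
present <- file.exists(file.path(data_dir, expected))

if (all(present)) {
  message("All input files found in ", data_dir, "/")
} else {
  message("Missing files: ", paste(expected[!present], collapse = ", "))
}
```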
First, open up index.Rmd and change this line to your working directory:
WD <- '/path/to/this/directory'
Execute the model using this command:
$ make model
Under the hood, this command executes index.Rmd, which is an R Markdown file. It contains notes on each step of our process and generates plots that visualize our results. You can see the finalized output of the modeling process by typing this command:
$ make view
If you open a web browser and navigate to http://localhost:8000/ you should see the report on the modeling process.
When the modeling script has finished executing, the risk scores per block group will be output to data/smoke-alarm-risk-scores.csv. These also include total population and at-risk population (< 5 years old, > 65 years old) per block group.
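Once the scores file exists, it can be loaded into R for a first look. The sketch below assumes only the output path given above and that the session is started in the project's root directory. The column names are not documented here, so it prints the structure of the file instead of referring to specific columns; `scores_path` and `scores` are illustrative names.

```r
# Output path as stated in this README (relative to the project root).
scores_path <- "data/smoke-alarm-risk-scores.csv"

if (file.exists(scores_path)) {
  # Read the scores; keep strings as plain character vectors.
  scores <- read.csv(scores_path, stringsAsFactors = FALSE)

  # Show column names and types, then the number of block groups (rows).
  str(scores)
  message("Rows read: ", nrow(scores))
} else {
  message("No scores file found at ", scores_path,
          "; run `make model` first.")
}
```

Note that `read.csv` may parse block group identifiers as numbers; if the identifiers need to be kept as text, passing `colClasses = "character"` for that column is one option.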
bigrf seems to have a memory leak when executed within RStudio. This can be avoided by simply using the make model command. See: https://github.com/aloysius-lim/bigrf/issues/16.