CivicTechAtlanta / apd-crime-data

Working with publicly available data on Atlanta Police Department's "Crime Data Downloads" site at http://www.atlantapd.org/crimedatadownloads.aspx

Make Work for RSocrata #1

Open stuagano opened 8 years ago

stuagano commented 8 years ago

```r
# Get and clean Atlanta Police Department data
get.apd.data <- function(selector.text, save.location) {
  require(rvest)
  require(dplyr)

  # Scrape the download link matching the CSS selector and fetch the zip archive
  temp <- tempfile()
  download.file(
    read_html("http://www.atlantapd.org/crimedatadownloads.aspx") %>%
      html_nodes(selector.text) %>%
      html_attr("href") %>%
      paste0("http://www.atlantapd.org/", .),
    destfile = temp,
    mode = "wb"  # binary mode, so the zip is not corrupted on Windows
  )

  # Save filename of csv crime data raw file, for later use
  crime.data.filename <- unzip(temp, list = TRUE)$Name
  # Unzip crime data to the save location and remove the temp file
  unzip(temp, exdir = save.location)
  unlink(temp)
  crime.data.filename
}
```

Was there a reason you did it this way instead of using SODA? Other PDs could definitely take advantage of this if it used RSocrata from Chicago.
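
For comparison, a rough sketch of what the SODA route could look like with RSocrata. `read.socrata()` is the package's reader; the helper names, the portal domain, and the dataset ID here are hypothetical placeholders rather than the real APD dataset, so treat this as an illustration under those assumptions.

```r
# Sketch only: domain and dataset ID are placeholders, not the real APD dataset.

# Build a SODA resource URL for a dataset on a Socrata portal
build.soda.url <- function(domain, dataset.id) {
  paste0("https://", domain, "/resource/", dataset.id, ".csv")
}

# Pull the dataset into a data frame via RSocrata
get.apd.data.soda <- function(domain, dataset.id) {
  if (!requireNamespace("RSocrata", quietly = TRUE)) {
    stop("Package 'RSocrata' is required: install.packages('RSocrata')")
  }
  RSocrata::read.socrata(build.soda.url(domain, dataset.id))
}

# Example call (placeholder values, will not resolve to a real dataset):
# crimes <- get.apd.data.soda("data.example.gov", "abcd-1234")
```

The appeal is that there is no scraping or unzipping step, and the same function would work for any department that publishes to a Socrata portal.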

bbrewington commented 8 years ago

Good point. That will probably be the best route; I think we need to clean up the data in Socrata, though. Looks like there's a difference starting in 2015:

Socrata Data:

[Image: socrata crimes by rpt_date]

COBRA051916.csv file:

[Image: num crimes by rpt_date - year month]

Possible Cause: import tool is duplicating rows
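
One way to check that hypothesis locally is to count rows per report month and count exact duplicate rows, then compare the Socrata pull against the raw csv. A minimal sketch in base R: the helper names and the toy data are made up for illustration, and `offense_id` is an assumed column name (only `rpt_date` appears in the charts here).

```r
# Sketch only: toy data and the offense_id column name are assumptions.

# Count rows per year-month of the report date
count.by.month <- function(crimes, date.col = "rpt_date") {
  months <- format(as.Date(crimes[[date.col]]), "%Y-%m")
  as.data.frame(table(year.month = months),
                responseName = "n", stringsAsFactors = FALSE)
}

# Count rows that are exact copies of an earlier row
count.duplicate.rows <- function(crimes) {
  sum(duplicated(crimes))
}

toy <- data.frame(
  offense_id = c("A1", "A2", "A2", "A3"),
  rpt_date = c("2015-01-05", "2015-01-09", "2015-01-09", "2015-02-01"),
  stringsAsFactors = FALSE
)

count.by.month(toy)                       # monthly counts, duplicates included
count.duplicate.rows(toy)                 # number of repeated rows
count.by.month(toy[!duplicated(toy), ])   # monthly counts after de-duplication
```

If the monthly counts only match the raw file after de-duplication, that would point at the import step rather than the source data.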

[Image: socrata apd crime - rows affected repeating]

bbrewington commented 8 years ago

@stuagano would it make more sense to do the ETL via https://dev.socrata.com/connectors/pentaho-kettle.html instead of the R script in this repo? Maybe if there was a way to schedule an R ETL job it could work. What do you think?

Here's the current ETL process (it would be nice to automate the last step of publishing cleaned data to Socrata):

[Image: etl process]

edit: note to self - https://github.com/jwijffels/taskscheduleR
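
A rough sketch of how scheduling could look with taskscheduleR, which wraps the Windows Task Scheduler (so this only applies on a Windows machine). `taskscheduler_create()` is the package's function; the wrapper name, task name, script path, and start time are placeholders I made up for illustration.

```r
# Sketch only: task name, script path, and start time are hypothetical.
schedule.apd.etl <- function(script.path, start.time = "06:00") {
  if (.Platform$OS.type != "windows") {
    stop("taskscheduleR relies on the Windows Task Scheduler")
  }
  if (!requireNamespace("taskscheduleR", quietly = TRUE)) {
    stop("Package 'taskscheduleR' is required: install.packages('taskscheduleR')")
  }
  # The scheduler needs the full path to an existing R script
  full.path <- normalizePath(script.path, mustWork = TRUE)
  taskscheduleR::taskscheduler_create(
    taskname  = "apd_crime_etl",
    rscript   = full.path,
    schedule  = "DAILY",
    starttime = start.time
  )
}

# Example call (placeholder path):
# schedule.apd.etl("etl/run_apd_etl.R")
```

This would cover running the R ETL on a schedule; publishing the cleaned data to Socrata would still need its own step inside the scheduled script.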