codeforkansascity / Property-Violations-Settlement

Analyzing Kansas City's open data on property violations

Create an R Script to load violations data consistently. #34

Open buzwells opened 8 years ago

EricRoche commented 8 years ago

Recommend using the RSocrata package by Chicago. It's available on CRAN. https://github.com/Chicago/RSocrata
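
A minimal sketch of what loading with RSocrata could look like, assuming the package is installed from CRAN. `read.socrata()` is the package's reader; the helper name `load_violations`, its `reader` argument, and the dataset URL are placeholders for illustration and are not taken from this thread.

```r
# Sketch only: needs the RSocrata package, install.packages("RSocrata").
# `reader` is a hypothetical hook so the function can also be run offline
# with a stand-in reader; by default it uses RSocrata::read.socrata.
load_violations <- function(url, reader = RSocrata::read.socrata) {
  df <- reader(url)
  stopifnot(is.data.frame(df))  # fail early if the download did not yield a table
  df
}

# Example call. The URL is a placeholder, not the real dataset endpoint:
# violation.data <- load_violations("https://data.kcmo.org/resource/xxxx-xxxx.csv")
```

Reading straight from the portal means everyone works from the same source rather than from a CSV saved under a personal directory.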

dwancura commented 8 years ago

# R script to read in the 2015 KCMO property violations data set
# Prepping here for data visualization only, based on neighborhood

violation.data <- read.csv("C:/Users/dwanc_000/Documents/aRockhurst/Project - Property Violations/Property_Violations.csv")

violation.data.closed <- subset(violation.data, Status == "Closed") # closed violations only
violation.data.closed_over60 <- subset(violation.data.closed, Days.closed > 60) # closed violations with more than 60 days
violation.data.closed_over60_under500 <- subset(violation.data.closed_over60, Days.closed < 500) # closed violations with more than 60 and fewer than 500 days

# Remove non-applicable columns (experimented with different sets before settling on this one)
violation.data.closed_over60_under500 <- subset(violation.data.closed_over60_under500, select = -c(Property.Violation.ID, Case.ID, Status, Case.Closed.Date, Ordinance.Number, Ordinance.Chapter))

# Delete records where any key variable is missing
# Case.Opened.Date is an assumption: the posted name, Case.closeded.Date, does not match a column, and Case.Closed.Date was dropped above
key.vars <- c("Case.Opened.Date", "Days.closed", "Violation.Code", "Violation.Description", "Violation.Entry.Date", "Address", "County", "State", "Zip.Code", "KIVA.PIN", "Council.District", "Police.Patrol.Area", "Inspection.Area", "Code.Violation.Location", "Neighborhood")
# Test for NA on the same data frame that is being filtered, so the row index lines up
violation.data.closed_over60_under500 <- violation.data.closed_over60_under500[complete.cases(violation.data.closed_over60_under500[, key.vars]), ]

# Save data set for different types of analysis
write.csv(violation.data.closed_over60_under500, file = "C:/Users/dwanc_000/Documents/aRockhurst/Project - Property Violations/closed_violations_over60_under500.csv")
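
The three filtering steps in the script above can also be written as a single `subset()` call. The sketch below shows this on a toy data frame; the helper name `filter_closed` and all values are made up for illustration, and only the column names `Status` and `Days.closed` come from the script.

```r
# Toy data standing in for the real export; values are invented.
toy <- data.frame(
  Status = c("Closed", "Closed", "Open", "Closed", "Closed"),
  Days.closed = c(30, 120, 200, 700, NA),
  Neighborhood = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep closed cases whose duration lies strictly
# between min_days and max_days. subset() also drops rows where the
# condition evaluates to NA, such as a missing Days.closed.
filter_closed <- function(df, min_days = 60, max_days = 500) {
  subset(df, Status == "Closed" & Days.closed > min_days & Days.closed < max_days)
}

result <- filter_closed(toy)
```

Combining the conditions avoids carrying three intermediate data frames, and the thresholds become arguments that can be changed in one place.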

buzwells commented 8 years ago

Reviving this issue. Based on a discussion at our April 11, 2016 meeting, we will start loading the raw data by reading it from the open data site using the RSocrata package that Eric demonstrated. We will also scrub the data for obvious problems (for instance, missing data). Subsequent steps, such as identifying and eliminating outliers and joining the GEOID, are covered in separate issues. The group also agreed to store the data in rdf format for the sake of efficiency and compactness.
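
As a rough sketch of the scrub-and-store steps described above: the code below treats blank strings as missing, drops rows lacking any key column, and writes the result with `saveRDS()`. It assumes that "rdf" refers to one of R's native binary formats and uses RDS as one possibility. The helper name `scrub_violations`, the choice of key columns, and the toy values are hypothetical.

```r
# Hypothetical helper: blank strings in key columns count as missing,
# then rows with any missing key column are dropped.
scrub_violations <- function(df, key.vars) {
  for (v in key.vars) {
    col <- df[[v]]
    if (is.character(col)) {
      col[!is.na(col) & trimws(col) == ""] <- NA
      df[[v]] <- col
    }
  }
  df[complete.cases(df[, key.vars, drop = FALSE]), , drop = FALSE]
}

# Toy data with invented values: rows 2 to 4 each lack one key field.
raw <- data.frame(
  Violation.Code = c("CODE1", "", "CODE3", NA),
  Neighborhood = c("N1", "N2", "  ", "N4"),
  stringsAsFactors = FALSE
)

clean <- scrub_violations(raw, key.vars = c("Violation.Code", "Neighborhood"))

# Store in R's binary RDS format and read it back.
rds.path <- tempfile(fileext = ".rds")
saveRDS(clean, rds.path)
restored <- readRDS(rds.path)
```

Unlike CSV, an RDS file keeps column types intact, so a later script can call `readRDS()` without parsing dates or factors again.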

I can take a stab at this, as I've got scripts in my own workspace that already do much of this.

buzwells commented 8 years ago

Created a pull request for this issue: https://github.com/codeforkansascity/Property-Violations-Settlement/pull/53