EricRoche opened this issue 8 years ago.
@buzwells will handle.
Cross-posting here what I posted on Slack last week re this issue:
A first draft of the core violations data, combined with the block group info from the modified address master file, is now on Jason’s Google Drive.

For now, I only brought across GEOID (not AFFGEOID), because GEOID appears (based on the description @ https://www.census.gov/geo/reference/geoidentifiers.html) to encode state, county, tract and block group. I did not split GEOID into its constituent parts, on the assumption that we’ll be joining to census data on the combined field. Depending on Ron’s feedback on these fields, I can reverse either or both of those decisions.

I supplied the new data both as a CSV and as an RDF. I actually prefer working with the data in RDF: it’s faster, smaller, and easier to load, with no scripting needed to specify data types. It also looks like you can read RDF using Python, and the RDF file is small enough to commit to our GitHub repo.

The data also includes Longitude and Latitude columns broken out from the Code.Violation.Location column, as well as a new column called ‘Ordinance.Title’ that pairs the ordinance number with a description (for the most common ordinances). Finally, I made some judgments about data types, preferring character types for data that might be nominally numeric but won’t really be used quantitatively (e.g. IDs). Feel free to take issue with any of the above, and we can refine the dataset into a form that everyone is comfortable with.
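If we do end up needing the constituent parts, a block group GEOID is a 12-digit string: 2-digit state FIPS, 3-digit county FIPS, 6-digit tract, and 1-digit block group. Here's a minimal Python sketch of splitting one, keeping everything as strings (in line with the character-type choice for IDs) so leading zeros survive. The function and class names are hypothetical, and the sample GEOID is made up (state 29 / county 095 is Jackson County, MO).

```python
from typing import NamedTuple


class BlockGroupGeoid(NamedTuple):
    """Components of a 12-character census block group GEOID (hypothetical helper)."""
    state: str        # 2-digit state FIPS code
    county: str       # 3-digit county FIPS code
    tract: str        # 6-digit census tract code
    block_group: str  # 1-digit block group number


def split_block_group_geoid(geoid: str) -> BlockGroupGeoid:
    """Split a block group GEOID laid out as SSCCCTTTTTTB into its parts.

    The GEOID is treated as a string, never a number, so leading zeros are kept.
    """
    geoid = geoid.strip()
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"expected a 12-digit block group GEOID, got {geoid!r}")
    return BlockGroupGeoid(
        state=geoid[0:2],
        county=geoid[2:5],
        tract=geoid[5:11],
        block_group=geoid[11],
    )


# Made-up GEOID for illustration only.
print(split_block_group_geoid("290950001001"))
# BlockGroupGeoid(state='29', county='095', tract='000100', block_group='1')
```

Keeping the combined GEOID as the join key and splitting only on demand matches the decision above not to break it apart in the dataset itself.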
We need to include the script that performs this binding as part of our first analysis, which starts by asking whether median income affects the number of violations in an area.
I can take this, and also cover our initial reshaping of the data: grouping by GEOID (i.e., block group) and summarizing the violation count for each group.
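The grouping and the join to median income could look roughly like the Python sketch below. It is not the script from the pull request; it uses only the standard library, and the function names, GEOIDs, and income figures are all hypothetical. One design choice worth flagging: the join starts from the income table, so block groups with no recorded violations show up with a count of 0 instead of silently dropping out of the comparison.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_violations_by_block_group(geoids: Iterable[str]) -> Counter:
    """Count violation records per block group GEOID (one GEOID per violation)."""
    return Counter(geoids)


def join_with_median_income(
    counts: Mapping[str, int],
    median_income: Mapping[str, float],
) -> list[tuple[str, float, int]]:
    """Left-join from the income table onto violation counts, keyed by GEOID.

    Block groups with income data but no violations get a count of 0.
    Rows are sorted by GEOID.
    """
    return sorted(
        (geoid, median_income[geoid], counts.get(geoid, 0))
        for geoid in median_income
    )


# Hypothetical data: one GEOID per violation record, plus a made-up income table.
violation_geoids = ["290950001001", "290950001001", "290950001002"]
income = {"290950001001": 41000.0, "290950001002": 52000.0, "290950001003": 38000.0}

rows = join_with_median_income(count_violations_by_block_group(violation_geoids), income)
for row in rows:
    print(row)
# ('290950001001', 41000.0, 2)
# ('290950001002', 52000.0, 1)
# ('290950001003', 38000.0, 0)
```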
Created a pull request re this issue: https://github.com/codeforkansascity/Property-Violations-Settlement/pull/53.
January 2009 - March 2015.