Data4Democracy / election-transparency

A Data4Democracy community working to make elections and elections data more transparent

Pulling down district level voter registration #22

Open chrisdick14 opened 7 years ago

chrisdick14 commented 7 years ago

Would like a time-series for redistricting work.

Smahoney37 commented 7 years ago

Attached: a file listing the availability of granular data by state (not necessarily at the district level), plus Arizona data (not cleaned).

Arizona.zip

Availability of Data.txt

chrisdick14 commented 7 years ago

Awesome work! We will have to start assigning people to each of the states we can pull.

gvpeek commented 7 years ago

I'm interested in jumping in and helping. I was thinking I could start with Texas, since that's where I am, but I'm happy to take on some other states too. That being said, I had some questions...

  1. Is there a list of states people are assigned to?
  2. What is the ideal state of the data, cleaned to a certain spec or just raw for now?
  3. Is there a place for these files to be committed or are they just living in comments for now?

KirkHadley commented 7 years ago

Hi,

I'm not really sure where the best place to put these would be, but I have voter files for varying recent years (going back to about 2008) for CO, CT, DC, DE, FL, GA, MI, NC, OK, RI, UT, and WA. Would that be helpful?

kflanagan commented 7 years ago

You can find the current NC registered voter info here: https://data.world/kflanagan/nc-statewide-voter-info. Along with it is the SQL statement to create the columns.

chrisdick14 commented 7 years ago

@KirkHadley and @kflanagan we can definitely use this information. However, this is slightly different data than we have been using in the past so let me think about where we want to store it, and how it will fit into our current structure.

KirkHadley commented 7 years ago

@chrisdick14 I actually have that file for every NC election since 2005. Should I upload it to data.world? @kflanagan Has any thought been put into standardizing election results at the state level? If so, I have every state's state-level election results at the district level and am more than happy to share.

kflanagan commented 7 years ago

@KirkHadley and @chrisdick14 The source for the data I posted is the state; here's their link: https://s3.amazonaws.com/dl.ncsbe.gov/data/ncvoter_Statewide.zip I don't know if there are efforts to standardize, but given the state of things at the federal level I doubt it.

chrisdick14 commented 7 years ago

@KirkHadley and @kflanagan there are two things we can do for these data. (1) You can post them yourself on data.world and tag them with 'd4d' and 'election transparency' (as well as any other tags you want to use), or (2) we can have you send us the data and we can upload directly to the d4d election transparency data.world page. I am totally fine either way. I agree about the standardization. The Open Elections Project has been doing some of this work: https://github.com/openelections/openelections-results-nc

One thing we could do, if we can get results from several states, is agree on a format moving forward and put something out there, if that is something you are all interested in.

kflanagan commented 7 years ago

Given that I had already put the NC data up on data.world, I just went and tagged them with d4d and election transparency. That'll get us started. I don't know what's best. The states keep their own formats, so is it a good use of time to re-format every time they update the data? I think NC updates weekly. Could we use data.world to present the data via SQL-like queries, so that folks can query across states?
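A rough sketch of the cross-state query idea, using Python's built-in `sqlite3` as a stand-in rather than data.world itself. The table names, column names, and rows are all made up for illustration; real state files have their own layouts. The point is only that a view can map each state's columns onto one shared shape, so the raw tables are never re-formatted.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical state tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical layouts: each state keeps its own column names.
        CREATE TABLE nc_voters (county_desc TEXT, party_cd TEXT);
        CREATE TABLE tx_voters (county TEXT, party TEXT);

        INSERT INTO nc_voters VALUES
            ('WAKE', 'DEM'), ('WAKE', 'REP'), ('DURHAM', 'DEM');
        INSERT INTO tx_voters VALUES
            ('TRAVIS', 'DEM'), ('TRAVIS', 'DEM'), ('HARRIS', 'REP');

        -- The view renames columns per state; the raw tables stay untouched.
        CREATE VIEW voters_all AS
            SELECT 'NC' AS state, county_desc AS county, party_cd AS party
            FROM nc_voters
            UNION ALL
            SELECT 'TX' AS state, county, party
            FROM tx_voters;
        """
    )
    return conn


def registrations_by_state_and_party(conn):
    """Count rows per (state, party) across every state in the view."""
    cursor = conn.execute(
        "SELECT state, party, COUNT(*) FROM voters_all "
        "GROUP BY state, party ORDER BY state, party"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in registrations_by_state_and_party(build_demo_db()):
        print(row)
```

When a state changes its layout, only its branch of the view would need updating, not the stored data.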

chrisdick14 commented 7 years ago

@kflanagan I think that is a great idea. Especially with data that are coming out that regularly. I think if there were some 'clean' datasets we needed for projects we could pull the requisite data from your larger file and post it in the cleaned format that we end up using for analysis.

This is really fantastic. We are having a hackathon this weekend and who knows, someone may end up using these data in their analyses!

kflanagan commented 7 years ago

I found a flaw in my logic. Big data sets don't seem to work so well on data.world: the file is too large to extract from the archive. Maybe I'll try to upload the raw data, but of course the uncompressed file may be too big to upload raw. Perhaps we need to point at the county-by-county info for NC. I'll take a look at it this evening.
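One possible way around the extraction problem on our own machines, sketched in Python: read the archive member as a stream and tally rows per county without ever unpacking the whole file. The tab delimiter, the `county_desc` column, the member name `voters.txt`, and the UTF-8 encoding are assumptions for this demo, not confirmed details of the NC file; the tiny in-memory archive stands in for the real download.

```python
import csv
import io
import zipfile
from collections import Counter


def count_rows_by_county(zip_source, member_name,
                         county_column="county_desc", encoding="utf-8"):
    """Stream one member of a zip archive and count rows per county.

    The member is decompressed and parsed row by row, so the full
    uncompressed file is never written to disk or loaded into memory.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            for row in csv.DictReader(text, delimiter="\t"):
                counts[row[county_column]] += 1
    return counts


def make_demo_zip():
    """Build a tiny in-memory archive with made-up rows for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(
            "voters.txt",
            "county_desc\tparty_cd\nWAKE\tDEM\nWAKE\tREP\nDURHAM\tDEM\n",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(count_rows_by_county(make_demo_zip(), "voters.txt"))
```

The same loop could write each row to a per-county output file instead of counting, which would produce pieces small enough to upload separately.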

chrisdick14 commented 7 years ago

Let me know how big the data set would be. We can chat with the data.world folks and see if there is a workaround. If not, we may have some other options that I am exploring now to upload the data and make it public.

KirkHadley commented 7 years ago

So I have voter files on a good number of states (I'm a squirrel with these things). Details on sizes and such:

State-Total Size, Number of Voter Files, Range of Years

kflanagan commented 7 years ago

@KirkHadley is that the voter file that's found at https://s3.amazonaws.com/dl.ncsbe.gov/data/ncvoter_Statewide.zip but with multiple years?

chrisdick14 commented 7 years ago

Ok, those are going to be too big for data.world I think. We are going to have to come up with another solution to host these. Let me do some asking around and see what we can find.

alistaire47 commented 7 years ago

Hi, I'm Edward. I'm new and happy to help. To get rolling I scraped the relevant PDFs off of the DC BoE site in the link above to see how hard the PDFs are to parse. The answer is (predictably) not terribly easy, but possible.

Given that, what data do we want?

I also saw on their website that you can get the whole voter file on CD-ROM (yeah) for $2 (yeah). It's not clear how it handles formerly registered voters, but it's as granular as you can get. Since it's individuals, though, it's at least dubious to republish it unaggregated, even though it's all public data. I'm not sure we want it, but it's entirely possible to assemble a national voter file; e.g. you can grab the Ohio CD CSVs at will.
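If we did go the individual-file route, one option would be to aggregate before publishing anything. A minimal sketch in Python, assuming hypothetical `district` and `party` fields and made-up records (the real DC file's layout is not known here): only the grouping fields and a count survive, and every per-person field is dropped.

```python
from collections import Counter


def aggregate_registrations(records, keys=("district", "party")):
    """Collapse individual records into counts per combination of `keys`.

    Fields not listed in `keys` (names, addresses, and so on) never
    reach the output.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return [
        dict(zip(keys, combo), count=n)
        for combo, n in sorted(counts.items())
    ]


def demo_records():
    """Made-up individual-level rows, for illustration only."""
    return [
        {"name": "Voter A", "district": "Ward 1", "party": "DEM"},
        {"name": "Voter B", "district": "Ward 1", "party": "DEM"},
        {"name": "Voter C", "district": "Ward 1", "party": "REP"},
        {"name": "Voter D", "district": "Ward 2", "party": "DEM"},
    ]


if __name__ == "__main__":
    for row in aggregate_registrations(demo_records()):
        print(row)
```

Adding a year or snapshot date to `keys` would give district-level counts over time, which is closer to the time series asked for at the top of the thread.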