grantschwab closed this issue 1 year ago
Thanks very much for this. I've made all those changes. Let me know if you see any issues remaining.
We saw that there were some small discrepancies in the total votes. Because we cleaned the raw files alongside OpenElections from the county-by-county PDFs, we generally can't tell where a discrepancy comes from when it is introduced by OCR or by manual entry from the PDF (though of course we try to check it against the PDF and get as close as we can). It's extremely helpful to have other eyes on the original PDFs, and I appreciate the well-documented corrections.
Regarding the total vote row, it looks like I included that when I cleaned Yazoo for OpenElections. I like to retain all the data the state offers, but other cleaners differ in their practices, which I think is especially reasonable in this case: you can imagine a volunteer choosing how much time to spend manually inputting potentially redundant data from a messy PDF. For our part, we never drop rows like that when they're available, but we do discuss this in the warnings section of our README because, as I'm sure you've seen plenty of times, counties often differ, even within a state, in whether they report these total rows.
Thanks again and please let me know if any issues persist.
Of course! And thank you for making those changes! In case it's useful: I handmade a dataset of county totals for the Mississippi US House races using the state PDFs and checked that against county totals calculated from MEDSL's data.
I don't normally love reading through county election PDFs (who does?), but I was able to identify the few counties where discrepancies occurred and manually check only those.
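For anyone who wants to repeat that kind of check, here is a minimal sketch of the comparison in plain Python. It is not the code used for this issue: the field names (`county`, `precinct`, `candidate`, `votes`), the function names, and the toy numbers are all assumptions made up for illustration, and any county-wide "total" rows would need to be excluded first so they are not double counted.

```python
from collections import defaultdict


def county_totals(precinct_rows):
    """Sum precinct-level votes into {(county, candidate): total}."""
    totals = defaultdict(int)
    for row in precinct_rows:
        totals[(row["county"], row["candidate"])] += int(row["votes"])
    return dict(totals)


def find_discrepancies(precinct_rows, official_totals):
    """Return {(county, candidate): (computed, official)} where the two differ."""
    computed = county_totals(precinct_rows)
    mismatches = {}
    for key, official in official_totals.items():
        got = computed.get(key, 0)
        if got != official:
            mismatches[key] = (got, official)
    return mismatches


# Toy, invented numbers (not real election results).
precinct_rows = [
    {"county": "County A", "precinct": "P1", "candidate": "Cand X", "votes": "100"},
    {"county": "County A", "precinct": "P2", "candidate": "Cand X", "votes": "50"},
    {"county": "County B", "precinct": "P1", "candidate": "Cand X", "votes": "70"},
]
# Hand-entered county totals, standing in for figures read from the state PDFs.
official_totals = {("County A", "Cand X"): 150, ("County B", "Cand X"): 75}

print(find_discrepancies(precinct_rows, official_totals))
```

Only the counties that appear in the result need to be checked by hand against the PDFs, precinct by precinct.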
That makes sense, re: total vote row. Plus it's easy to drop from the dataset to avoid redundancy.
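A rough sketch of dropping such rows, assuming each row is a dict with a `precinct` field and that the summary row is labeled "total" (the field name, the label, and the sample values are assumptions, not taken from the actual dataset):

```python
def drop_total_rows(rows, precinct_field="precinct", labels=("total",)):
    """Return rows whose precinct label is not a county-wide summary label."""
    wanted = {label.lower() for label in labels}
    return [
        row for row in rows
        if row[precinct_field].strip().lower() not in wanted
    ]


# Toy, invented rows: two real precincts plus a redundant county-wide total.
rows = [
    {"county": "County A", "precinct": "P1", "votes": "100"},
    {"county": "County A", "precinct": "P2", "votes": "50"},
    {"county": "County A", "precinct": "Total", "votes": "150"},
]

precinct_only = drop_total_rows(rows)
print([row["precinct"] for row in precinct_only])
```

Matching is case-insensitive and ignores surrounding whitespace, since hand-entered labels tend to vary; other labels can be passed through `labels` if a county uses a different name for its summary row.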
Hello! I noticed 11 instances in which precinct vote totals for a congressional candidate don't match official state data. I also noticed that your data includes a "total" precinct for all of Yazoo County, but not for any other county.
As for the 11 discrepancies: The county, precinct, and candidate names are included in the code below, along with the correct vote tally. You can find the state's totals here, in PDFs from the MS Sec. of State.
Also, for reference, here's a dictionary of candidate names in my code.
GCON02RFLO = Brian Flowers (R), MS-02
GCON01DBLA = Dianne Black (D), MS-01
GCON04DDUP = Johnny DuPree (D), MS-04
GCON03RGUE = Michael Guest (R), MS-03
GCON03DYOU = Shuwaski Young (D), MS-03
I submitted this issue with OpenElections on their GitHub, too.
Best, Grant @ Redistricting Data Hub