MEDSL / 2022-elections-official

Official returns for the 2022 Midterm Elections

Incorrect results in 11 precincts for Mississippi 2022 general #3

Closed grantschwab closed 1 year ago

grantschwab commented 1 year ago

Hello! I noticed 11 instances in which precinct vote totals for a congressional candidate don't match official state data. I also noticed that your data includes a "total" precinct for all of Yazoo County, but for no other county.

As for the 11 discrepancies: The county, precinct, and candidate names are included in the code below, along with the correct vote tally. You can find the state's totals here, in PDFs from the MS Sec. of State.

Also, for reference, here's a dictionary of candidate names in my code.

GCON02RFLO = Brian Flowers (R), MS-02
GCON01DBLA = Dianne Black (D), MS-01
GCON04DDUP = Johnny DuPree (D), MS-04
GCON03RGUE = Michael Guest (R), MS-03
GCON03DYOU = Shuwaski Young (D), MS-03

I also submitted this issue to OpenElections on their GitHub.

Best, Grant @ Redistricting Data Hub

#Sharkey County
pivoted_2022.loc[pivoted_2022['precinct'] == 'SPANISH FORK FIRST DISTRICT', 'GCON02RFLO'] = 24  # State names this precinct FORT, not FORK

#Tallahatchie
pivoted_2022.loc[pivoted_2022['precinct'] == 'WEBB BEAT #5', 'GCON02RFLO'] = 23

#Lee
pivoted_2022.loc[pivoted_2022['precinct'] == 'UNITY', 'GCON01DBLA'] = 11

#Pontotoc
pivoted_2022.loc[pivoted_2022['precinct'] == 'PONTOTOC 5 P5', 'GCON01DBLA'] = 86

#Jackson
pivoted_2022.loc[pivoted_2022['precinct'] == 'LATIMER', 'GCON04DDUP'] = 207

#Pearl River
pivoted_2022.loc[pivoted_2022['precinct'] == 'CARRIERE 5', 'GCON04DDUP'] = 80

#Clarke
pivoted_2022.loc[pivoted_2022['precinct'] == 'SOUINLOVIE', 'GCON03RGUE'] = 112

#Oktibbeha
pivoted_2022.loc[pivoted_2022['precinct'] == 'CENTRAL STARKVILLE', 'GCON03RGUE'] = 118
pivoted_2022.loc[pivoted_2022['precinct'] == 'SOUTH ADATON', 'GCON03RGUE'] = 178

#Neshoba
pivoted_2022.loc[pivoted_2022['precinct'] == 'NORTHWEST PHILADELPHIA', 'GCON03DYOU'] = 399

#Rankin
pivoted_2022.loc[pivoted_2022['pct_std'] == 'CITY HALL:::RAN', 'GCON03DYOU'] = 116

sbaltzmit commented 1 year ago

Thanks very much for this. I've made all those changes. Let me know if you see any issues remaining.

We had seen some small discrepancies in the total votes. But because we cleaned the raw files alongside OpenElections from the county-by-county PDFs, we generally can't tell where a discrepancy comes from when it is introduced by OCR or manual entry from the PDF (though of course we try to check it against the PDF and get as close as we can). It's extremely helpful to have other eyes on the original PDFs, and I appreciate the well-documented corrections.

Regarding the total vote row, it looks like I included that when I cleaned Yazoo for OpenElections. I like to retain all the data the state offers, but other cleaners differ in their practices, which I think is especially reasonable in this case: you can imagine a volunteer choosing how much time to spend manually inputting potentially redundant data from a messy PDF. For our part, we never drop rows like that when they're available, but we do discuss this in the warnings section of our README because, as I'm sure you've seen plenty of times, counties even within a single state often differ in whether they report these total rows.

Thanks again and please let me know if any issues persist.

grantschwab commented 1 year ago

Of course! And thank you for making those changes! In case it's useful: I built a dataset of county totals for the Mississippi US House races by hand from the state PDFs and checked it against county totals calculated from MEDSL's data.

I don't normally love reading through county election PDFs (who does?), but I was able to identify the few counties where discrepancies occurred and manually check only those.
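
For anyone who wants to repeat that kind of check, here is a rough sketch of the idea. It assumes a long-format precinct table and a hand-entered table of official county totals, both with the hypothetical columns `county`, `candidate`, and `votes`; those names and the toy numbers are placeholders, not the actual MEDSL schema or real returns.

```python
import pandas as pd


def find_county_discrepancies(precincts: pd.DataFrame, official: pd.DataFrame) -> pd.DataFrame:
    """Return county/candidate pairs whose summed precinct votes differ from the official total."""
    # Sum precinct rows up to one total per county and candidate.
    computed = (
        precincts.groupby(["county", "candidate"], as_index=False)["votes"]
        .sum()
        .rename(columns={"votes": "votes_computed"})
    )
    official = official.rename(columns={"votes": "votes_official"})
    # An outer merge keeps pairs that appear in only one of the two sources.
    merged = computed.merge(official, on=["county", "candidate"], how="outer")
    merged["diff"] = merged["votes_computed"].fillna(0) - merged["votes_official"].fillna(0)
    return merged[merged["diff"] != 0].reset_index(drop=True)


# Toy data for illustration only; these are not real returns.
precincts = pd.DataFrame(
    {
        "county": ["A", "A", "B", "B"],
        "candidate": ["X", "X", "X", "X"],
        "votes": [10, 15, 7, 8],
    }
)
official = pd.DataFrame(
    {"county": ["A", "B"], "candidate": ["X", "X"], "votes": [25, 16]}
)

mismatches = find_county_discrepancies(precincts, official)
print(mismatches)
```

Only the counties that show up in `mismatches` need a manual pass through the PDF. A county-wide total row, like the one for Yazoo, would have to be excluded before summing, or that county's computed totals would be doubled.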

That makes sense, re: total vote row. Plus it's easy to drop from the dataset to avoid redundancy.
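
A minimal sketch of dropping such a row, assuming it can be identified by a precinct label of `TOTAL`. That label, the column names, and the toy numbers are assumptions for illustration; check how the row is actually named in the file before filtering.

```python
import pandas as pd

# Toy frame for illustration; the precinct label "TOTAL" is an assumption,
# not necessarily how the Yazoo total row is named in the MEDSL file.
df = pd.DataFrame(
    {
        "county": ["YAZOO", "YAZOO", "YAZOO"],
        "precinct": ["PRECINCT 1", "PRECINCT 2", "TOTAL"],
        "votes": [40, 60, 100],
    }
)

# Flag the county-wide total row, ignoring case and stray whitespace.
is_total = df["precinct"].str.strip().str.upper().eq("TOTAL")

# Keep only the true precinct rows.
precinct_only = df.loc[~is_total].reset_index(drop=True)
print(precinct_only)
```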