cjph8914 / 2020_benfords


Allegheny, PA absentee votes, second digit #27

Open ghost opened 3 years ago

ghost commented 3 years ago

I made (and corrected) a quick analysis of second digits for absentee votes only in Allegheny, PA.

Allegheny absentee votes - second digit

ghost commented 3 years ago

Another quick diagram from Allegheny, PA, this time it's % of total for Biden/Harris, all votes (not only absentee votes).

image

ghost commented 3 years ago

Another diagram showing the same information

image

Although this is not Benford's Law related, it would be interesting to see if the same pattern holds for similar counties.

charlesmartin14 commented 3 years ago

@testes-t Can you add a red line showing the expected second-digit distribution to the bar plots above?

This may help (if you are using Python): https://github.com/milcent/benford_py/blob/master/Demo.ipynb
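For anyone who prefers to compute that expected line by hand rather than through the package, here is a minimal sketch in plain Python. It uses the standard second-digit form of Benford's Law; the function names and the sample size of 1000 are hypothetical and are not taken from benford_py.

```python
import math

def benford_second_digit_probs():
    """Expected probability of each second digit 0-9 under Benford's Law."""
    return [
        sum(math.log10(1 + 1 / (10 * first + d)) for first in range(1, 10))
        for d in range(10)
    ]

def expected_counts(n_values):
    """Scale the probabilities to a sample of n_values numbers (the red line)."""
    return [p * n_values for p in benford_second_digit_probs()]

probs = benford_second_digit_probs()
line = expected_counts(1000)  # 1000 is an arbitrary, illustrative sample size
```

The probabilities fall gently from about 12% for digit 0 to about 8.5% for digit 9, so the expected line is much flatter than the familiar first-digit curve.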

charlesmartin14 commented 3 years ago

This is what I found for the second-digit data for Allegheny absentee votes, using the package above:

Screen Shot 2020-11-08 at 9 21 56 PM Screen Shot 2020-11-08 at 9 18 55 PM

Biden

Screen Shot 2020-11-08 at 9 15 56 PM

Trump (the 0 data point looks off?)

Screen Shot 2020-11-08 at 9 17 30 PM

Other than the Trump 0 data being weird (maybe this package requires a special format for the data, or has a bug?), the data looks reasonable.

ghost commented 3 years ago

At first I also had too many 0 digits, and was wondering what on Earth was going on. However, it turned out to be a result of not having excluded Trump's many one-digit results, so you need to fix that bit in the Trump diagram above.

(Edit:) Biden also had a few one-digit results, so his number of zeros is likely a little lower than in your diagram once you fix it.
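A minimal sketch of that exclusion step, assuming the vote counts are plain non-negative integers. The function name and the sample numbers are made up for illustration; this is not the code behind either set of diagrams.

```python
from collections import Counter

def second_digit_counts(vote_counts):
    """Return (counts for digits 0-9, number of values skipped).

    Values below 10 have no second digit, so they are skipped rather
    than counted as 0.
    """
    counts = Counter()
    skipped = 0
    for v in vote_counts:
        if v < 10:
            skipped += 1
            continue
        counts[int(str(v)[1])] += 1
    return [counts[d] for d in range(10)], skipped

# Made-up precinct totals, for illustration only
sample = [7, 0, 12, 105, 230, 9, 41, 300]
observed, skipped = second_digit_counts(sample)
```

Reporting the number of skipped values alongside the counts makes it easy to see how many precincts dropped out of the second-digit test for each candidate.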

ghost commented 3 years ago

Chicago, total (not only absentee): image

ghost commented 3 years ago

Fulton (Atlanta) - total, not only absentee:

image

There seem to be more people in each precinct in Fulton.

ghost commented 3 years ago

Comment: Allegheny (Pittsburgh), Fulton (Atlanta) and Chicago have a "tail" starting at about 90-95%. I am not saying that the tail shouldn't be there, but it's a little strange.

charlesmartin14 commented 3 years ago

@testes-t I'm having some trouble interpreting the plots. What is the y-axis? Can you walk us through it? Thanks.

ghost commented 3 years ago

@charlesmartin14

For a smooth normal distribution, the curve would have a roughly sigmoid shape. A flat stretch means low frequency; a steep stretch means high frequency. The y-axis shows the cumulative number of wards with a Biden vote share less than the percentage shown on the x-axis.

So for Fulton, 80% Biden is less common than 95% Biden. It's clearly not a normal distribution. Hypothetically, that could be because wards are either rural or urban, and only rarely something in between, so that the diagram ends up looking like the sum of two different normal distributions with expected values of 50% and 95%, respectively. I have not seen any literature suggesting that this pattern is indicative of fraud, but it's interesting nonetheless.
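The two-population idea can be tried out with simulated data. The sketch below is only an illustration: the group sizes, the standard deviations (10 and 3) and the fixed seed are hypothetical choices, and nothing here is fitted to the real Fulton numbers.

```python
import bisect
import random

def cumulative_counts(shares, thresholds):
    """For each threshold, count the wards whose share is strictly below it."""
    ordered = sorted(shares)
    return [bisect.bisect_left(ordered, t) for t in thresholds]

def clamp(x):
    """Keep a simulated percentage inside 0-100."""
    return max(0.0, min(100.0, x))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical parameters: one group of wards centred on 50%, one on 95%
group_a = [clamp(rng.gauss(50, 10)) for _ in range(500)]
group_b = [clamp(rng.gauss(95, 3)) for _ in range(500)]
mixture = group_a + group_b

thresholds = list(range(0, 101, 5))
curve = cumulative_counts(mixture, thresholds)
```

With parameters like these, the cumulative curve should climb steeply near 50%, flatten between the two groups, and climb steeply again near 95%, which is the non-sigmoid shape discussed above.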

ghost commented 3 years ago

For reference, here is the same diagram for Minnesota. It looks more like a sigmoid:

image

Above I seem to have explained the y-axis in a somewhat convoluted manner. The blue lines are the Biden % bars for each precinct (this time an ad-hoc precinct number is shown as a label on the y-axis).

ghost commented 3 years ago

Same plot for precincts in Hennepin County (i.e. Minneapolis): image

We see the same non-sigmoid shape as in the other cities.

ghost commented 3 years ago

I thought it would be interesting to check a city which is not in a battleground state. The pattern is not seen in Orleans Parish (part of New Orleans). But here, note that early votes are not included due to lack of data:

image

Note that I could be making errors all along; none of my charts have been "verified", so to speak.

ghost commented 3 years ago

Election day votes and absentee votes for Allegheny, when seen in isolation, both seem to follow more or less a sigmoid pattern:

image

image

ghost commented 3 years ago

Note that my Allegheny charts are based on the file in this project; I did not collect the data myself. I assume that the data is complete.

CoolOppo commented 3 years ago

> Another quick diagram from Allegheny, PA, this time it's % of total for Biden/Harris, all votes (not only absentee votes).

I am not understanding this chart at all. What is the Y-axis? And why are things binned the way they are on the x-axis? I am very confused

ghost commented 3 years ago

> I am not understanding this chart at all. What is the Y-axis? And why are things binned the way they are on the x-axis? I am very confused

It's a large collection of horizontal blue bars that represent Biden's vote share in each precinct. So the y-axis just counts up from precinct 1 to N as sorted by vote share.
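In code, the construction of such a chart might look roughly like this. It is a sketch under the assumption that each precinct is given as a pair of Biden votes and total votes; the function name and the numbers are made up.

```python
def sorted_shares(precincts):
    """Return Biden's vote share (in %) per precinct, sorted ascending.

    `precincts` is a list of (biden_votes, total_votes) pairs; precincts
    with no votes are skipped to avoid dividing by zero.
    """
    shares = [100.0 * b / t for b, t in precincts if t > 0]
    return sorted(shares)

# Made-up numbers, for illustration only
precincts = [(50, 100), (90, 100), (0, 0), (30, 120)]

# Each pair is (position on the y-axis, bar length on the x-axis)
bars = list(enumerate(sorted_shares(precincts), start=1))
```

Each bar's position on the y-axis is simply its rank after sorting, so the y-axis carries no information beyond the ordering.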

CoolOppo commented 3 years ago

I see. So if I'm getting this correct, ~225 precincts had a vote share of 48%-53% for Biden in Allegheny?

ghost commented 3 years ago

> I see. So if I'm getting this correct, ~225 precincts had a vote share of 48%-53% for Biden in Allegheny?

Edit: I now see that you were talking about the histogram in the second comment from the top. Yes, you got it right.

ghost commented 3 years ago

So from the above, I hypothesised that absentee votes in Fulton, GA could show some kind of interesting pattern, so I created the following chart: image

It's smooth; there doesn't seem to be anything suspicious here. I have spent quite a few hours investigating this now and didn't really find any smoking gun anywhere, so I'll end my investigation here.

iraykhel commented 3 years ago

Thank you for the detailed analysis. Could you create a chart like "Biden/Harris share of votes" from the second comment, but for the 2016 election, so we can see how it compares? The non-decreasing tail from 70% to 97% seems iffy; I was wondering whether it was also present in the 2016 election.

ghost commented 3 years ago

So, there are rumours that 130,000 invalid votes have been cast in Fulton County (Atlanta). Could be fake news, I never saw this news website before, so hard to tell: https://rfangle.com/election/breaking-132000-ballots-in-georgia-likely-ineligible/

The interesting thing is that by far the strangest chart of all I have uploaded is the one I cite below, from precisely Fulton. The chart should normally take on a sigmoid shape (like Minnesota above), but simply doesn't. So it's a funny coincidence. If feasible, you could try to analyse the deviation from normal/chi squared/poisson/whatever further.

Fulton (Atlanta) - total, not only absentee:

image

justin-winter commented 3 years ago

One thing you might check on Fulton is that it is a really weird shape (formed from 2 other bankrupt counties in the 1930s), so it really is 3+ distinct areas with very uneven economics. It has the richest homes and best schools in mid to north Fulton and some of the poorest neighborhoods and worst schools in mid to south Fulton. So in terms of the chart, that might actually be a bimodal or trimodal distribution.
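One rough way to put a number on "bimodal" is Sarle's bimodality coefficient, (skewness² + 1) / kurtosis. Nobody in this thread has used it; it is offered here only as a sketch. The version below is the simplified form without the small-sample correction, and it assumes the values are not all identical (otherwise the variance is zero and the division fails).

```python
def bimodality_coefficient(values):
    """Simplified Sarle's bimodality coefficient: (skew^2 + 1) / kurtosis.

    Uses population moments and non-excess kurtosis. A uniform
    distribution gives 5/9; larger values hint at bimodality (or at
    strong skew), smaller values at a single peak.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return (skew ** 2 + 1) / kurt

# Made-up data, for illustration only
two_clusters = [0, 1] * 50                     # two sharp groups
single_peak = [1, 2, 2, 3, 3, 3, 4, 4, 5]      # one central peak

bc_two = bimodality_coefficient(two_clusters)
bc_one = bimodality_coefficient(single_peak)
```

The 5/9 threshold is a rule of thumb rather than a formal test, so a high value on precinct vote shares would only suggest that a mixture of distinct areas is worth looking into.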


ghost commented 3 years ago

It's not anywhere close to Gaussian. Why are there so few precincts at the median, around 80%? And why do we see this pattern in Fulton, but not in Orleans?