According to my research second-digit tests are more reliable for detecting election fraud

draeder commented 3 years ago

See: http://www-personal.umich.edu/~wmebane/pm06.ps, https://www.degruyter.com/view/journals/jbnst/231/5-6/article-p719.xml and https://www.researchgate.net/publication/275305550_Comment_on_Benford's_Law_and_the_Detection_of_Election_Fraud

ghost commented 3 years ago

First-digit tests were also found in this thread to give misleading results: https://github.com/cjph8914/2020_benfords/issues/5

Accordingly, I suggest that we immediately implement a second-digit test.

(minor edit)

dshield55 commented 3 years ago

How do you do a 2nd digit test?

I just did one in Excel on a large data set not suspected of fraud. When I do the 1st digit test you can see the curve so well. When I do the 2nd digit test, it's not nearly as dramatic, even though there are more 0s than 1s and more 1s than 2s.

Isn't there a d number or something that is calculated to determine how good a fit the curve is? Anyone know how to do that?

ghost commented 3 years ago

According to Benford's law, the 2nd digit will likely have a different distribution than the 1st digit. Maybe draeder (or his papers, linked above) has the exact formula to use.
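For reference, the usual second-digit form of Benford's law (stated here from the general literature, not from anything posted in this thread) sums the first-two-digit probabilities over every possible first digit $k$:

```latex
P(D_2 = d) \;=\; \sum_{k=1}^{9} \log_{10}\!\left(1 + \frac{1}{10k + d}\right),
\qquad d = 0, 1, \dots, 9
```

Because each digit's probability is spread over nine first digits, the resulting curve is much flatter than the first-digit one, which would be consistent with the less dramatic chart described above.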

ghost commented 3 years ago

I recovered the second-digit factors from this paper: https://www.jstor.org/stable/23011436

0 - 0.120
1 - 0.114
2 - 0.109
3 - 0.104
4 - 0.100
5 - 0.097
6 - 0.093
7 - 0.090
8 - 0.088
9 - 0.085

Mean: 4.187
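As a sanity check, these factors and the mean can be recomputed from the standard Benford second-digit formula. This is a minimal sketch; the function name `benford_second_digit_probs` is made up for illustration and does not come from the paper.

```python
import math

def benford_second_digit_probs():
    """Expected frequency of each second digit 0-9 under Benford's law.

    For second digit d, sum log10(1 + 1/(10k + d)) over first digits k = 1..9.
    """
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]

probs = benford_second_digit_probs()
mean = sum(d * p for d, p in enumerate(probs))

for d, p in enumerate(probs):
    print(f"{d} - {p:.3f}")
print(f"Mean: {mean:.3f}")
```

Rounded to three decimals, the output should agree with the list and the mean given above.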

Academic literature suggests all kinds of pitfalls when it comes to interpreting analyses like this, but for now let's not overcomplicate the issue. Just do a second-digit test and see if it agrees with the above, and if you know how, also calculate p-values.
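One possible way to get a p-value is a Pearson chi-square goodness-of-fit test of the observed second digits against the factors above, with 9 degrees of freedom. The sketch below is an assumption about how one might do it, not an agreed method from this thread. The helper names are hypothetical, and the tail probability uses a closed-form series that is only valid for odd degrees of freedom, so it needs nothing beyond the standard library.

```python
import math
from collections import Counter

# Benford second-digit probabilities for d = 0..9.
EXPECTED = [
    sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
    for d in range(10)
]

def chi2_sf_odd_df(x, df):
    """Upper-tail probability of a chi-square variable, odd df only."""
    if df % 2 == 0:
        raise ValueError("this closed form only covers odd df")
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    term = chi          # chi^(2r-1) / (1*3*...*(2r-1)) for r = 1
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    z = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * z * total

def second_digit_chi2(totals):
    """Return (chi-square statistic, p-value) for a list of vote totals.

    Totals below 10 have no second digit and are left out (an assumption).
    """
    digits = [int(str(t)[1]) for t in totals if t >= 10]
    n = len(digits)
    if n == 0:
        raise ValueError("no totals with a second digit")
    observed = Counter(digits)
    stat = sum(
        (observed[d] - n * EXPECTED[d]) ** 2 / (n * EXPECTED[d])
        for d in range(10)
    )
    return stat, chi2_sf_odd_df(stat, df=9)

# Toy input only: synthetic, roughly log-uniform numbers, not real precinct data.
toy_totals = [int(10 ** (1 + 2 * i / 500)) for i in range(500)]
stat, p_value = second_digit_chi2(toy_totals)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value only says the digits are unlikely under the Benford second-digit distribution. Given the pitfalls mentioned above, it does not by itself say why.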

Note that the analysis is based on a pretty implausible hypothesis, namely that the totals for each precinct are taken out of thin air.

dshield55 commented 3 years ago

I just graphed all the 2nd digits here, with no other fancy analysis. The bottom one isn't expected to be fraudulent and you can see it still follows the rule of more 1s than 2s than 3s, etc. However, with Biden in Chicago, the 2nd digits are way off.

I understand the reasons we can't necessarily trust the 1st digits, but is there a reason we wouldn't be able to trust the distribution of 2nd digits?

ghost commented 3 years ago

> I understand the reasons we can't necessarily trust the 1st digits, but is there a reason we wouldn't be able to trust the distribution of 2nd digits?

Academic literature suggests that one needs to be careful about drawing conclusions, see e.g. https://doi.org/10.1515/jbnst-2011-5-610 - Anyhow, keep these diagrams coming!

(edit: removed diagrams from quote)

dshield55 commented 3 years ago

Here are the percentages for my last posted chart alongside the second-digit factors. The Trump one matches quite well, but not Biden.

dshield55 commented 3 years ago

*I'm out of time and likely won't be contributing more

In Chicago, Trump appears to hold up with 2nd digits but Biden does not. No fancy analysis here, just eyeballing it.

With a second-digit analysis, I don't know what to do with entries that don't have a second digit, so I counted 119 of them as null. Was I supposed to add them to the zeroes? The lowest Biden got in a precinct was 33, but Trump got some 7s and 9s, etc.
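One reasonable convention (an assumption here, not something settled in this thread) is to leave single-digit totals out of the tally entirely rather than count them as zeroes, since a total like 7 has no second digit and adding it to the zeroes would inflate that bar. A minimal sketch, with hypothetical helper names and made-up example totals:

```python
from collections import Counter

def second_digit(n):
    """Return the second digit of a non-negative integer, or None if it has only one digit."""
    s = str(abs(int(n)))
    return int(s[1]) if len(s) >= 2 else None

def tally_second_digits(totals):
    """Count second digits, reporting how many totals were skipped."""
    digits = [second_digit(t) for t in totals]
    skipped = sum(d is None for d in digits)
    counts = Counter(d for d in digits if d is not None)
    return counts, skipped

# Made-up totals for illustration only.
counts, skipped = tally_second_digits([33, 7, 9, 120, 405])
print(dict(counts), "skipped:", skipped)
```

Percentages should then be computed over the totals that were kept, not over all precincts.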

ghost commented 3 years ago

> In Chicago, Trump appears to hold up with 2nd digits but Biden does not. No fancy analysis here, just eyeballing it.

I have tried to outline how the second digits of results in cities could just as well follow a uniform distribution, here: https://github.com/cjph8914/2020_benfords/issues/16

I don't understand, however, why the results of each candidate don't follow the same pattern. Maybe it has to do with how precinct sizes are planned in inner cities vs. in suburban areas, but that's just a hypothesis.

chavenor commented 3 years ago

Yes, there are wards voting like 98% one way. So this would show up in the graph, would it not? https://results.enr.clarityelections.com/PA/Allegheny/63905/Web02.193333/#/cid/0104

PoiSonPeZ commented 3 years ago

Did you guys ever consider that you are combining two separate elections: mail-in ballots and in-person ballots? They are being collected and counted separately. The mail-in ballots have shown a considerable Democratic preference. It's almost as if two separate elections are occurring. I suspect this is the reason for the discrepancy you are seeing.

ghost commented 3 years ago

> In Chicago, Trump appears to hold up with 2nd digits but Biden does not. No fancy analysis here, just eyeballing it.

I did the same analysis, and my diagrams deviate slightly from yours (e.g. I have 210 zeros for Biden, but otherwise it's more or less the same). I'm not sure exactly why we see the pattern in Trump's second digits and not in Biden's.

cjph8914 / 2020_benfords

According to my research second-digit tests are more reliable for detecting election fraud #12