Open mlamias opened 3 years ago
Yes, it's been shown to be susceptible to false positives: deviations from Benford's law when there is no fraud. In this particular case, the total number of voters per precinct does not span many orders of magnitude, so Benford's law is unlikely to apply. Here's a histogram of the sum of Biden and Trump voters, which is a decent approximation to the number of voters per precinct. It would be interesting to see any scholarship that looks at the distribution of deviations from Benford's law. Would you expect Trump-leaning districts in PA to deviate from Benford's law about as often as the areas Biden won?
It seems this depends on precinct distribution. In Russia and Iran, for example, the reason some of the papers claim you can't use the first digit is that the precincts are too evenly sized. This makes sense, since you won't get numbers that span orders of magnitude.
However, in at least some of the US cities in question, this doesn't appear to be the case by precinct. In Milwaukee, for example, the number of registered voters is as low as 4 and goes as high as several thousand. Granted, most are on the order of 10^2 and 10^3.
Then again, if this is a deal breaker, why is it that it seems to provide expected distributions for Trump and Jo Jorgensen, but only flag anomalies with Biden?
Because Biden is the one being accused of election fraud.
This repo is attempting to apply Benford's Law to vote count distribution, so that's what actually needs to span multiple orders of magnitude. Precinct distribution is a factor in vote count distribution but it doesn't tell the whole story.
I don't see the Milwaukee data in this repo, but take a look at the Chicago data: https://github.com/cjph8914/2020_benfords/blob/main/data/chicago_dataexport.csv
Biden's vote totals are solidly contained within one order of magnitude, the 100-999 range. Trump's vote totals range from single digits into the hundreds, across three orders of magnitude. Jo Jorgensen is mostly in the 0-20 range, across two orders of magnitude.
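To make the "orders of magnitude" point concrete, here is a minimal Python sketch. It is not code from this repo, and the two lists of counts are made-up illustrative numbers, not the Chicago data. It measures how many powers of ten a set of vote counts spans and compares observed first-digit shares with the Benford expectation log10(1 + 1/d).

```python
import math
from collections import Counter


def benford_expected(digit: int) -> float:
    """Expected share of first digit `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def magnitude_span(counts: list[int]) -> float:
    """Orders of magnitude (base 10) between the smallest and largest positive count."""
    positive = [c for c in counts if c > 0]
    return math.log10(max(positive) / min(positive))


def first_digit_shares(counts: list[int]) -> dict[int, float]:
    """Observed share of each leading digit among the positive counts."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical per-precinct counts, for illustration only.
narrow = [312, 405, 287, 530, 618, 449, 376, 502]  # all within 100-999
wide = [7, 23, 48, 112, 9, 64, 305, 15]            # single digits to hundreds

for name, counts in (("narrow", narrow), ("wide", wide)):
    print(name, "span:", round(magnitude_span(counts), 2))
    shares = first_digit_shares(counts)
    for d in range(1, 10):
        print(f"  digit {d}: observed {shares[d]:.2f}, Benford {benford_expected(d):.2f}")
```

A span well below 1 means the counts sit inside a single order of magnitude, which is the situation described above for the 100-999 range.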
@SageGaspar good explanation
yes it's been shown to be susceptible to false positives
What are the odds of a false positive? If you look at 6 counties where fraud was suspected and get positives in all 6, should we assume that all 6 are likely to be false positives? And if only 2 positives are found, what are the odds that both are false positives?
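The arithmetic behind this question can be sketched as follows. The false-positive rates below are hypothetical placeholders, since nothing in this thread gives a measured rate, and the calculation assumes the counties are independent, which may not hold. If each test has false-positive rate p when there is no fraud, the chance that k tests all come back positive by accident is p^k.

```python
def prob_all_false_positives(p: float, k: int) -> float:
    """Probability that k independent tests are all false positives,
    given a per-test false-positive rate p (an assumed, hypothetical value)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return p ** k


# Hypothetical rates: a well-calibrated test versus a test applied to data
# it does not fit, where it flags almost everything.
for p in (0.05, 0.5, 0.9):
    for k in (2, 6):
        print(f"p={p}, k={k}: {prob_all_false_positives(p, k):.6g}")
```

The answer depends almost entirely on p. If the test is being applied to data where Benford's law is not expected to hold in the first place, p could be close to 1, and 6 positives out of 6 would then be unsurprising.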
I don't see the Milwaukee data in this repo
The Milwaukee data is being scraped from this site. The vote counts in that data go up to 2800 for Biden and 2000 for Trump.
I wonder if you could reanalyze the data of those very evenly sized districts with similar numbers of votes for Biden in base 5 or so to get more orders of magnitude at the cost of fewer different digits. Maybe the smaller differences will then span enough orders of magnitude after all that the Benford-like distribution will appear again.
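A rough sketch of what that reanalysis could look like, assuming the generalized form of Benford's law for base b, where the expected share of leading digit d is log_b(1 + 1/d). The function names are hypothetical and this is not code from the repo.

```python
import math


def leading_digit(n: int, base: int) -> int:
    """Leading digit of a positive integer n written in the given base."""
    if n <= 0 or base < 2:
        raise ValueError("n must be positive and base at least 2")
    while n >= base:
        n //= base
    return n


def benford_expected(digit: int, base: int) -> float:
    """Expected share of leading digit `digit` (1 .. base-1) in the given base."""
    return math.log(1 + 1 / digit, base)


def magnitude_span(lo: int, hi: int, base: int) -> float:
    """Number of base-`base` orders of magnitude between lo and hi."""
    return math.log(hi / lo, base)


# The 100-999 range covers one order of magnitude in base 10
# and somewhat more than one in base 5.
print(magnitude_span(100, 999, 10), magnitude_span(100, 999, 5))
print([round(benford_expected(d, 5), 3) for d in range(1, 5)])
```

One caveat: changing the base only changes the unit of measurement. The ratio between the largest and smallest counts stays the same, so this does not by itself make the data spread wider.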
The whole point of the first-digit indicator is to find unusual situations. We know that Benford's Law cannot be applied to datasets with upper and lower bounds. The question is, why do Biden's vote counts in Chicago cluster in the 100-999 range? Is this a natural voting pattern, or is it due to something else? Researchers in election forensics see this behavior all over, and they generally explain it as election strategy: in other words, get-out-the-vote efforts and so on. But they don't really know, because they are not on the ground in Chicago. This could be something else, like buying votes, voter intimidation, ballot stuffing, etc. We don't know.
It is exactly this anomalous behavior that is in question.
This is why researchers look for other patterns in the voting data, such as second digit and last-digit patterns, as well as the distributional patterns of the vote counts themselves.
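For reference, the baselines for those two digit tests can be sketched as below. This is an illustrative sketch rather than the method used in the talk or in any specific paper: the second-digit expectation is the standard Benford formula summed over all possible first digits, and the last-digit baseline is simply uniform (1/10 per digit).

```python
import math
from collections import Counter


def second_digit_expected(d2: int) -> float:
    """Benford expectation for second digit d2 (0-9), summed over first digits 1-9."""
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))


def second_digit_shares(counts: list[int]) -> dict[int, float]:
    """Observed second-digit shares, using only counts with at least two digits."""
    digits = [int(str(c)[1]) for c in counts if c >= 10]
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(10)}


def last_digit_shares(counts: list[int]) -> dict[int, float]:
    """Observed last-digit shares, to be compared against a uniform 0.1 each."""
    tally = Counter(c % 10 for c in counts)
    return {d: tally.get(d, 0) / len(counts) for d in range(10)}


# Expected second-digit shares fall gently from digit 0 to digit 9.
print([round(second_digit_expected(d), 4) for d in range(10)])
```

Both tests depend much less on the counts spanning several orders of magnitude, which is part of why they come up as alternatives to the first-digit test.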
Here is a nice talk on the subject from one of the leaders of the field https://www.youtube.com/watch?v=zkx_eO0PvXU
Notice that they never look at the distribution of digits itself, but, rather, are looking for statistical indicators that characterize the distributions, such as a good mean value and reliable upper and lower bounds.
In the situations we are seeing in many cities, the voting patterns seem, on the surface, so odd as to qualify as extreme fraud in the Klimek model (go to t=3522s), but to apply the model one may need to examine the data in more detail.
Nice analysis. However, I wanted to point you to a few articles that may be of interest to you. Essentially the research suggests Benford's is unreliable when applied to election data:
https://repository.library.georgetown.edu/handle/10822/557850
https://www.jstor.org/stable/23011436?seq=1
https://courses.math.tufts.edu/math19/duchin/dmo.pdf
https://www.cambridge.org/core/journals/political-analysis/article/benfords-law-and-the-detection-of-election-fraud/3B1D64E822371C461AF3C61CE91AAF6D