cjph8914 / 2020_benfords


Paper suggests that even second-digit analysis cannot be used #16

Open ghost opened 3 years ago

ghost commented 3 years ago

Please refer to chapter 2 in the following paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.697.5592&rep=rep1&type=pdf

The paper suggests precinct results in previous elections in a number of countries do not seem to follow the second-digit Benford distribution.

Let me try to outline why this does not hold for second digits either. If you have precincts in cities designed so that the votes for a certain candidate follow a chi-squared distribution with an expected value of 5000 and a certain deviation, then the most likely result is 5000 (2nd digit: 0). The second most likely results are 4999 and 5001 (2nd digits: 9 and 0). The third most likely results are 4998 and 5002 (2nd digits: 9 and 0). Etc. (edit: I got this wrong the first time)
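To make the clustering argument concrete, here is a minimal sketch that computes the second-digit probabilities for counts concentrated around 5000. It swaps in a normal distribution with a standard deviation of 100 as a stand-in for the distribution described above; that spread, and the function names, are assumptions for illustration, not values from the paper or from any election data.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def second_digit_probs(mean=5000.0, sd=100.0):
    """Probability of each second digit for integer counts near `mean`.

    Assumption: counts are a normal variable rounded to the nearest
    integer. Both `mean` and `sd` are hypothetical parameters.
    """
    probs = [0.0] * 10
    lo = max(10, int(mean - 6 * sd))   # need at least two digits
    hi = int(mean + 6 * sd)
    for n in range(lo, hi + 1):
        # Probability mass that rounds to the integer n.
        p = normal_cdf(n + 0.5, mean, sd) - normal_cdf(n - 0.5, mean, sd)
        probs[int(str(n)[1])] += p
    return probs

if __name__ == "__main__":
    for digit, p in enumerate(second_digit_probs()):
        print(digit, round(p, 4))
```

Under these assumed parameters, second digits 9 and 0 together carry well over half of the probability, while Benford's second-digit law would put only about 12% on 0 and about 8.5% on 9.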

On the other hand, for a Benford distribution, the most likely result is 1. The second most likely result is 2. The third most likely result is 3. Etc.
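For reference, the digit probabilities that Benford's law predicts can be computed directly from the standard formulas: log10(1 + 1/d) for a first digit d, and the sum over first digits k = 1..9 of log10(1 + 1/(10k + d)) for a second digit d. The function names below are just illustrative.

```python
import math

def benford_first_digit():
    """Benford probability of each first digit 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_second_digit():
    """Benford probability of each second digit 0..9.

    Sums over every possible first digit k, since the second digit d
    can follow any of them.
    """
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }

if __name__ == "__main__":
    for d, p in benford_first_digit().items():
        print("first digit", d, round(p, 4))
    for d, p in benford_second_digit().items():
        print("second digit", d, round(p, 4))
```

The second-digit distribution is much flatter than the first-digit one, so any comparison against it needs to detect fairly small deviations.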

Hence, using second digits does not fix the problem with planned precinct sizes. We can perhaps see from the example how Benford's Law will only work if the expected value of the distribution is 0. With rational planning of precinct sizes inside cities, that won't happen. Countryside precincts are more likely to follow the Benford pattern, as the number of votes in each precinct will be more "organically" determined and less planned.

It thus seems that the methodology cannot be applied inside cities.

dshield55 commented 3 years ago

> with an expected value of 5000 and a certain deviation, then the most likely result is 5000 (2nd digit: 0).

I'm not sure how well this applies to the data I've been looking at. At first glance, I don't think it does.

The precinct sizes seem non-conforming to me from what I've been looking at. Like here in Milwaukee, you can see they try to target precincts of around 1,000 voters, but the actual number of registered voters per precinct looks pretty random. I'm thinking the number of registered voters per precinct probably fits its own Benford curve, but on top of that each precinct has its own turnout rate, which in Milwaukee ranges from 50% to 97% (lol @ 97% turnout rate, 680/702 voters).

[Image: milwaukeehistregpre]
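One rough way to probe this idea is to simulate it. The sketch below draws a hypothetical registered-voter count per precinct, multiplies it by a turnout rate, and tallies the second digits of the resulting ballot counts. Only the target of about 1,000 voters and the 50% to 97% turnout range come from the comment above; the log-normal shape, its spread, the uniform turnout, and the precinct count are all assumptions, and none of this uses the actual Milwaukee data.

```python
import math
import random

def benford_second_digit():
    """Benford probability of each second digit 0..9."""
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]

def simulate_second_digits(n_precincts=5000, target=1000, spread=0.5, seed=0):
    """Second-digit frequencies of simulated ballot counts.

    Assumptions (hypothetical): registered voters are log-normal around
    `target`, and turnout is uniform between 50% and 97%.
    """
    rng = random.Random(seed)
    counts = [0] * 10
    used = 0
    for _ in range(n_precincts):
        registered = round(rng.lognormvariate(math.log(target), spread))
        turnout = rng.uniform(0.50, 0.97)
        ballots = round(registered * turnout)
        if ballots >= 10:              # need at least two digits
            counts[int(str(ballots)[1])] += 1
            used += 1
    return [c / used for c in counts]

if __name__ == "__main__":
    expected = benford_second_digit()
    observed = simulate_second_digits()
    for d in range(10):
        print(d, round(observed[d], 4), round(expected[d], 4))
```

Shrinking `spread` toward 0 makes the simulated precincts cluster near the target size, which is the planned-precinct scenario from the first comment; widening it makes sizes range over more orders of magnitude.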