I added Benford statistical tests & updated README.md for better explanations.

cjph8914 / 2020_benfords

369 stars, 83 forks

I added Benford statistical tests & updated README.md for better explanations. #24

Open ghost opened 3 years ago

ghost commented 3 years ago

I redid this pull request because I screwed up my other repository.

ghost commented 3 years ago

I can do second-digit statistical tests if people want.
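For reference, a second-digit test compares observed second digits against Benford's second-digit distribution, P(D2 = d) = sum over k = 1..9 of log10(1 + 1/(10k + d)), for d = 0..9. A minimal sketch of those expected proportions (the function name is hypothetical, not taken from the repo or from the benfordslaw package):

```python
import math


def second_digit_probs():
    """Benford's expected probability for each second digit 0-9."""
    # Sum over every possible first digit k of the chance that the
    # two-digit prefix is exactly 10k + d.
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


for digit, p in second_digit_probs().items():
    print(digit, round(p, 4))
```

The second-digit distribution is much flatter than the first-digit one (roughly 12% for 0 down to about 8.5% for 9), so it takes more data to detect a deviation.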

Marcotte67 commented 3 years ago

I have a question: if the number of samples at the district level is less than 100, is it meaningful to look at that level of granularity?

Also, is anyone looking closely at the raw data? Looking at Harvard's 1976-2016 data, I found two rows for Trump in Maryland (one listed him as a write-in with 259 votes), and New York listed him under the Conservative Party with 292,392 votes.
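One way to surface rows like these is to group the raw data by year, state and candidate and list every group with more than one row. A minimal sketch, assuming each row is a dict with hypothetical keys `year`, `state` and `candidate`; the actual column names in the Harvard dataset may differ:

```python
from collections import defaultdict


def duplicate_candidate_rows(rows):
    """Return {(year, state, candidate): [rows]} for keys that occur more than once."""
    groups = defaultdict(list)
    for row in rows:
        # Hypothetical column names; adjust to match the dataset's schema.
        key = (row["year"], row["state"], row["candidate"])
        groups[key].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

A flagged group is not necessarily an error: write-in entries and candidates listed on more than one party line can legitimately produce several rows, so these groups need manual review (and possibly summing) before the counts go into a Benford test.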

Has anyone else seen this kind of thing?

ghost commented 3 years ago

The module I'm using to do the statistical tests says you should have a sample size of at least 50.

Dataset should preferably cover at least 1000 samples. Though Benford's law has been shown to hold true for datasets containing as few as 50 numbers.

https://pypi.org/project/benfordslaw/
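The sample-size guard can be built into the test itself. The sketch below is a plain chi-square first-digit test written from scratch; it does not use or reproduce the benfordslaw package's API, and the 50-value cutoff simply mirrors the guideline quoted above. The critical value 15.507 is the chi-square 95th percentile for 8 degrees of freedom (digits 1-9).

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at alpha = 0.05.
CHI2_CRIT_DF8_05 = 15.507


def benford_first_digit_test(counts, min_samples=50):
    """Chi-square test of leading digits against Benford's law.

    Assumes counts are integer vote counts. Returns (chi2, violates),
    or None when there are fewer than min_samples usable values.
    """
    values = [c for c in counts if c > 0]  # 0 has no leading digit
    n = len(values)
    if n < min_samples:
        return None
    observed = Counter(int(str(v)[0]) for v in values)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2, chi2 > CHI2_CRIT_DF8_05
```

Even at 50 values the expected count for digit 9 is only about 2.3, below the usual rule of thumb of 5 per cell for a chi-square test, which is one reason the package recommends closer to 1000 samples.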

ghost commented 3 years ago

> I can do second-digit statistical tests if people want.

Great, just double-check that one-digit figures are handled properly, since they have no second digit. If you get a lot of zeros in the second-digit results, that can be a sign that something is wrong.
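A minimal sketch of the check being suggested: extract second digits only from values with at least two digits, so one-digit counts are skipped rather than silently treated as a second digit of 0. The function names are hypothetical.

```python
def second_digit(n):
    """Second digit of an integer count, or None for one-digit values."""
    s = str(abs(n))
    if len(s) < 2:
        return None  # one-digit figures have no second digit; don't count them as 0
    return int(s[1])


def second_digits(counts):
    """Second digits of all counts with at least two digits."""
    return [d for d in (second_digit(c) for c in counts) if d is not None]


print(second_digits([3, 12, 250, 9]))
```

Here 3 and 9 are dropped and the result is `[2, 5]`; a padding bug that turned 3 into "30" would instead inflate the zero bucket, which is the symptom described above.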

ghost commented 3 years ago

@testes-t I will do that. I'm going to do a historical Benford's law analysis. I'm going to create a chart of the percentage of counties that violate Benford's law in each election cycle, going back as far as possible. It will only include counties whose precinct populations follow Benford's law and that have more than 50 precincts.
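The per-cycle percentage could be computed along these lines. This is a sketch under stated assumptions: the input shape (a hypothetical mapping of county to a list of precinct vote counts for one cycle) is invented for illustration, the population prefilter is assumed to have been applied already, and the test is the same plain chi-square first-digit test (8 df, alpha = 0.05) rather than anything from the benfordslaw package.

```python
import math
from collections import Counter

CHI2_CRIT_DF8_05 = 15.507  # chi-square critical value, 8 df, alpha = 0.05


def violates_benford(counts):
    """True if the leading digits of the positive counts fail a chi-square test."""
    values = [c for c in counts if c > 0]
    n = len(values)
    observed = Counter(int(str(v)[0]) for v in values)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2 > CHI2_CRIT_DF8_05


def percent_violating(precincts_by_county, min_precincts=50):
    """Percent of eligible counties that violate Benford's law in one cycle.

    precincts_by_county maps county -> list of precinct vote counts.
    Only counties with more than min_precincts nonzero precincts are kept.
    """
    eligible = [
        counts for counts in precincts_by_county.values()
        if sum(1 for c in counts if c > 0) > min_precincts
    ]
    if not eligible:
        return None  # nothing to test in this cycle
    bad = sum(violates_benford(counts) for counts in eligible)
    return 100.0 * bad / len(eligible)
```

Calling it once per cycle, e.g. `{year: percent_violating(data) for year, data in cycles.items()}` with a hypothetical `cycles` mapping, would give one point per election for the chart. With a 5% significance level, roughly 5% of perfectly Benford-conforming counties would still be flagged, so that is the baseline to compare against.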

homage-admin commented 3 years ago

I suggest you apply Benford's law to the 2016-to-2020 differences rather than to 2016 and 2020 separately. This is done here.
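A minimal sketch of extracting leading digits from the changes instead of the raw counts, assuming hypothetical per-unit mappings (precinct or county ID to vote count) for each year. Taking the absolute value and skipping units with no change are choices made for this illustration, not something specified in the suggestion.

```python
def first_digits_of_changes(votes_2016, votes_2020):
    """Leading digit of |2020 - 2016| for each unit present in both years.

    votes_2016 / votes_2020: hypothetical {unit_id: vote count} mappings.
    Units with zero change are skipped because 0 has no leading digit.
    """
    digits = []
    for unit in sorted(votes_2016.keys() & votes_2020.keys()):
        delta = abs(votes_2020[unit] - votes_2016[unit])
        if delta > 0:
            digits.append(int(str(delta)[0]))
    return digits
```

The resulting digits can then be tallied against the usual first-digit proportions log10(1 + 1/d). Units that appear in only one year (for example, after precinct boundaries are redrawn) are dropped here, which may need a separate matching step in practice.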