dogweather opened 3 years ago
Thanks, great link! I should probably use that data source and expand this. For what it's worth, the only two really interesting items in my list are the City of Detroit and Milwaukee: they're the only two with enough data points to make the analysis meaningful. Even so, they don't prove anything, though they definitely raise questions.
The redditor you linked to concludes that all the data looks legit under a Benford's Law analysis. However, this is a fallacy on his part, because he is mixing everything together instead of restricting the analysis to the corrupt big cities. If you mix lots of good data with bad, of course you will attenuate the signal of fraud.
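To make the dilution point concrete, here is a minimal Python sketch using hypothetical digit counts (not real election data): 200 leading digits from a "suspect" city that are uniform rather than Benford-distributed, pooled with 9,800 leading digits that follow Benford's Law exactly. The names `chi_square`, `pool`, `suspect`, and `clean` are illustrative only, and the test shown is a plain Pearson chi-square, which may not be what the linked analysis used.

```python
import math

# Benford's Law for the FIRST digit: P(d) = log10(1 + 1/d), d = 1..9
BENFORD_FIRST = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def chi_square(counts: dict, expected_probs: dict) -> float:
    """Pearson chi-square of observed digit counts against expected proportions."""
    n = sum(counts.values())
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in expected_probs.items()
    )


def pool(a: dict, b: dict) -> dict:
    """Combine two sets of digit counts, as a statewide analysis would."""
    return {d: a.get(d, 0) + b.get(d, 0) for d in set(a) | set(b)}


# Hypothetical data, for illustration only:
# 200 leading digits from a "suspect" city, uniform instead of Benford ...
suspect = {d: 200 / 9 for d in range(1, 10)}
# ... and 9,800 leading digits from everywhere else, matching Benford exactly.
clean = {d: 9800 * p for d, p in BENFORD_FIRST.items()}

alone = chi_square(suspect, BENFORD_FIRST)
pooled = chi_square(pool(suspect, clean), BENFORD_FIRST)

CRITICAL_8DF_5PCT = 15.51  # chi-square critical value, 8 degrees of freedom, alpha = 0.05
print(f"suspect city alone: chi2 = {alone:.1f} (flagged: {alone > CRITICAL_8DF_5PCT})")
print(f"pooled with clean:  chi2 = {pooled:.1f} (flagged: {pooled > CRITICAL_8DF_5PCT})")
```

Because the clean data matches its expected counts exactly, pooling leaves the raw deviations unchanged but multiplies the expected counts by 50, so the pooled statistic is just the suspect-only statistic scaled by the suspect city's 2% share of the data. The suspect city on its own lands far above the 5% critical value; pooled, it falls well below it.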
> attenuate the signal of fraud
Very interesting; yes, that makes sense. His analysis would only reveal manipulation of those high-level numbers, and pockets of fraud would certainly go undetected.
I'm looking into this one, which found a positive result. It's apparently based on Milwaukee data.
So I adapted that code to examine the 2nd digit, which should also follow Benford's Law. However, it makes all the data look "manipulated" (!), so I suspect I didn't apply the law correctly. Would you happen to know what a 2nd-digit analysis would look like?
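One likely cause (a guess; I haven't checked the notebook) is comparing second digits against the first-digit probabilities, log10(1 + 1/d). The second digit has its own Benford distribution: it can be 0, and it is much flatter, running from about 12.0% for 0 down to about 8.5% for 9. Here is a minimal Python sketch of a second-digit test, assuming the input is a list of non-negative integer vote counts; `second_digit` and `second_digit_chi2` are hypothetical helper names, not taken from the Kaggle notebook.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# Benford's Law for the SECOND digit d = 0..9:
#   P(D2 = d) = sum over first digits k = 1..9 of log10(1 + 1 / (10k + d))
BENFORD_SECOND = {
    d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10)) for d in range(10)
}


def second_digit(count: int) -> Optional[int]:
    """Second digit of a non-negative integer count; None if the count is below 10."""
    s = str(abs(int(count)))
    return int(s[1]) if len(s) >= 2 else None


def second_digit_chi2(counts: List[int]) -> Tuple[float, int]:
    """Pearson chi-square of observed second digits vs. Benford's second-digit law.

    Returns (statistic, number of counts used). Compare the statistic with the
    chi-square distribution on 9 degrees of freedom (5% critical value is about 16.92).
    """
    digits = [d for d in map(second_digit, counts) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no counts with at least two digits")
    observed = Counter(digits)
    chi2 = sum((observed[d] - n * p) ** 2 / (n * p) for d, p in BENFORD_SECOND.items())
    return chi2, n


# Print the expected second-digit proportions for reference.
for d, p in BENFORD_SECOND.items():
    print(f"second digit {d}: expected {p:.2%}")
```

Counts below 10 have no second digit and are dropped, which matters for small precincts. With ten possible digits the test has 9 degrees of freedom, rather than the 8 used for the first-digit test.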
Live Notebook: https://www.kaggle.com/dogweather/allegheny-cty-benford-s
Awesome project. Here's similar work, along with the data source. Might be really useful: