datadesk / census-data-aggregator

Combine U.S. census data responsibly
MIT License

provide check that spatial aggregation doesn't induce spurious patterns #22

Open sastoudt opened 5 years ago

sastoudt commented 5 years ago

From this paper:

"one can induce geographic patterns in the aggregate data that do not exist in the input data"

Create a diagnostic to check for this (equations 2 and 3 in the paper):

"The statistic S_j measures whether the region-level estimates for a given variable are within the margins of error of their constituent tracts. If a region-level estimate is within the margin of error of all its constituent tracts, then there is no information lost through aggregation; information loss increases as the 90 percent confidence intervals of more and more tract-level estimates do not overlap with the region’s estimate."
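A rough sketch of what such a check could look like, based only on the quoted description and not on the paper's exact equations 2 and 3. The function names `count_tracts_outside_moe` and `share_tracts_outside_moe` are hypothetical and are not part of this library. It assumes each tract is given as an `(estimate, margin_of_error)` pair at the 90 percent level, and that the variable is on a scale where tract and region values are directly comparable (a median or a rate, not a summed count).

```python
from typing import Sequence, Tuple


def count_tracts_outside_moe(
    region_estimate: float,
    tracts: Sequence[Tuple[float, float]],
) -> int:
    """Count tracts whose interval (estimate +/- MOE) excludes the region estimate.

    Hypothetical helper: an illustration of the quoted idea, not the
    paper's exact statistic.
    """
    outside = 0
    for estimate, moe in tracts:
        lower, upper = estimate - moe, estimate + moe
        if not (lower <= region_estimate <= upper):
            outside += 1
    return outside


def share_tracts_outside_moe(
    region_estimate: float,
    tracts: Sequence[Tuple[float, float]],
) -> float:
    """Return the fraction of tracts whose interval excludes the region estimate."""
    if not tracts:
        raise ValueError("at least one constituent tract is required")
    return count_tracts_outside_moe(region_estimate, tracts) / len(tracts)


# Illustrative values only: a region-level median of 50,000 and three tracts.
example_tracts = [(48000, 5000), (60000, 4000), (51000, 2000)]
example_count = count_tracts_outside_moe(50000, example_tracts)
example_share = share_tracts_outside_moe(50000, example_tracts)
```

With these made-up numbers, only the second tract's interval (56,000 to 64,000) excludes the region estimate, so the count is 1 of 3. A value of 0 would correspond to the "no information lost" case in the quote; larger values suggest the aggregate is drifting away from what its constituent tracts support.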

sastoudt commented 5 years ago

helper function here