PSLmodels / tax-microdata-benchmarking

A project to develop a benchmarked general-purpose dataset for tax reform impact analysis.
https://pslmodels.github.io/tax-microdata-benchmarking/

Add chisquare_test.py to assess area weights generated with different targets #267

Closed by martinholmer 1 month ago

martinholmer commented 1 month ago

With the default number of income tax bins (200), the chi-square test results show that, for each of the ten areas, adding 27 targets does not generate a distribution of area weights across income tax bins that is significantly different from the distribution generated without those extra targets. The 27 extra targets are nine all-filing-status counts by AGI category, nine wage income dollar targets by AGI category, and nine business income dollar targets by AGI category.
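To make the comparison concrete, here is a minimal sketch of this kind of test. It is not the code in chisquare_test.py. The function names, the equal-count binning of records by income tax amount, the rescaling of both weight sets to the record count, and the Wilson-Hilferty approximation to the chi-square tail probability (used only to keep the sketch dependency-free) are all assumptions made for illustration.

```python
import math


def binned_weight_sums(itax, weights, nbins):
    """Sum weights within nbins equal-count bins of records sorted by itax."""
    n = len(itax)
    order = sorted(range(n), key=lambda i: itax[i])
    sums = [0.0] * nbins
    for rank, i in enumerate(order):
        sums[rank * nbins // n] += weights[i]
    return sums


def chi2_sf_approx(stat, df):
    """Approximate upper-tail chi-square probability (Wilson-Hilferty)."""
    if stat <= 0.0:
        return 1.0
    spread = math.sqrt(2.0 / (9.0 * df))
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) / spread
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def compare_weights(itax, w_base, w_extra, nbins=200):
    """Return (statistic, approximate p-value) for two sets of area weights.

    w_base:  weights generated without the extra targets (expected)
    w_extra: weights generated with the extra targets (observed)
    """
    n = len(itax)
    if n < nbins:
        raise ValueError("need at least one record per bin")
    base = binned_weight_sums(itax, w_base, nbins)
    extra = binned_weight_sums(itax, w_extra, nbins)
    # Assumption: rescale both distributions to sum to the record count,
    # so the statistic does not depend on the overall level of the weights.
    expected = [b * n / sum(base) for b in base]
    observed = [e * n / sum(extra) for e in extra]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf_approx(stat, nbins - 1)
```

A large p-value from `compare_weights` would mean the extra targets leave the binned weights distribution statistically indistinguishable from the baseline, which is the pattern reported above for all ten areas.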

Removing the redundant 27 targets makes create_area_weights.py run much faster: the execution time speedup is typically in the range of 4X to 6X.

So, assessing whether or not added targets generate different weights distributions is not only good statistical practice but can also lead to much faster generation of area weights, which is important given that roughly five hundred areas are being processed.