Imputing values for the PT_binc_w2_wages variable

martinholmer commented 6 months ago

As discussed in our recent conference call, we need to add values for the Tax-Calculator PT_binc_w2_wages input variable to each Phase 2 data file that we create. I have tried a simple imputation approach using the 2011 PUF data file produced in the taxdata repository. I will describe what I've done and then suggest a way to do the imputation as part of the mainstream data file creation process.

Following the suggestion made by @nikhilwoodruff, I assumed the value of the imputed variable is a simple linear function of qualified business income (qbi):

w2wages = scale * max(0, qbi)

where w2wages is PT_binc_w2_wages and scale is a calibrated real number and

qbi = max(0, e00900 + e26270 + e02100 + e27200)

Using this imputation formula, I found setting scale to 0.27 produced an augmented puf.csv file that generated a 2023 QBID tax expenditure of $55.7 billion, which is close to the JCT estimate of $56.2 billion.
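For concreteness, here is a minimal sketch of the formula above applied to a single record. The function name `impute_w2_wages` and the dict-based record layout are illustrative assumptions, not code from taxdata or this repository; only the four input variable names and the 0.27 scale come from the description above.

```python
# Hypothetical helper: the names below are illustrative, not from the repository.
QBI_COMPONENTS = ("e00900", "e26270", "e02100", "e27200")


def impute_w2_wages(record, scale):
    """Return an imputed PT_binc_w2_wages value for one tax unit.

    record: mapping from input variable name to dollar amount
            (missing components are treated as zero, an assumption).
    scale:  calibrated constant, e.g. 0.27 in the experiment above.
    """
    # qbi = max(0, e00900 + e26270 + e02100 + e27200)
    qbi = max(0.0, sum(record.get(name, 0.0) for name in QBI_COMPONENTS))
    # w2wages = scale * max(0, qbi); the outer max is redundant because
    # qbi is already floored at zero.
    return scale * qbi


# Made-up records for illustration only.
gain = {"e00900": 50_000.0, "e26270": -10_000.0, "e02100": 0.0, "e27200": 0.0}
loss = {"e00900": -5_000.0, "e26270": 1_000.0}
print(impute_w2_wages(gain, 0.27))
print(impute_w2_wages(loss, 0.27))
```

A unit with a net business loss gets zero imputed wages, so under the wage limitation it cannot gain any QBID from this imputation.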

It seems to me this simple imputation procedure could be easily added to the Phase 2 data generation process being developed in this repository. A constant (across years) value of scale would be used to generate imputed values in each year. That constant value of scale would be set to a value that generates (using Tax-Calculator to process the resulting 2021 file) an aggregate qualified business income deduction (QBID) equal to the aggregate amount in the 2021 SOI tabulations ($205.8 billion). I have no idea whether or not the distribution (across AGI categories) of QBID will be anything like what the 2021 SOI tabulation shows. If it is way off, then we will eventually need to experiment with some nonlinear imputation functions. But I suggest we put that off until Phase 3.

donboyd5 commented 6 months ago

Thank you. I think this makes sense.

nikhilwoodruff commented 6 months ago

Just a note: after reweighting, I get the final 2021 parameter to be 35.7%.

(base) nikhil@192 tax-microdata-benchmarking % python tax_microdata_benchmarking/adjust_qbi.py 
Creating CPS flat file for 2021
Creating PUF flat file for 2021
Adding Tax-Calculator outputs to the flat file for 2021
Reweighting the flat file for 2021
100%|█████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:35<00:00, 281.86it/s]
Adding pass-through W2 wages to the flat file for 2021
scale: 0.0, deviation: -119.95162997484529, total: 95.9639632454937
scale: 2.0, deviation: 17.740212985602568, total: 233.65580620594156
scale: 1.0, deviation: 15.213033483977682, total: 231.12862670431667
scale: 0.5, deviation: 12.285153087418735, total: 228.20074630775773
scale: 0.25, deviation: -34.98332830899048, total: 180.9322649113485
scale: 0.375, deviation: 4.900314132936614, total: 220.8159073532756
scale: 0.3125, deviation: -14.195999745480918, total: 201.71959347485807
scale: 0.34375, deviation: -4.246802492497579, total: 211.6687907278414
scale: 0.359375, deviation: 0.4545361736741711, total: 216.37012939401316
scale: 0.3515625, deviation: -1.869013698429228, total: 214.04657952190976
scale: 0.35546875, deviation: -0.7001493964534973, total: 215.2154438238855
scale: 0.357421875, deviation: -0.12094591835099777, total: 215.794647301988
Final scale: 35.7%
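The sequence of scale values in the log above looks like a bisection search on the interval from 0 to 2. The following is a rough sketch of that kind of search, not the actual contents of adjust_qbi.py. The function names, the default bounds and iteration count, and the toy aggregate function with its numbers are all assumptions made up for illustration; in the real workflow the aggregate would come from running Tax-Calculator on the reweighted file.

```python
def calibrate_scale(aggregate_fn, target, lo=0.0, hi=2.0, iterations=10):
    """Bisect on scale until aggregate_fn(scale) is close to target.

    Assumes aggregate_fn is non-decreasing in scale and that the target
    lies between aggregate_fn(lo) and aggregate_fn(hi).
    """
    if not aggregate_fn(lo) <= target <= aggregate_fn(hi):
        raise ValueError("target is not bracketed by the initial bounds")
    mid = (lo + hi) / 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        deviation = aggregate_fn(mid) - target
        if deviation < 0:
            lo = mid  # aggregate too small: try a larger scale
        else:
            hi = mid  # aggregate too large (or exact): try a smaller scale
    return mid


def toy_aggregate_qbid(scale):
    """Made-up stand-in for the Tax-Calculator aggregate (in $ billions).

    It rises with scale and then flattens, loosely mimicking a deduction
    that stops growing once the wage limitation no longer binds.
    """
    return 100.0 + min(120.0, 400.0 * scale)


scale = calibrate_scale(toy_aggregate_qbid, target=216.0, iterations=30)
print(f"Final scale: {scale:.1%}")
```

Bisection suits this problem because each evaluation is expensive (a full Tax-Calculator run) and the aggregate is not linear in scale, but it only needs the aggregate to be monotone in scale over the bracketing interval.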

martinholmer commented 6 months ago

Recent code revisions have incorporated this variable imputation.

PSLmodels / tax-microdata-benchmarking

Imputing values for the PT_binc_w2_wages variable #32