PSLmodels / taxdata

The TaxData project prepares microdata for use with the Tax-Calculator microsimulation project.
http://pslmodels.github.io/taxdata/

Why are total pension benefits less than taxable pension benefits in PUF data? #156

Closed martinholmer closed 6 years ago

martinholmer commented 6 years ago

Consider this tabulation of output from tc puf.csv 2014 --sqldb and tc cps.csv 2014 --sqldb.

$ cat tab-db.sql
select "count with e01500 >= e01700", count(*)
from dump
where e01500 >= e01700;

select "count with e01500 < e01700", count(*)
from dump
where e01500 < e01700;

select "RECID|e01500|e01700";
select RECID, e01500, e01700
from dump
where e01500 < e01700
limit 9;

Tabulating CPS output with this SQL script generates these results:

$ cat tab-db.sql | sqlite3 cps-14-#-#-#.db
count with e01500 >= e01700|456465
count with e01500 < e01700|0
RECID|e01500|e01700

This is what one would expect given the definitions of e01500 (total pension benefits) and e01700 (taxable pension benefits).

But the tabulation of the PUF output doesn't look right:

$ cat tab-db.sql | sqlite3 puf-14-#-#-#.db
count with e01500 >= e01700|249087
count with e01500 < e01700|1151
RECID|e01500|e01700
4002444|0.0|1305.75
4002451|0.0|4439.54
4002454|0.0|1.09
4002459|0.0|1619.13
4002437|0.0|19063.91
4002436|0.0|11425.29
4002421|0.0|6528.74
4002410|0.0|15668.97
4002427|0.0|12012.87

@andersonfrailey, why do we have over one thousand PUF filing units for whom total pensions are less than taxable pensions? These funny-looking cases are not present in the CPS data.
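For readers without the database files, the same check can be sketched in Python with the standard sqlite3 module. This is only an illustration: the in-memory `dump` table below mimics the table name and three columns used in tab-db.sql, and its rows are made up rather than taken from puf.csv or cps.csv.

```python
import sqlite3

# Toy stand-in for the `dump` table queried in tab-db.sql. The rows are
# hypothetical and are NOT real PUF or CPS records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dump (RECID INTEGER, e01500 REAL, e01700 REAL)")
conn.executemany(
    "INSERT INTO dump VALUES (?, ?, ?)",
    [
        (1, 5000.0, 3000.0),  # total >= taxable: consistent
        (2, 0.0, 0.0),        # no pensions at all: consistent
        (3, 0.0, 1200.0),     # total < taxable: inconsistent
    ],
)

# Count records on each side of the consistency condition.
n_ok = conn.execute(
    "SELECT COUNT(*) FROM dump WHERE e01500 >= e01700"
).fetchone()[0]
n_bad = conn.execute(
    "SELECT COUNT(*) FROM dump WHERE e01500 < e01700"
).fetchone()[0]

# List up to nine offending records, as the SQL script does.
bad_ids = [
    row[0]
    for row in conn.execute(
        "SELECT RECID FROM dump WHERE e01500 < e01700 ORDER BY RECID LIMIT 9"
    )
]
print(n_ok, n_bad, bad_ids)
```

Pointing `sqlite3.connect` at a real output database instead of `":memory:"` (and dropping the table setup) should run the same queries against actual data.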

andersonfrailey commented 6 years ago

Thanks for pointing this out @martinholmer. I dug a little deeper and all of the filing units where this is an issue are non-filers, which means they come from the CPS, not the PUF.
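One way to confirm this kind of finding is to group the inconsistent records by filer status. The sketch below assumes the table has a `filer` column (1 for a filer, 0 for a non-filer); that column name, the table layout, and the rows are assumptions for illustration and may not match the actual dump.

```python
import sqlite3

# Hypothetical table: the `filer` column and all rows are assumptions,
# not taken from the real puf.csv output.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dump (RECID INTEGER, filer INTEGER, e01500 REAL, e01700 REAL)"
)
conn.executemany(
    "INSERT INTO dump VALUES (?, ?, ?, ?)",
    [
        (1, 1, 5000.0, 3000.0),  # filer, consistent
        (2, 0, 0.0, 1200.0),     # non-filer, inconsistent
        (3, 0, 0.0, 400.0),      # non-filer, inconsistent
        (4, 0, 0.0, 0.0),        # non-filer, consistent
    ],
)

# Count inconsistent records (total < taxable pensions) per filer status.
query = """
    SELECT filer, COUNT(*)
    FROM dump
    WHERE e01500 < e01700
    GROUP BY filer
    ORDER BY filer
"""
by_filer = dict(conn.execute(query).fetchall())
print(by_filer)
```

If every inconsistent record is a non-filer, the resulting dictionary has a single key, 0.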

When we add non-filers to puf.csv, we set income variables that weren't found when creating the CPS tax units to zero. In this case, e01500 is set to zero while e01700 is set to the value of pensions found in the CPS. You can see this in these lines of code in add_nonfilers.py. This goes back to when we were producing the PUF using SAS scripts.

This isn't an issue in cps.csv because e01500 is set to the value of pensions in the CPS (rtm-val) and e01700 is derived from that, as can be seen in cps_data/finalprep.py.

Given that the variable used to determine e01500 in the CPS is described as

Recode total amount of retirement income received

in the documentation, I believe this is the correct way to define the two variables.

Now that you've brought this to my attention, the solution seems to be to modify the matching scripts so that e01500 is equal to the value of pensions reported in the CPS and e01700 is derived from that. Because non-filers are not used in the stage 2 process, this does not require also creating a new weights file.

To prevent further issues like this, I'll be working on a test suite for the TaxData repo that includes logical checks like the one you conducted after we finish the UBI project.
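A logical check of this kind might look roughly like the following. This is a minimal sketch, not the actual TaxData test suite: the function name `pension_violations` and the three-record sample are hypothetical, and only the column names RECID, e01500, and e01700 come from this issue.

```python
import csv
import io


def pension_violations(rows):
    """Return the RECIDs of records whose total pensions (e01500)
    are smaller than their taxable pensions (e01700)."""
    return [
        row["RECID"]
        for row in rows
        if float(row["e01500"]) < float(row["e01700"])
    ]


# Hypothetical three-record extract standing in for puf.csv.
sample_csv = io.StringIO(
    "RECID,e01500,e01700\n"
    "1,5000,3000\n"
    "2,0,1200\n"
    "3,800,800\n"
)
violations = pension_violations(csv.DictReader(sample_csv))
print(violations)
```

A test could then assert that the returned list is empty for the production file, so a regression like the one in this issue fails loudly.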

martinholmer commented 6 years ago

@andersonfrailey, thanks for the analysis in #156. What’s the timing on the new puf.csv file?
We need one to fix Tax-Calculator’s expanded income statistic.

andersonfrailey commented 6 years ago

@martinholmer I'll try and get you something today.

martinholmer commented 6 years ago

@andersonfrailey, thanks, but no rush. I won’t be back in the office until Thursday.

martinholmer commented 6 years ago

@andersonfrailey, Where's the pull request that made the new puf.csv you sent to me? When using that new file, I get changes in AGI, which is not what I would expect if the new file simply fixed the e01500 values (which are not in AGI).

andersonfrailey commented 6 years ago

@martinholmer I also changed e01700. Based on what we've done in the CPS file and my reading of the CPS documentation, the pensions variable we currently use to set e01700 should instead be used to set e01500, and e01700 should be a subset of that. The changes in AGI can be attributed to that change. PR #158 shows the changes I've made.

martinholmer commented 6 years ago

Issue #156 has been resolved by pull request #158.