PSLmodels / taxdata

The TaxData project prepares microdata for use with the Tax-Calculator microsimulation project.
http://pslmodels.github.io/taxdata/

2015 PUF compatibility #437

Open andersonfrailey opened 1 year ago

andersonfrailey commented 1 year ago

This PR begins the work of making TaxData compatible with the 2015 PUF. There are three main parts:

  1. I modified the CPS tax unit creation code to allow for some flexibility in when C-TAM benefits are added to the CPS. Currently, any CPS from 2013, 2014, or 2015 will have C-TAM benefits imputed. This PR makes it easier to specify that you don't want C-TAM benefits imputed for those years.
  2. I add parameters specific to the 2015 tax code: the FICA maximum taxable earnings and the maximum pension deferral amount.
  3. In impute_pencon.py, I raise the maximum allowable wages from $30 million to $124 million. I'm assuming that $30 million was chosen because the highest reported wages in the 2011 PUF were ~$29 million, and not for a specific reason. In the 2015 PUF, the highest reported wages are ~$123 million.

I still haven't successfully created a working PUF. I'm running into an issue with pension contribution imputation. Not all of the wage/age group combinations for imputation have earners in the PUF. This is the error message I get when using the 2015 CPS as another donor file:

~/taxdata/taxdata/puf/impute_pencon.py in impute(idata, target_cnt, target_amt, year)
    172             if wgt_num_earners <= 0.0:
...
--> 174                 raise ValueError(msg.format(agrp, wgrp, wgt_num_earners))
    175             wgt_pos_pencon = target_cnt.iloc[wgrp, agrp]
    176             prob = wgt_pos_pencon / wgt_num_earners

ValueError: agrp=7;wgrp=15 has wgt_num_earners=0.0000 <= 0

I'm going to try using different CPS years and see if that helps. Once I've done that I'll try creating weights, and modify createpuf.py to be more flexible about which CPS and PUF year are used.
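To see which wage/age cells would trigger this error before running the full imputation, something like the following sketch could tabulate weighted earners per cell. The function name `find_empty_cells` and the `(wgrp, agrp, weight)` record layout are hypothetical and are not taken from `impute_pencon.py`; only the `wgrp`/`agrp` naming follows the error message above.

```python
from collections import defaultdict


def find_empty_cells(records, n_wgrp, n_agrp):
    """Return the (wgrp, agrp) cells whose weighted earner count is <= 0.

    `records` is an iterable of (wgrp, agrp, weight) tuples, one per earner.
    This layout is an assumption made for illustration.
    """
    totals = defaultdict(float)
    for wgrp, agrp, weight in records:
        totals[(wgrp, agrp)] += weight
    # Walk the full grid so cells with no records at all are reported too.
    return [
        (w, a)
        for w in range(n_wgrp)
        for a in range(n_agrp)
        if totals[(w, a)] <= 0.0
    ]


# Toy data: three earners on a 2x2 grid, leaving cell (1, 1) empty.
sample = [(0, 0, 10.0), (0, 1, 5.0), (1, 0, 2.5)]
empty = find_empty_cells(sample, n_wgrp=2, n_agrp=2)
```

Running this against each candidate donor year would show whether switching CPS years actually fills the problem cells.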

cc @jdebacker

andersonfrailey commented 1 year ago

My plan for the imputation error above is to come up with an algorithm for creating imputation groups. I haven't worked out the exact details, but essentially I'll start with the smallest income groups available (the ones directly reported by the IRS) and then if there are no records in the PUF that fall into a certain cell, it'll combine that cell with one of the neighboring income cells (always staying within the same age group).
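A rough sketch of what that merging could look like for a single age group is below. It is only one possible reading of the plan: the function name `merge_empty_wage_cells` is hypothetical, and the choice of neighbor (an empty cell joins the next higher wage cell, and trailing empty cells join the last non-empty group) is an assumption, since the details haven't been worked out.

```python
def merge_empty_wage_cells(earners):
    """Group adjacent wage cells so every group has weighted earners > 0.

    `earners` holds the weighted earner count for each wage cell of ONE
    age group, ordered from lowest to highest wage. Returns a list of
    groups, each a list of the original wage-cell indices it covers.
    """
    if sum(earners) <= 0.0:
        raise ValueError("age group has no earners in any wage cell")

    groups = []
    current = []  # wage cells accumulated since the last closed group
    total = 0.0   # weighted earners in `current`
    for idx, count in enumerate(earners):
        current.append(idx)
        total += count
        if total > 0.0:
            # The accumulated cells now contain earners, so close the group.
            groups.append(current)
            current, total = [], 0.0

    if current:
        # Trailing empty cells have no higher neighbor; fold them into
        # the last non-empty group instead.
        groups[-1].extend(current)
    return groups


# Cells 1, 3, and 4 are empty and get merged with a neighbor.
merged = merge_empty_wage_cells([12.0, 0.0, 3.5, 0.0, 0.0])
# merged == [[0], [1, 2, 3, 4]]
```

Because the function only ever sees one age group's cells, merges never cross age groups. The IRS targets for the merged cells would then need to be summed over the same index groups.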

bristowrichards commented 1 year ago

I had been working on an implementation for the 2015 PUF and ran into similar issues: high earners did not fall into valid bins, and when I made the upper limit arbitrarily large, some cells ended up with zero earners. I hit the same wall with empty wage/age groups, but I wasn't sure whether to forge ahead with troubleshooting (like the cell merging you described) because I didn't know if the $30 million income limit was special. If you are still working on this, I look forward to updates!

bristowrichards commented 1 year ago

I think you might have a typo in taxdata/puf/impute_pencon.py: the DC limit in 2015 should be 18000, not 1800, according to this PDF from the IRS.

gmfenton commented 6 months ago

Hi @andersonfrailey, thanks for this update. I wanted to check in and see whether (1) you or others have plans or a rough timeline for making TaxData compatible with the 2015 PUF, or (2) it would be helpful to have someone else work on resolving the pension contribution imputation problem you identified above. Thanks!