PSLmodels / taxdata

The TaxData project prepares microdata for use with the Tax-Calculator microsimulation project.
http://pslmodels.github.io/taxdata/

Separate CPS tax unit assignment from other calculations #357

Open MaxGhenis opened 4 years ago

MaxGhenis commented 4 years ago

Scripts like pycps.py and taxunit.py have functions that mix the grouping of individuals into tax units with other operations like income aggregation and C-TAM imputations. I think separating these operations would be cleaner and make it easier to improve each.

In this script (used for covid.ubicenter.org/fpuc and adapted from Ernie Tedeschi's script), I defined a tax_unit_id(ipum) function that returns a vector of tax unit IDs, one for each record in a person-level DataFrame built from an IPUMS ASEC file. It's not as comprehensive as taxdata's, but it's only about 15 lines of code and, because it's vectorized, runs almost instantly, versus several minutes for taxdata. I don't think anything in taxdata's tax unit creation process precludes vectorization.
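For illustration, here is a minimal sketch of what a vectorized tax_unit_id(ipum) could look like. It is not the linked script, and it is much simpler than taxdata's rules. It assumes IPUMS-style columns SERIAL, PERNUM, SPLOC, MOMLOC, POPLOC and AGE. It treats a married couple as one unit headed by the lower-numbered spouse, and it treats any unmarried person under 19 with a parent in the household as that parent's dependent (a hypothetical rule). The ID encoding SERIAL * 100 + head PERNUM also assumes fewer than 100 people per household.

```python
import numpy as np
import pandas as pd


def tax_unit_id(ipum: pd.DataFrame) -> pd.Series:
    """Simplified sketch: one tax unit ID per person-level record.

    Assumed columns: SERIAL (household), PERNUM (person number in household),
    SPLOC / MOMLOC / POPLOC (PERNUM of spouse / mother / father, 0 if absent), AGE.
    """
    # Married couples share a unit headed by the lower-numbered spouse.
    has_spouse = ipum["SPLOC"] > 0
    couple_head = np.where(
        has_spouse, np.minimum(ipum["PERNUM"], ipum["SPLOC"]), ipum["PERNUM"]
    )
    # Lookup table (SERIAL, PERNUM) -> couple head, used to route dependents.
    lookup = pd.DataFrame(
        {"SERIAL": ipum["SERIAL"], "parent": ipum["PERNUM"], "parent_head": couple_head}
    )
    # A person's parent is the mother if present, otherwise the father (0 if none).
    parent = ipum["MOMLOC"].where(ipum["MOMLOC"] > 0, ipum["POPLOC"]).astype("int64")
    # Hypothetical dependency rule: unmarried, under 19, parent in household.
    is_dep = (ipum["AGE"] < 19) & (parent > 0) & ~has_spouse
    # A left merge keeps the row order of the person-level frame.
    merged = pd.DataFrame({"SERIAL": ipum["SERIAL"], "parent": parent}).merge(
        lookup, on=["SERIAL", "parent"], how="left"
    )
    parent_head = merged["parent_head"].fillna(-1).astype("int64").to_numpy()
    head = np.where(is_dep.to_numpy(), parent_head, couple_head)
    # Combine household serial and head PERNUM into one integer ID.
    return pd.Series(
        ipum["SERIAL"].to_numpy() * 100 + head, index=ipum.index, name="tax_unit_id"
    )


ipum = pd.DataFrame(
    {
        "SERIAL": [1, 1, 1, 1, 2, 2],
        "PERNUM": [1, 2, 3, 4, 1, 2],
        "SPLOC": [2, 1, 0, 0, 0, 0],
        "MOMLOC": [0, 0, 2, 0, 0, 0],
        "POPLOC": [0, 0, 1, 0, 0, 1],
        "AGE": [45, 43, 10, 70, 30, 25],
    }
)
ids = tax_unit_id(ipum)
```

Every step operates on whole columns, plus one merge to look up each parent's unit, so there is no per-household Python loop. Here the couple and their 10-year-old share unit 101, the 70-year-old relative files alone (104), and the 25-year-old in household 2 is too old to be claimed under this rule (202).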

Once the tax unit ID is assigned, summing things to the tax unit level is a simple groupby.
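For example, a sketch of that aggregation step, assuming the person-level frame already has a tax_unit_id column. INCWAGE and INCSS are IPUMS income variables used only for illustration; which columns to sum, and how they map to Tax-Calculator inputs, is left open.

```python
import pandas as pd


def aggregate_to_tax_units(person, id_col="tax_unit_id", sum_cols=("INCWAGE", "INCSS")):
    """Sum person-level columns within each tax unit and count its members."""
    grouped = person.groupby(id_col)
    units = grouped[list(sum_cols)].sum()
    # size() counts rows, i.e. people, in each tax unit.
    units["n_persons"] = grouped.size()
    return units


person = pd.DataFrame(
    {
        "tax_unit_id": [101, 101, 101, 104],
        "INCWAGE": [50000, 30000, 0, 0],
        "INCSS": [0, 0, 0, 20000],
    }
)
units = aggregate_to_tax_units(person)
```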

More broadly, something like this would be ideal IMO:

  1. C-TAM imputes each benefit at whatever level the ASEC provides it (person, family, or household), then allocates it to each person. This happens in C-TAM itself. (C-TAM currently still references the O'Hare file, and seems to circularly require taxdata's CPS tax unit file?)
  2. Some not-yet-created package creates a master CPS person file that includes C-TAM imputations and any other enhancements, potentially including earnings adjustments (#356), unemployment benefit adjustments from ui_calculator, and imputations from other datasets like the SCF or CEX (but not the PUF). It would also be able to build this file from an IPUMS extract, a CPS .dat file, or a CPS .csv file (as is required for the 2019 and likely 2020 ASECs).
  3. taxdata assigns tax unit IDs and filing statuses, aggregates into tax units, and imputes tax-unit-level fields from the PUF. The result is compatible with taxcalc.
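One way to read this split is as a contract: steps 1 and 2 each take and return a person-level frame, and only step 3 collapses people into tax units. Below is a hypothetical sketch of that contract. run_pipeline, add_benefit and to_tax_units are illustrative names, not existing C-TAM or taxdata functions, and assigning one unit per household is only a placeholder for real tax unit rules.

```python
from typing import Callable, Sequence

import pandas as pd

# A person-level stage takes and returns one row per person.
PersonStage = Callable[[pd.DataFrame], pd.DataFrame]


def run_pipeline(
    person: pd.DataFrame,
    person_stages: Sequence[PersonStage],
    to_tax_units: Callable[[pd.DataFrame], pd.DataFrame],
) -> pd.DataFrame:
    """Apply person-level stages (steps 1-2), then build tax units (step 3)."""
    for stage in person_stages:
        n_before = len(person)
        person = stage(person)
        # Steps 1-2 must not change the unit of analysis.
        if len(person) != n_before:
            raise ValueError(f"{getattr(stage, '__name__', stage)} changed the row count")
    return to_tax_units(person)


# Hypothetical stand-in for a C-TAM benefit imputation (step 1).
def add_benefit(person: pd.DataFrame) -> pd.DataFrame:
    return person.assign(benefit=0.0)


# Hypothetical stand-in for taxdata (step 3): placeholder IDs, then aggregation.
def to_tax_units(person: pd.DataFrame) -> pd.DataFrame:
    with_ids = person.assign(tax_unit_id=person["SERIAL"])
    return with_ids.groupby("tax_unit_id")[["wages", "benefit"]].sum()


person = pd.DataFrame({"SERIAL": [1, 1, 2], "wages": [100.0, 50.0, 80.0]})
units = run_pipeline(person, [add_benefit], to_tax_units)
```

Keeping stages 1 and 2 at the person level means new imputations can be added without touching tax unit logic, and tax unit rules can change without re-running imputations.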

If folks agree, I can start a PR to move some of my code in here. It would be a major change, so it might make sense to create a new feature branch for it. It'd be great to have some of this worked out by the 2020 ASEC release, which I believe is next month.

cc @ngpsu22 (UBI Center wants to start doing more poverty estimates with taxcalc, which hinges on a more flexible taxdata process)

andersonfrailey commented 4 years ago

I like these high level ideas, @MaxGhenis. Lemme know if I can help flesh them out more in any way.