Document statistical matching process

PSLmodels / taxdata

The TaxData project prepares microdata for use with the Tax-Calculator microsimulation project.

I need to understand the current statistical matching process to benchmark synthimpute's age imputation (#333). The current code has very few comments and lacks documentation, and I'm having trouble following it.

As far as I can tell, the gist is that it first buckets records from the CPS and the PUF into cells defined by a few variables [1], and then matches records within each cell by predicted taxable income [2]. Is that right?

[1] Matches on cells of idept (dependent) x ijs (?) x iagede (senior?) x idepne (dependent exemptions?) x people x ikids (bucketed) x iself (constant value of 9?)

[2] Regression RHS (the predictors) is continuous versions of [1] plus some other income features; the LHS is taxable income
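
To make the question concrete, here is a minimal sketch of the two-step logic as I currently understand it. It is not taken from the taxdata code. The cell variable names are copied from [1], but the record layout, the `id` and `yhat` fields (with `yhat` standing in for an already-computed predicted taxable income), the helper names, and the proportional-rank pairing rule are all assumptions made for illustration. The real process may pair records differently, for example by splitting weights.

```python
from collections import defaultdict

# Cell variables as listed in [1]; their exact meaning is still an open question.
CELL_VARS = ("idept", "ijs", "iagede", "idepne", "people", "ikids", "iself")


def cell_key(rec):
    """Return the tuple of bucketing variables that defines a record's cell."""
    return tuple(rec[v] for v in CELL_VARS)


def group_by_cell(records):
    """Group records into lists keyed by their cell."""
    cells = defaultdict(list)
    for rec in records:
        cells[cell_key(rec)].append(rec)
    return cells


def match_within_cells(puf, cps):
    """Pair each PUF record with a CPS record from the same cell.

    Within a cell, both files are sorted by predicted income ("yhat", a
    hypothetical field) and paired by proportional rank. PUF cells with no
    CPS records are skipped. Returns a list of (puf_id, cps_id) tuples.
    """
    puf_cells = group_by_cell(puf)
    cps_cells = group_by_cell(cps)
    pairs = []
    for key, puf_recs in puf_cells.items():
        cps_recs = cps_cells.get(key)
        if not cps_recs:
            continue
        puf_sorted = sorted(puf_recs, key=lambda r: r["yhat"])
        cps_sorted = sorted(cps_recs, key=lambda r: r["yhat"])
        for i, p in enumerate(puf_sorted):
            # Map rank i in the PUF cell to the proportional rank in the CPS cell.
            j = i * len(cps_sorted) // len(puf_sorted)
            pairs.append((p["id"], cps_sorted[j]["id"]))
    return pairs


def make(rec_id, yhat, **cell):
    """Build a toy record; unspecified cell variables default to 0."""
    rec = dict.fromkeys(CELL_VARS, 0)
    rec.update(cell)
    rec.update(id=rec_id, yhat=yhat)
    return rec


# Toy data: two cells that appear in both files, one that is CPS-only.
puf = [make("p1", 50_000, ijs=1), make("p2", 20_000, ijs=1), make("p3", 80_000, ijs=2)]
cps = [
    make("c1", 22_000, ijs=1),
    make("c2", 48_000, ijs=1),
    make("c3", 75_000, ijs=2),
    make("c4", 10_000, ijs=3),
]
pairs = match_within_cells(puf, cps)
```

In this toy example the lower-income PUF record in the `ijs=1` cell is paired with the lower-income CPS record in that cell, and `c4` goes unmatched because no PUF record shares its cell. If the actual code differs from this on the pairing step, that is exactly the part I would like to see documented.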

Document statistical matching process #358