Adding benefits to the PUF - Initial Results

andersonfrailey commented 5 years ago

I've begun adding C-TAM-calculated benefits to the PUF and running some very preliminary analysis. Initial results do not look promising when compared to the CPS. For each benefit, neither participation rates nor average benefits by AGI percentile align well with what we see in the CPS. I've posted some simple plots below. I'm going to go over my code to check for mistakes and also look at the raw CPS file before it's matched to ensure that the benefits were merged on correctly. All of the plots below show the data after tax-calculator extrapolated it to 2018.

[Plots: Housing, WIC, SNAP, SSI, All Benefits]

The All Benefits plot shouldn't be taken seriously yet because a number of benefits still haven't been added to the PUF.
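For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how participation rates and average benefits by AGI percentile could be computed for one file. It is not the code behind the plots: the function names, the use of deciles, and the plain-list inputs are all assumptions made for illustration.

```python
def weighted_percentile_bins(agi, weights, n_bins=10):
    """Assign each record to a weighted AGI quantile bin (0 .. n_bins - 1)."""
    order = sorted(range(len(agi)), key=lambda i: agi[i])
    total = sum(weights)
    bins = [0] * len(agi)
    cum = 0.0
    for i in order:
        # Place the record by the midpoint of its span of cumulative weight.
        mid = cum + weights[i] / 2.0
        bins[i] = min(int(mid / total * n_bins), n_bins - 1)
        cum += weights[i]
    return bins


def benefit_profile(agi, weights, benefit, n_bins=10):
    """Return {bin: (participation rate, average benefit among recipients)}."""
    bins = weighted_percentile_bins(agi, weights, n_bins)
    pop = [0.0] * n_bins      # weighted count of all units
    recip = [0.0] * n_bins    # weighted count of units with a positive benefit
    dollars = [0.0] * n_bins  # weighted benefit dollars
    for b, w, x in zip(bins, weights, benefit):
        pop[b] += w
        if x > 0:
            recip[b] += w
            dollars[b] += w * x
    profile = {}
    for b in range(n_bins):
        rate = recip[b] / pop[b] if pop[b] else 0.0
        avg = dollars[b] / recip[b] if recip[b] else 0.0
        profile[b] = (rate, avg)
    return profile
```

Running `benefit_profile` once on the CPS and once on puf.csv, with the same benefit variable and each file's own weights, would give two profiles that can be compared bin by bin.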

cc @MattHJensen @Amy-Xu

MaxGhenis commented 5 years ago

I've linked this in the stochastic imputation doc. Could you share your methodology and code so I can add them?

I'm curious how a random forests imputation would fare, as it did well predicting e00900 in the CPS file.

andersonfrailey commented 5 years ago

@MaxGhenis for this, @Amy-Xu modified C-TAM to work with the 2016 CPS, and then I merged the benefits on before the statistical matching process. I've been working on this branch of TaxData if you want to look at the changes I've made.

The goal of this is to see how simply augmenting benefits onto the CPS holds up after statistical matching. I haven't given any thought to using more advanced techniques at this point in time but I'd love any insights you have.

martinholmer commented 5 years ago

@andersonfrailey said to @MaxGhenis in taxdata issue #293 (entitled "Adding benefits to the PUF"):

I've been working on this branch of TaxData if you want to look at the changes I've made.

The goal of this is to see how simply augmenting benefits onto the CPS holds up after statistical matching.

I looked at your branch and I'm thoroughly confused. The title of this issue is "Adding benefits to the PUF", but then you said in the comment quoted above that your work is focusing on "augmenting benefits onto the CPS". And when I look at the code you've added to your branch, I see you've added a file called puf_data/StatMatch/Matching/merge_benefits.py that has this docstring:

Can you explain exactly what you're doing in your branch? Otherwise, we have no idea what to make of the graphical comparisons presented in an earlier comment in this issue.

andersonfrailey commented 5 years ago

Here the C-TAM benefits are being merged onto the 2016 CPS file that is matched with the IRS-SOI PUF. My title was a bit unclear. By "Adding benefits to the PUF," I was referring to puf.csv, not the IRS-SOI PUF. What I tried on that branch was using C-TAM to impute benefits for the 2016 CPS and use the 2016 CPS augmented with those benefits in the statistical match with the IRS-SOI PUF.
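As a rough illustration of that flow, the sketch below merges imputed benefits onto CPS records by a shared ID and then carries them over to matched records. It is not the code on the branch: the field names `cps_seq`, `puf_id`, and `snap_ben`, the single benefit, and the assumption that the match yields (PUF unit, CPS record) pairs are all hypothetical.

```python
def merge_benefits(cps_records, ctam_benefits, key="cps_seq"):
    """Left-merge imputed benefit amounts onto CPS records by a shared ID.

    cps_records: list of dicts, each with a unique `key` field (hypothetical).
    ctam_benefits: dict mapping that ID to an imputed benefit amount.
    """
    merged = []
    for rec in cps_records:
        out = dict(rec)
        # Units with no imputed benefit get zero rather than a missing value.
        out["snap_ben"] = ctam_benefits.get(rec[key], 0.0)
        merged.append(out)
    return merged


def carry_to_matched_file(match_pairs, merged_cps, key="cps_seq"):
    """Give each PUF unit the benefit of the CPS record it was matched to.

    match_pairs: list of (puf_id, cps_id) tuples, assumed to come from
    the statistical match.
    """
    by_id = {rec[key]: rec for rec in merged_cps}
    return [
        {"puf_id": puf_id, key: cps_id, "snap_ben": by_id[cps_id]["snap_ben"]}
        for puf_id, cps_id in match_pairs
    ]
```

One consequence visible even in this toy version: a CPS record matched to several PUF units contributes its benefit to each of them, so the distribution after matching depends on how often benefit recipients are reused as donors.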

martinholmer commented 5 years ago

@andersonfrailey said in #293:

Here the C-TAM benefits are being merged onto the 2016 CPS file that is matched with the IRS-SOI PUF. My title was a bit unclear. By "Adding benefits to the PUF," I was referring to puf.csv, not the IRS-SOI PUF. What I tried on that branch was using C-TAM to impute benefits for the 2016 CPS and use the 2016 CPS augmented with those benefits in the statistical match with the IRS-SOI PUF.

Thanks for the additional detail, but I may still be confused. Are you saying that @Amy-Xu has done a new imputation of benefits to the 2016 CPS that differs from the imputed benefits in the current cps.csv.gz file? It sounds as if you're saying yes: the 2016 CPS benefits are different. Then you used those 2016 CPS benefits to assign benefits in the puf.csv because each puf.csv filing unit has an associated 2016 CPS record that it has been assigned, right?

So, then what are the graphical results you showed here? I understand what the PUF lines mean, but which of the two CPS benefits are used to plot the CPS lines in the graphs? Maybe the graphs are disappointing because the new 2016 imputed benefits are very different from the benefits imputed in the current cps.csv.gz file. Has anybody examined that possibility?

MaxGhenis commented 4 years ago

@MattHJensen pointed to Piketty, Saez, and Zucman (2018), which uses a very simple binning method for imputing benefits: take the probability of participation and the average value within each of 40 bins (income decile x marital status x 65+), then scale each up to hit administrative totals (section B.3). I've added it to the http://bit.ly/stochastic-imputation document.
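To make the binning idea concrete, here is a minimal sketch under my own assumptions; it is not the paper's implementation. The bin key is a hypothetical (income decile, marital status, 65+) tuple, participation is drawn at random from the bin's probability, and only dollar amounts are rescaled to the administrative total, which simplifies "scaling each up".

```python
import random
from collections import defaultdict


def fit_bins(donors):
    """Estimate per-bin statistics from a donor file.

    donors: list of (bin_key, weight, benefit) tuples.
    Returns {bin_key: (participation probability, mean benefit of recipients)}.
    """
    pop = defaultdict(float)
    recip = defaultdict(float)
    dollars = defaultdict(float)
    for key, w, ben in donors:
        pop[key] += w
        if ben > 0:
            recip[key] += w
            dollars[key] += w * ben
    stats = {}
    for key in pop:
        prob = recip[key] / pop[key] if pop[key] else 0.0
        avg = dollars[key] / recip[key] if recip[key] else 0.0
        stats[key] = (prob, avg)
    return stats


def impute(recipients, bin_stats, admin_total, seed=0):
    """Impute benefits to (bin_key, weight) records and hit admin_total.

    Each record participates with its bin's probability and, if it does,
    receives the bin's mean benefit. Amounts are then rescaled so the
    weighted sum equals admin_total (a simplifying assumption).
    """
    rng = random.Random(seed)
    raw = []
    for key, _w in recipients:
        prob, avg = bin_stats.get(key, (0.0, 0.0))
        raw.append(avg if rng.random() < prob else 0.0)
    total = sum(w * b for (_key, w), b in zip(recipients, raw))
    scale = admin_total / total if total else 0.0
    return [b * scale for b in raw]
```

With 10 income deciles, 2 marital statuses, and 2 age groups, the keys span the 40 bins mentioned above. Bins absent from the donor file get no benefit in this sketch.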

PSLmodels / taxdata