PSLmodels / tax-microdata-benchmarking

A project to develop a benchmarked general-purpose dataset for tax reform impact analysis.
https://pslmodels.github.io/tax-microdata-benchmarking/

Data examination results for 2021 social security benefits #83

Closed martinholmer closed 5 months ago

martinholmer commented 5 months ago

Here are some aggregate 2021 statistics related to total social security benefits.

From the OASI benefit amounts and DI benefit amounts links at SSA OCACT benefits page, we have:

       OASI($B)  +   DI($B)  =  OASDI($B)
2021    993.167  +  139.996  =   1133.163

The $1133 billion total social security (OASDI) benefits paid during 2021 compares with the weighted sum of total social security benefits (e02400) from the tmd.csv file of about $1212.9 billion.

% awk -F, 'NR==1{next}{w=$11;t+=w;b+=w*$29}END{print t*1e-6,b*1e-9}' tmd.csv 
219.594 1212.9

So, the tmd.csv file has aggregate social security benefits that are about 7% larger than the benefits actually paid.
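As a quick check of the arithmetic above, the following minimal sketch recomputes the OASDI total and the percentage by which the tmd.csv aggregate exceeds it. The variable names are illustrative; the numbers are the ones quoted in this comment.

```python
# Figures quoted above, in billions of dollars.
oasi = 993.167       # SSA OCACT OASI benefits paid in 2021
di = 139.996         # SSA OCACT DI benefits paid in 2021
tmd_gross = 1212.9   # weighted sum of e02400 from tmd.csv

oasdi = oasi + di
excess_pct = 100.0 * (tmd_gross / oasdi - 1.0)
print(f"OASDI = {oasdi:.3f} ($B), tmd.csv excess = {excess_pct:.1f}%")
```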

The IRS-SOI Publication 4801 tabulations of 2021 income tax returns show $791.161 billion in total social security benefits and $412.830 billion in taxable social security benefits.

Unfortunately, given the data problem described in issue #78, we cannot yet tabulate the tmd.csv file for the PUF-based subtotal of total social security benefits to compare with the $791.161 billion figure.

Under current-law policy, the 3.6.0 release of Tax-Calculator generates the following tmd-21-#-#-#.csv dump output file:

% head tmd-21-#-#-#.csv 
FLPDYR,RECID,c02500,e02400,s006
2021,1,0.00,0.00,1431.89
2021,2,0.00,0.00,1431.89
2021,3,0.00,0.00,1872.70
2021,4,0.00,0.00,1873.35
2021,5,0.00,0.00,2120.81
2021,6,0.00,0.00,1431.89
2021,7,0.00,0.00,1431.89
2021,8,0.00,0.00,1431.89
2021,9,0.00,0.00,1431.89

% awk -F, 'NR>1{w=$5;b+=$4*w;t+=$3*w}END{print b*1e-9,t*1e-9}' tmd-21-#-#-#.csv  
1212.9 515.689

So, the current version of tmd.csv generates $515.689 billion in taxable social security benefits, which is about 25% above the IRS-SOI tabulated $412.830 billion. But again, without valid values for data_source in the tmd.csv file, we have no idea how much of the $515.689 billion is attributable to those who file income taxes, and therefore, are in the IRS-SOI PUF microdata file.
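For readers who prefer Python to awk, here is a rough equivalent of the weighted-sum tabulation of the dump file. It is only a sketch: the function name `weighted_totals` is made up for illustration, and the toy rows are hypothetical values, not records from tmd.csv. The column names are the ones in the dump-file header shown above.

```python
import csv
import io


def weighted_totals(csv_text):
    """Return (gross, taxable) weighted benefit sums in $ billions."""
    gross = taxable = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        w = float(row["s006"])                # sampling weight
        gross += w * float(row["e02400"])     # total social security benefits
        taxable += w * float(row["c02500"])   # taxable social security benefits
    return gross * 1e-9, taxable * 1e-9


# Hypothetical rows for illustration only (not taken from tmd.csv).
toy = """FLPDYR,RECID,c02500,e02400,s006
2021,1,0.00,0.00,1000.0
2021,2,10000.00,20000.00,2000.0
2021,3,0.00,15000.00,1000.0
"""
gross_b, taxable_b = weighted_totals(toy)
print(gross_b, taxable_b)

# Percentage by which the tmd.csv taxable total exceeds the IRS-SOI total.
taxable_excess_pct = 100.0 * (515.689 / 412.830 - 1.0)
print(f"{taxable_excess_pct:.1f}%")
```

Reading columns by name rather than by position avoids depending on the column order of the dump file.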

martinholmer commented 5 months ago

Now that the data_source variable is included in the tmd.csv file, we can calculate gross and taxable social security benefits for those with a data_source value of one to compare with the IRS-SOI Publication 4801 tabulations of 2021 income tax returns.

2021 social security benefits ($b) and total number of returns (#m):
                   IRS-SOI        tmd.csv
1040 returns       160.824        174.185
gross              791.161        888.394                      
taxable            412.830        511.620

So, we have too many returns, too much in gross social security benefits, and too much in taxable social security benefits.
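To put sizes on those gaps, this short sketch computes the percentage by which each tmd.csv figure (data_source equal to one) exceeds the corresponding IRS-SOI figure. The dictionary layout is just an illustration; the numbers are copied from the table above.

```python
# (IRS-SOI value, tmd.csv value) pairs copied from the table above.
rows = {
    "1040 returns (#m)": (160.824, 174.185),
    "gross ($b)": (791.161, 888.394),
    "taxable ($b)": (412.830, 511.620),
}

# Percentage by which tmd.csv exceeds IRS-SOI for each row.
excess = {name: 100.0 * (tmd / irs - 1.0) for name, (irs, tmd) in rows.items()}
for name, pct in excess.items():
    print(f"{name:20s} {pct:5.1f}%")
```

The taxable-benefits gap is roughly three times the gap in the number of returns.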

martinholmer commented 5 months ago

Here are the details of the tabulations of social security benefits in issue #83:

(taxcalc-dev) Tax-Calculator% tc tmd.csv 2021 --tables --exact --reform ssben.json --sqldb --dvars ssben.dvars
You loaded data for 2021.
Tax-Calculator startup automatically extrapolated your data to 2021.

(taxcalc-dev) Tax-Calculator% ls -l tmd-21-*db
-rw-r--r--  1 mrh  staff  12496896 May 25 11:49 tmd-21-#-ssben-#.db

(taxcalc-dev) Tax-Calculator% echo ".schema" | sqlite3 tmd-21-#-ssben-#.db 
CREATE TABLE IF NOT EXISTS "baseline" (
  "s006" REAL,
  "RECID" INTEGER,
  "c02500" REAL,
  "FLPDYR" INTEGER,
  "data_source" INTEGER,
  "e02400" REAL
);
CREATE TABLE IF NOT EXISTS "reform" (
  "s006" REAL,
  "RECID" INTEGER,
  "c02500" REAL,
  "FLPDYR" INTEGER,
  "data_source" INTEGER,
  "e02400" REAL
);

(taxcalc-dev) Tax-Calculator% sqlite3 tmd-21-#-ssben-#.db < ssben.sql
---ALL DATA CLP ---
weights:
219.594
gross ssbens:
1212.904
taxable ssbens:
515.689
---DATA_SOURCE==1 CLP ---
weights:
174.185
gross ssbens:
888.394
taxable ssbens:
511.62
---DATA_SOURCE==1 T-E_REFORM ---
weights:
174.185
gross ssbens:
888.394
taxable ssbens:
888.394
---DATA_SOURCE==0 CLP ---
weights:
45.409
gross ssbens:
324.51
taxable ssbens:
4.069
---DATA_SOURCE==0 T-E_REFORM ---
weights:
45.409
gross ssbens:
324.51
taxable ssbens:
324.51
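The ssben.sql script is not shown here. The following is only a guess at the kind of query it might contain, run from Python against an in-memory SQLite table that has the same columns as the baseline table in the schema above. The rows are hypothetical and the query text is an assumption, not the actual contents of ssben.sql.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE baseline (s006 REAL, RECID INTEGER, c02500 REAL, "
    "FLPDYR INTEGER, data_source INTEGER, e02400 REAL)"
)
# Hypothetical rows for illustration only (not taken from the real database).
toy_rows = [
    (1000.0, 1, 0.0, 2021, 0, 12000.0),
    (2000.0, 2, 5000.0, 2021, 1, 20000.0),
    (3000.0, 3, 0.0, 2021, 1, 0.0),
]
con.executemany("INSERT INTO baseline VALUES (?, ?, ?, ?, ?, ?)", toy_rows)

# Weighted totals by data_source: units (#m), gross ($b), taxable ($b).
query = """
SELECT data_source,
       SUM(s006) * 1e-6,
       SUM(s006 * e02400) * 1e-9,
       SUM(s006 * c02500) * 1e-9
  FROM baseline
 GROUP BY data_source
 ORDER BY data_source
"""
results = con.execute(query).fetchall()
for data_source, units_m, gross_b, taxable_b in results:
    print(data_source, units_m, gross_b, taxable_b)
con.close()
```

Running the same query against the reform table would give the reform rows in the output above.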
donboyd5 commented 5 months ago

Thanks for this, @martinholmer. I think there are a few issues we'll want to consider:

image

Anyway, there are a lot of questions to explore here. I'm happy to pitch in on the diagnostic analyses, @martinholmer, but I don't want to be redundant, so I'll hold off for now because there are some other analyses I can work on; we can catch up on Wednesday.

martinholmer commented 5 months ago

@donboyd5 said among other things in issue #83:

Another question is what to do about the very large difference between total SS benefits paid reported by the SSA and total SS benefits reported by tax filers on tax returns as reported by the IRS -- $1,133b vs. $791b. In theory, if we have a file that represents the total U.S. population, should we expect essentially all of the gap to be included in the nonfilers universe?

I don't see that there is anything to be done about that. The total gross social security benefits in the 2021 tmd.csv file is about $1,213 billion, which is only modestly above the SSA administrative total of $1,133 billion. It is just that, as Dan pointed out earlier, many nonfilers are elderly people living on just social security benefits. But maybe I'm missing your point. Why exactly do you expect nonfilers to have little or no social security?

martinholmer commented 5 months ago

@donboyd5 said among other things in issue #83:

Getting back to Social Security, here's what IRS has for number of returns with total Social Security benefits in 2015 and 2021 -- 11.4% growth. This almost certainly is far faster than the 6-year returns growth we must have in tmd.csv weights - the population growth factor presumably was near 6% or a bit less.

Whatever is going on in the PolicyEngine-US data creation and the TMD weights creation, it leaves us with more gross social security benefits in 2021 than SSA reports paying in 2021.

To me the biggest problem is that, while we are getting too much in gross social security benefits, we are getting way too much in taxable social security benefits. How is the TMD repo handling the reweighting? Is it possible that high-income elderly filers are having their weights increased, and therefore, raising the taxable social security benefit total?

martinholmer commented 5 months ago

I have now added the s006_original variable to the tmd.csv data file and have added that variable to the Tax-Calculator records_variables.json file so that it can be included in tc dump output. The table below consolidates the results so far:

                     TMD_WEIGHTS   ORIGINAL_WEIGHTS   AGENCY_STATISTIC
2021 ALL UNITS:       
tax units (#m)        219.594       196.143             ------
gross ssben ($b)     1212.904      1098.870           1133.163 (SSA)
gross sscases (#m)     48.991        44.280
taxable ssben ($b)    515.689       443.511             ------
taxable sscases (#m)   29.978        25.272

2021 PUF UNITS:
tax units (#m)        174.185       150.828            160.824 (IRS)
gross ssben ($b)      888.394       774.580            791.161 (IRS)
gross sscases (#m)     32.684        27.982
taxable ssben ($b)    511.620       439.512            412.830 (IRS)
taxable sscases (#m)   27.451        22.754

2021 CPS UNITS:
tax units (#m)         45.409        45.315            ------
gross ssben ($b)      324.510       324.290            ------
gross sscases (#m)     16.307        16.298
taxable ssben ($b)      4.069         3.999            ------
taxable sscases (#m)    2.527         2.518
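One way to read the PUF UNITS block is to express both weighting schemes as a percentage gap from the IRS statistic. The sketch below does that with the numbers from the table; the dictionary layout and variable names are illustrative only.

```python
# (TMD_WEIGHTS, ORIGINAL_WEIGHTS, IRS statistic) from the PUF UNITS block.
puf = {
    "tax units (#m)": (174.185, 150.828, 160.824),
    "gross ssben ($b)": (888.394, 774.580, 791.161),
    "taxable ssben ($b)": (511.620, 439.512, 412.830),
}

# Percentage gap from the IRS statistic under each set of weights.
gaps = {
    name: (100.0 * (tmd_w / irs - 1.0), 100.0 * (orig_w / irs - 1.0))
    for name, (tmd_w, orig_w, irs) in puf.items()
}
for name, (tmd_gap, orig_gap) in gaps.items():
    print(f"{name:20s} tmd {tmd_gap:+6.1f}%   original {orig_gap:+6.1f}%")
```

On these numbers, the original weights are below the IRS totals for tax units and gross benefits, and the TMD weights are above the IRS totals on all three rows.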
donboyd5 commented 5 months ago

@donboyd5 said among other things in issue #83:

Another question is what to do about the very large difference between total SS benefits paid reported by the SSA and total SS benefits reported by tax filers on tax returns as reported by the IRS -- $1,133b vs. $791b. In theory, if we have a file that represents the total U.S. population, should we expect essentially all of the gap to be included in the nonfilers universe?

I don't see that there is anything to be done about that. The total gross social security benefits in the 2021 tmd.csv file is about $1,213 billion, which is only modestly above the SSA administrative total of $1,133 billion. It is just that, as Dan pointed out earlier, many nonfilers are elderly people living on just social security benefits. But maybe I'm missing your point. Why exactly do you expect nonfilers to have little or no social security?

Sorry, I didn't mean to imply that nonfilers would have little or no Social Security. I was trying to say that because they have a lot, it's important to examine the magnitude and distribution of nonfiler Social Security. Your note here says the magnitude is reasonably close. I think at some point, we also want to think about the distribution of nonfiler Social Security. Because we don't have IRS tables for that, I suppose it comes down to examining the distribution we have in comparison to CPS distribution. Certainly not a near term issue as we have bigger fish to fry.

donboyd5 commented 5 months ago

I have now added the s006_original variable to the tmd.csv data file and have added that variable to the Tax-Calculator records_variables.json file so that it can be included in tc dump output.

This is really helpful, thank you.

donboyd5 commented 5 months ago

@donboyd5 said among other things in issue #83:

Getting back to Social Security, here's what IRS has for number of returns with total Social Security benefits in 2015 and 2021 -- 11.4% growth. This almost certainly is far faster than the 6-year returns growth we must have in tmd.csv weights - the population growth factor presumably was near 6% or a bit less.

Whatever is going on in the PolicyEngine-US data creation and the TMD weights creation, it leaves us with more gross social security benefits in 2021 than SSA reports paying in 2021.

To me the biggest problem is that, while we are getting too much in gross social security benefits, we are getting way too much in taxable social security benefits. How is the TMD repo handling the reweighting? Is it possible that high-income elderly filers are having their weights increased, and therefore, raising the taxable social security benefit total?

Yes, agreed. I think this should be part of Wednesday's call. We should be able to break it down into three pieces:

  1. How well did the PUF do in 2015 representing IRS SS benefits and claimants in 2015?
  2. What was the impact of growfactors for returns and SS income on changes in data between 2015 and 2021?
  3. How did reweighting in 2021 change this 2021 result?

We have a little intelligence on this now, but not enough:

  1. PUF 2015 vs. IRS 2015 -- don't know yet but I can check
  2. Impact of growfactors from 2015-2021. As I read Tax-Calculator (taxcalc/records.py lines 273-372), ASOCSEC is used to adjust e02400 (Total social security (OASDI) benefits) on line 314. I believe this is the per-record adjustment. I do not see an adjustment for population growth (a factor to be applied to the weights) but I presume there must be one. (If there is, would you mind pointing me to where it is applied?) If I read taxdata properly, lines 48-49 of factors_finalprep.py adjust ASOCSEC by something called elderly_pop, but I don't see a similar adjustment in Tax-Calculator - it's clear I don't understand what's going on. Anyway, ASOCSEC in both the tmd and taxdata repos declines by 0.5% between 2015 and 2021 (tmd shown); this, I believe is just the per-record amount, not the total amount:

image

Now, here's what the IRS data tell us (see table in previous comment):

image

IRS shows about 17% growth in the per-return total and 11% in the number of returns, for total growth of about 31%. Comparing the ASOCSEC 0.5% decline to IRS 17%, it seems like we have way too little ASOCSEC growth. All else equal, this worsens our problem, of course, because in the end we have too much gross and way too much taxable SS benefits.

We can't yet compare this properly to our tmd data - we need to compare weighted tmd sum of e02400 in 2015 using original-weights to weighted sum in 2021, after growfactors (ASOCSEC) using original-weights for 2021. I can do that but not until tomorrow.

  3. We know from your table above, tmd weights raise e02400 weighted sum in 2021 from 774.580 to 888.394, or about 14.7%. The reweighting could affect both the # of SS-reporting filers and the per-return average. We don't know the breakdown yet, but can. In the end, reweighting may be the source of the problem but it would be nice to know the pieces. If reweighting is the problem, I suspect it's what you hypothesize - it may have up-weighted a lot of higher-income SS returns, which are likely to have relatively more taxable SS.

By targeting # returns and weighted-sum-AGI by AGI range, and not targeting total SS income (and other income totals), and by not telling the algorithm to penalize changes in weights, the algorithm will pick any old weight adjustments that hit AGI and # returns targets, quite possibly jerking around SS filers in the process. Anyway, we should be able to figure all of this out.
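The breakdown of the reweighting effect into number of cases and per-case average can be sketched from the PUF UNITS rows of the consolidated table. This assumes the "sscases" rows count the weighted number of tax units reporting the benefit, which the table does not state explicitly; the function name `decompose` is made up for illustration.

```python
def decompose(total_tmd, total_orig, cases_tmd, cases_orig):
    """Split the change in a weighted total into cases and per-case average.

    Returns growth ratios (total, cases, average), where
    total = cases * average.
    """
    total_growth = total_tmd / total_orig
    cases_growth = cases_tmd / cases_orig
    avg_growth = (total_tmd / cases_tmd) / (total_orig / cases_orig)
    return total_growth, cases_growth, avg_growth


# PUF UNITS rows: totals in $ billions, cases in millions.
gross = decompose(888.394, 774.580, 32.684, 27.982)
taxable = decompose(511.620, 439.512, 27.451, 22.754)

for label, (tot, cases, avg) in (("gross", gross), ("taxable", taxable)):
    print(f"{label:8s} total {100 * (tot - 1):+.1f}%  "
          f"cases {100 * (cases - 1):+.1f}%  average {100 * (avg - 1):+.1f}%")
```

Under that assumption, the table's numbers suggest that the increase comes from more weighted cases rather than from a higher per-case average, which falls slightly for both gross and taxable benefits.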

martinholmer commented 5 months ago

@donboyd5 said in the discussion of issue #83:

Impact of growfactors from 2015-2021. As I read Tax-Calculator (taxcalc/records.py lines 273-372), ASOCSEC is used to adjust e02400 (Total social security (OASDI) benefits) on line 314. I believe this is the per-record adjustment. I do not see an adjustment for population growth (a factor to be applied to the weights) but I presume there must be one. (If there is, would you mind pointing me to where it is applied?) If I read taxdata properly, lines 48-49 of factors_finalprep.py adjust ASOCSEC by something called elderly_pop, but I don't see a similar adjustment in Tax-Calculator - it's clear I don't understand what's going on. Anyway, ASOCSEC in both the tmd and taxdata repos declines by 0.5% between 2015 and 2021 (tmd shown); this, I believe is just the per-record amount, not the total amount:

Don, I don't think any of the Tax-Calculator growfactors are used in preparing the 2021 tmd.csv file. So, what you say above is not relevant to this issue.

martinholmer commented 5 months ago

Closing issue #83 wrt social security benefits because a soon-to-be-available version of the repository will produce different results, which will be included in a new issue.