Getting the QBI deduction right

donboyd5 commented 4 months ago

@nikhilwoodruff @martinholmer

We've been trying to make sure our tmd (tax-microdata-benchmarking) file has the best possible information for the qualified business income deduction. From the IRS, we know that the total QBI deduction (qbided) was $205.8 billion in 2021, and we've been targeting that. We also know the number of returns with qbided and the distribution of qbided amounts and return counts by AGI range, but we have not been targeting those. We've been targeting the qbided total by estimating pass-through W2 wages: the greater the wages, the greater the possible qbided, up to certain other limits. We've been assuming a simple proportionate relationship between QBI (i.e., the income) and pass-through wages. This does not consider possibly more complex QBI-W2 wage relationships, and it does not consider other factors (rules) that can limit qbided.

Using this approach, we have been able to solve for a simple proportionate wages-to-QBI relationship that hits the IRS qbided total of $205.8 billion, using weights that are simply grown from 2015 to 2021 to reflect growth in the number of returns (s006_original in the table below) but that are not reweighted to hit targets.

However, when we reweight (s006 weights in the table below), we have found that we cannot hit the $205.8 billion target, so in the near term we have used a simple 0.33 ratio of W2 wages to QBI (which does not hit the target).
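The ratio search described above can be sketched as a one-dimensional root-find. This is a minimal sketch, not the tmd code: it assumes a hypothetical callable `total_qbided(ratio)` that sets PT_binc_w2_wages to ratio × QBI for every record, runs the tax model, and returns weighted total qbided, and it assumes that total never falls as the ratio rises (more W2 wages can only loosen the wage limit). Under that assumption, bisection either finds the ratio that hits the target or reports that even the largest ratio tried falls short, which is the situation we face with the reweighted s006 weights.

```python
from typing import Callable, Optional


def solve_w2_ratio(total_qbided: Callable[[float], float],
                   target: float,
                   lo: float = 0.0,
                   hi: float = 10.0,
                   tol: float = 1e-6,
                   max_iter: int = 200) -> Optional[float]:
    """Find the W2-wage/QBI ratio whose weighted qbided total reaches target.

    Assumes total_qbided is non-decreasing in the ratio (hypothetical helper).
    Returns None when the target is out of reach even at the upper bound hi.
    """
    if total_qbided(hi) < target:
        return None  # other limits bind: no ratio in [lo, hi] hits the target
    if total_qbided(lo) >= target:
        return lo
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        # Keep the target bracketed between lo (too low) and hi (high enough)
        if total_qbided(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return hi
```

For example, with a toy total of `min(400 * ratio, 220)` ($ billions), the search returns a ratio of about 0.5145 for the $205.8 billion target; if the toy total instead caps out at 160, it returns `None`.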

I've looked into this by varying the qbided-related levers that, via Tax-Calculator, affect the qbided limits and therefore qbided. The table below summarizes the results. I am racing the clock this morning and can come back to answer questions and edit. The code and table in the next comment should make things clear for those who want to read that comment. Here's a quick summary:

The main levers are:

pass-through W2 wages (PT_binc_w2_wages), which affect a wage-related qbided limit and the wage-and-property-related qbided limit; the higher the value, the higher the possible qbided, unless other limits are hit; in our baseline tmd, we assume they are 33% of QBI
pass-through property (PT_ubia_property), which affects the wage-and-property-related qbided limit; the higher the value, the higher the possible qbided unless other limits are hit; we assume this is 0 for all filers
whether the QBI is from a Specified Service Trade or Business (SSTB), in which case qbided is phased out above certain thresholds; we assume this is false for all filers, so qbided is not phased out
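To make the levers concrete, here is a simplified, single-business sketch of how they interact under the section 199A rules as I understand them. It is not Tax-Calculator's qbided code: it ignores REIT/PTP income, losses and carryovers, multiple businesses, and filing-status details, and the threshold and phase-in range are parameters the caller must supply for the year and filing status in question.

```python
def simplified_qbid(qbi: float, w2_wages: float, ubia: float, is_sstb: bool,
                    taxable_income: float, net_capital_gain: float,
                    threshold: float, phase_range: float) -> float:
    """Hypothetical single-business QBI deduction; not Tax-Calculator's logic.

    taxable_income is taxable income before the QBI deduction.
    """
    qbi = max(qbi, 0.0)
    # Share of the phase-in range the filer has moved through (0 to 1)
    phase = min(max(taxable_income - threshold, 0.0) / phase_range, 1.0)
    if is_sstb:
        # SSTB: QBI, wages and property shrink with the phase fraction,
        # reaching zero at the top of the range
        keep = 1.0 - phase
        qbi, w2_wages, ubia = qbi * keep, w2_wages * keep, ubia * keep
    tentative = 0.20 * qbi
    # Wage limit vs. wage-and-property limit: the larger of the two applies
    wage_limit = max(0.50 * w2_wages, 0.25 * w2_wages + 0.025 * ubia)
    if tentative > wage_limit:
        # The wage/property limit phases in across the same range
        tentative -= phase * (tentative - wage_limit)
    # Overall cap: 20% of taxable income less net capital gain
    income_cap = 0.20 * max(taxable_income - net_capital_gain, 0.0)
    return min(tentative, income_cap)
```

Raising w2_wages or ubia can only relax the wage limit, and setting is_sstb to false removes the phase-out, which is why very large wage or property values give an upper bound on qbided under the remaining rules.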

The table below shows the results of varying these levers and the weights used.

Here are my key takeaways:

With the baseline assumptions above and original weights, we can almost hit the IRS target (we get $202 billion), and we know from prior analyses that we had been able to hit it with a proportion other than 33% (and with earlier iterations of the data).
We are far from it with the "optimized" weights ($154.9 billion)
We can come close with "optimized" weights if we set w2 wages or ubia property to large values but still can't hit it
We can go above it with original weights
This suggests to me that (1) our reweighting is a problem; we've discussed how to address that, namely by penalizing large deviations from the original weights, and we need to do that soon; and (2) most taxpayers (I think) are probably not facing significant limits, so we probably should raise the assumed w2 wages and ubia property.
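The penalty idea in point (1) can be sketched as a regularized least-squares problem. This is a hypothetical illustration, not tmd's reweight.py: it chooses ratios r = new weight / original weight to minimize the squared relative target misses plus lam times the squared deviation of r from 1, which has a closed-form solution. The inputs are made up, and a real version would also need to keep weights non-negative, which this closed form does not enforce.

```python
import numpy as np


def penalized_reweight(X: np.ndarray, w0: np.ndarray, targets: np.ndarray,
                       lam: float) -> np.ndarray:
    """Return new weights w = w0 * r trading off target fit against drift.

    X is (n_records, n_targets): each column holds the per-record value whose
    weighted sum should equal the matching entry of targets. Minimizes
    sum(((w @ X - targets) / targets) ** 2) + lam * sum((r - 1) ** 2).
    """
    # M[j, i] = contribution of record i to target j, scaled by 1 / target j
    M = (X * w0[:, None]).T / targets[:, None]
    n = len(w0)
    # Setting the gradient to zero gives (M'M + lam I) r = M'1 + lam 1
    lhs = M.T @ M + lam * np.eye(n)
    rhs = M.T @ np.ones(len(targets)) + lam * np.ones(n)
    r = np.linalg.solve(lhs, rhs)
    return w0 * r
```

A larger lam keeps the weights closer to the original (s006_original-style) weights at the cost of missing targets; a smaller lam hits the targets more tightly but lets individual weights move further.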

More later but must run.

Code and table of results are in the next section.

donboyd5 commented 4 months ago

Code and results related to discussion above:

import os
import copy
import numpy as np
import pandas as pd
from taxcalc import GrowFactors, Policy, Records, Calculator

TMDDIR = '~/Documents/python_projects/tax-microdata-benchmarking/tax_microdata_benchmarking/storage/output/'
TMDDIR = os.path.expanduser(TMDDIR)

tmd_fname = TMDDIR + 'tmd.csv.gz' 
weights_fname = TMDDIR + 'tmd_weights.csv.gz'
gfactors_fname = TMDDIR + 'tmd_growfactors.csv'

# get the tmd data and create a records object set to 2021 tax year
gfactors = GrowFactors(growfactors_filename=gfactors_fname)

tmd = pd.read_csv(tmd_fname)
recs1 = Records(data=tmd, start_year=2021, gfactors=gfactors, weights=weights_fname)
pol = Policy()

def getcalc(recs, label, w2=None, ubia=None, sstb=None):
    """Run Tax-Calculator on a copy of recs, optionally overriding the QBI levers."""
    recs2 = copy.deepcopy(recs)
    # W2 wages and UBIA are dollar amounts; use float64 so large values such as
    # bigint (10 billion) do not overflow a 32-bit integer
    if w2 is not None:
        recs2.PT_binc_w2_wages = np.full_like(recs2.PT_binc_w2_wages, w2, dtype=np.float64)
    if ubia is not None:
        recs2.PT_ubia_property = np.full_like(recs2.PT_ubia_property, ubia, dtype=np.float64)
    # PT_SSTB_income is a 0/1 flag; keep the array's existing dtype
    if sstb is not None:
        recs2.PT_SSTB_income = np.full_like(recs2.PT_SSTB_income, sstb)
    calc = Calculator(policy=pol, records=recs2)
    calc.calc_all()
    varlist = ['RECID', 'data_source', 's006', 's006_original', 'c00100', 'e00900', 'e26270', 'e02100', 'e27200', 'PT_SSTB_income', 'PT_binc_w2_wages', 'PT_ubia_property', 'qbided']
    calcdf = calc.dataframe(variable_list=varlist)
    calcdf['label'] = label
    return calcdf

# $10 billion per record: effectively removes the wage and property limits
bigint = 10_000_000_000

stack = pd.concat([getcalc(recs=recs1, label="_tmdbase"),
                  getcalc(recs=recs1, label="allmax", w2=bigint, ubia=bigint, sstb=0),
                  getcalc(recs=recs1, label="w2max", w2=bigint),
                  getcalc(recs=recs1, label="ubiamax", ubia=bigint),
                  getcalc(recs=recs1, label="sstbmin", sstb=1)],
                  axis=0, ignore_index=True)
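# QBI proxy: Schedule C (e00900) + partnership/S corp (e26270) + farm (e02100) + farm rent (e27200), floored at 0
stack['qbi'] = np.maximum(0, stack.e00900 + stack.e26270 + stack.e02100 + stack.e27200)
# 1.0 for returns with a positive QBI deduction, so weighted sums count returns
stack['nqreturns'] = (stack.qbided > 0.0) * 1.0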

idvars = ['label', 'data_source']
wtvars = ['s006', 's006_original']
datavars = ['qbi', 'PT_binc_w2_wages', 'PT_ubia_property', 'qbided', 'nqreturns']
keepvars = idvars + wtvars + datavars
# one row per record per weight variable; then multiply the data columns by that weight
long = pd.melt(stack[keepvars], id_vars=idvars + datavars, value_vars=wtvars, var_name='wtname', value_name='weight')
long[datavars] = long[datavars].multiply(long['weight'], axis=0)
result = long.groupby(['data_source', 'label', 'wtname'])[datavars].sum()
result = result.reset_index()
div1e9 = [item for item in datavars if item != 'nqreturns']
result[div1e9] = result[div1e9] / 1e9
result['nqreturns'] = result['nqreturns'] / 1e6
result = result.sort_values(by=['data_source', 'wtname', 'label'], ascending=[False, True, True])

format_dict = {col: '{:,.2f}' for col in div1e9}
format_dict['nqreturns'] = '{:.3f}'

# amounts in $ billions, nqreturns in millions; renders as a formatted table in Jupyter
result.style.format(format_dict)

martinholmer commented 4 months ago

@donboyd5, thanks for the observations at the bottom of your comment on this issue (#125).

I don't disagree with your observations on QBID, but I do wonder how much more effort we should put into this issue in Phase 3, which is almost over. As I pointed out in PR #124, the 2023 tax-expenditure estimate for QBID is reasonably close to the JCT/CBO estimates when we use the current tmd.csv input file with Tax-Calculator (even though we are below the $205.8 billion target for 2021). And as you can see in the recently updated examination results, our crude imputation of PT_binc_w2_wages produces a big improvement over the old taxdata puf.csv file:

It seems to me the more important Phase 3 issues highlighted by the examination results are:

Why has the CTC tax expenditure estimate from tmd.csv dropped to such a low amount?
Why is the SALT tax expenditure estimate so low?

cc @nikhilwoodruff

donboyd5 commented 4 months ago

@martinholmer @nikhilwoodruff

@martinholmer, I agree with your conclusion that the CTC and SALT tax expenditures are more important to worry about. However, the problems come, broadly speaking, from similar issues, namely (1) incorrect targeting, and (2) inappropriate weighting. Since the fix for qbided appears to be easy, let me elaborate on that here, quickly comment on CTC and SALT, and open separate issues for them.

Regarding qbided, we can see from the above that qbi drops from $1.311 trillion before reweighting (with s006_original) to $1.037 trillion after reweighting (with s006), a drop of about $273 billion, or about 21%. One of the many limits in the qbided calculation is that it cannot exceed 20% of qbi. With qbi of about $1 trillion (with s006), that limit alone caps qbided at roughly $207 billion, and the IRS is reporting $205.8 billion, essentially the whole cap. Since other limits (taxable income, W2 wages, SSTB phase-outs) also bind for some filers, the target ought to be very hard for us to hit. This is a pretty good indication that our qbi is too low.
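A back-of-the-envelope check of that argument, using the rounded figures quoted above (so the drop comes out at $274 billion rather than $273 billion):

```python
# Figures from the discussion above, in $ billions (rounded)
qbi_original = 1311.0    # qbi with s006_original weights
qbi_reweighted = 1037.0  # qbi with s006 weights
irs_qbided = 205.8       # IRS total QBI deduction, 2021

drop = qbi_original - qbi_reweighted
pct_drop = 100 * drop / qbi_original
# Upper bound on qbided from the 20%-of-qbi limit alone
cap_original = 0.20 * qbi_original
cap_reweighted = 0.20 * qbi_reweighted
# Share of the reweighted 20% cap that the IRS total would use up
share_of_cap = irs_qbided / cap_reweighted

print(f"drop: ${drop:.0f}B ({pct_drop:.1f}%)")
print(f"20% cap: ${cap_original:.1f}B original, ${cap_reweighted:.1f}B reweighted")
print(f"target uses {100 * share_of_cap:.1f}% of the reweighted cap")
```

The target uses about 99% of the reweighted cap, versus under 80% of the cap with the original weights, so almost no filer could be held back by any other limit if we were to hit it.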

In our last call @nikhilwoodruff said that we were targeting the main input variables that go into qbi (e00900 and e26270), with positive and negative totals targeted separately. A review of tmd's reweight.py appears to confirm this. So this suggests we are hitting the targets with the "optimized" weights. Here is a table showing the s006 (optimized) and s006_original (pre-optimization) weighted values for qbi components and related variables, for data_source==1 records, amounts in $ billions. Interpretation after the table:

Let's focus on e00900 and e26270. As discussed here, for e00900 we need to either separate positive and negative tmd values and target them separately, or combine the positive and negative IRS targets and target the net. I believe @nikhilwoodruff did the former, which is preferable. For our purposes here, we look at the net sum. Here's what the relevant IRS table (21in14ar.xls) shows:

The desired combined value is $517.1 billion - $105.6 billion, or about $411.5 billion. The reweighted value of $422.1 billion is quite consistent with this, and much better than the original-weighted value of $558.8 billion. That's comforting.
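The two targeting options can be sketched as follows. `split_pos_neg` is a hypothetical helper, not a tmd function: it computes the weighted total of profits and the weighted magnitude of losses, which would be compared with the IRS profit and loss targets separately, so large profits and large losses cannot offset each other. The net comparison uses the figures quoted above.

```python
import numpy as np

# IRS 2021 e00900 (business or profession net income) figures, $ billions
irs_profit = 517.1
irs_loss = 105.6
irs_net = irs_profit - irs_loss  # combined target, about 411.5

# Weighted tmd values quoted above, $ billions
tmd_net_reweighted = 422.1
tmd_net_original = 558.8

for name, value in [("s006", tmd_net_reweighted), ("s006_original", tmd_net_original)]:
    print(f"{name}: {100 * (value / irs_net - 1):+.1f}% vs IRS net")


def split_pos_neg(values: np.ndarray, weights: np.ndarray) -> tuple[float, float]:
    """Weighted total of positive values and weighted magnitude of negative values."""
    pos = float(np.sum(weights * np.maximum(values, 0.0)))
    neg = float(np.sum(weights * np.maximum(-values, 0.0)))
    return pos, neg
```

The reweighted net is about 2.6% above the IRS net, versus about 35.8% above with the original weights. For example, `split_pos_neg(np.array([100.0, -40.0, 25.0]), np.array([2.0, 1.0, 4.0]))` returns `(300.0, 40.0)`.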

Moving on to e26270, the same doc noted that we need to combine positive and negative values of partnership income and ALSO of S corporation income. The screenshot below shows the calculation:

The total e26270 amount should be $975.7 billion. However, the amount we have in the reweighted tmd file is only $302 billion. Eyeballing the screenshot above, you can see that the partnership net is about $301.6 billion and the S corp net is about $674 billion. Since we're only hitting $302 billion, this suggests we have the wrong target and likely we only included the partnership net in the target and not the S corporation net -- @nikhilwoodruff, can you check?
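Which component the reweighting actually hit can be checked with the rounded figures above. This is a back-of-the-envelope sketch: the last line simply adds the missing amount to reweighted qbi, which is only a rough indication because tmd's qbi is computed record by record and floored at zero, and the other qbided limits still apply.

```python
# IRS 2021 components of e26270 (partnership + S corporation net income), $ billions
partnership_net = 301.6
scorp_net = 674.0        # "about $674 billion"
irs_e26270 = 975.7       # partnership + S corporation
tmd_e26270 = 302.0       # reweighted tmd total

candidates = {
    "partnership only": partnership_net,
    "S corporation only": scorp_net,
    "partnership + S corporation": partnership_net + scorp_net,
}
# Which candidate target is closest to what the reweighting actually hit?
closest = min(candidates, key=lambda k: abs(candidates[k] - tmd_e26270))
print("tmd total matches:", closest)

# Rough effect on the 20%-of-qbi cap if e26270 rose by the missing amount
shortfall = irs_e26270 - tmd_e26270
qbi_reweighted = 1037.0
print(f"missing e26270: ${shortfall:.1f}B")
print(f"rough 20% cap: ${0.20 * (qbi_reweighted + shortfall):.1f}B")
```

The tmd total lines up with the partnership-only figure, and the missing amount would lift the rough 20% cap to about $342 billion, well above the $205.8 billion target.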

I suspect that's the problem, and when we fix it we should have no trouble hitting the qbided amount with our W2-based method, although I think we've learned that this method may be less than ideal in the long run, and we should revisit it if there is work beyond Phase 3.

donboyd5 commented 4 months ago

Recapping to this point:

Our QBI is too low
Almost certainly because our e26270 amount is too low
Almost certainly because the target for e26270 only includes the partnership component and not the S corporation component
We should be able to fix this easily by correcting the target
Allowing us to move on to CTC and SALT
(But possibly returning to our W2 wages rule of thumb if there is work beyond Phase 3)

donboyd5 commented 4 months ago

Regarding SALT and CTC, I'll open separate issues, but I think we'll find that:

SALT has a similar targeting problem
CTC has a reweighting problem

PSLmodels / tax-microdata-benchmarking

Getting the QBI deduction right #125