Closed donboyd5 closed 4 months ago
Code and results related to discussion above:
import os
import pathlib
import pandas as pd
import numpy as np
import copy
from taxcalc import GrowFactors, Policy, Records, Calculator
TMDDIR = '~/Documents/python_projects/tax-microdata-benchmarking/tax_microdata_benchmarking/storage/output/'
TMDDIR = os.path.expanduser(TMDDIR)
TMDDIR
tmd_fname = TMDDIR + 'tmd.csv.gz'
weights_fname = TMDDIR + 'tmd_weights.csv.gz'
gfactors_fname = TMDDIR + 'tmd_growfactors.csv'
# get the tmd data and create a records object set to 2021 tax year
gfactors = GrowFactors(growfactors_filename=gfactors_fname)
tmd = pd.read_csv(tmd_fname)
recs1 = Records(data=tmd, start_year=2021, gfactors=gfactors, weights=weights_fname)
pol = Policy()
def getcalc(recs, label, w2=None, ubia=None, sstb=None):
    # work on a copy so the base Records object is never modified
    recs2 = copy.deepcopy(recs)
    # int64 rather than int32: bigint (10 billion) exceeds the int32 maximum
    if w2 is not None:
        recs2.PT_binc_w2_wages = np.full_like(recs2.PT_binc_w2_wages, w2, dtype=np.int64)
    if ubia is not None:
        recs2.PT_ubia_property = np.full_like(recs2.PT_ubia_property, ubia, dtype=np.int64)
    if sstb is not None:
        recs2.PT_SSTB_income = np.full_like(recs2.PT_SSTB_income, sstb, dtype=np.int64)
    calc = Calculator(policy=pol, records=recs2)
    calc.calc_all()
    # named varlist to avoid shadowing the built-in vars()
    varlist = ['RECID', 'data_source', 's006', 's006_original', 'c00100', 'e00900', 'e26270', 'e02100', 'e27200', 'PT_SSTB_income', 'PT_binc_w2_wages', 'PT_ubia_property', 'qbided']
    calcdf = calc.dataframe(variable_list=varlist)
    calcdf['label'] = label
    return calcdf
bigint = 10_000_000_000
stack = pd.concat([getcalc(recs=recs1, label="_tmdbase"),
                   getcalc(recs=recs1, label="allmax", w2=bigint, ubia=bigint, sstb=0),
                   getcalc(recs=recs1, label="w2max", w2=bigint),
                   getcalc(recs=recs1, label="ubiamax", ubia=bigint),
                   getcalc(recs=recs1, label="sstbmin", sstb=1)],
                  axis=0, ignore_index=True)
# stack.columns
stack['qbi'] = np.maximum(0, stack.e00900 + stack.e26270 + stack.e02100 + stack.e27200)
stack['nqreturns'] = (stack.qbided > 0.0) * 1.0
# stack.head()
idvars = ['label', 'data_source']
wtvars = ['s006', 's006_original']
datavars = ['qbi', 'PT_binc_w2_wages', 'PT_ubia_property', 'qbided', 'nqreturns']
keepvars = idvars + wtvars + datavars
# stack[keepvars]
long = pd.melt(stack[keepvars], id_vars=idvars + datavars, value_vars=wtvars, var_name='wtname', value_name='weight')
long[datavars] = long[datavars].multiply(long['weight'], axis=0)
# long.head()
result = long.groupby(['data_source', 'label', 'wtname'])[datavars].sum()
result = result.reset_index()
div1e9 = [item for item in datavars if item != 'nqreturns']
result[div1e9] = result[div1e9] / 1e9
result['nqreturns'] = result['nqreturns'] / 1e6
result = result.sort_values(by=['data_source', 'wtname', 'label'], ascending=[False, True, True])
format_dict = {col: '{:,.2f}' for col in div1e9}
format_dict['nqreturns'] = '{:.3f}'
result.style.format(format_dict)
@donboyd5, Thanks for your observations at the bottom of this issue 125 comment.
I don't disagree with your observations on QBID, but I do wonder how much more effort we should put into this issue in Phase 3, which is almost over. As I pointed out in PR #124, the 2023 tax-expenditure estimate for QBID is reasonably close to the JCT/CBO estimates when we use the current tmd.csv input file with Tax-Calculator (even though we are below the $205.8 billion target for 2021). And as you can see in the recently updated examination results, our crude imputation of PT_binc_w2_wages produces a big improvement over the old taxdata puf.csv file:
It seems to me the more important Phase 3 issues highlighted by the examination results are:
tmd.csv dropped to such a low amount?
cc @nikhilwoodruff
@martinholmer @nikhilwoodruff
@martinholmer, I agree with your conclusion that the CTC and SALT tax expenditures are more important to worry about. However, the problems come, broadly speaking, from similar issues, namely (1) incorrect targeting, and (2) inappropriate weighting. Since the fix for qbided appears to be easy, let me elaborate on that here, quickly comment on CTC and SALT, and open separate issues for them.
Regarding qbided, we can see from the above that qbi drops from $1.311 trillion before reweighting (with s006_original) to $1.037 trillion after reweighting (with s006), a drop of $273 billion, or about 21%. One of the many limits in the qbided calculation is that it cannot exceed 20% of qbi. With qbi at $1.037 trillion (with s006), that ceiling is about $207 billion, so the $205.8 billion the IRS reports for the deduction could be reached only if nearly every dollar of qbi earned the full 20%. It should be no surprise that the target is very hard for us to hit, and this is a pretty good indication that our qbi is too low.
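For anyone who wants to check the arithmetic, here is a minimal sketch using only the rounded totals quoted above (in $ billions). The variable names are mine, and the 20% figure is treated as a simple aggregate ceiling, which ignores every other limit in the actual qbided calculation.

```python
# Rounded totals quoted in this comment, in $ billions.
qbi_original = 1311.0    # weighted with s006_original
qbi_reweighted = 1037.0  # weighted with s006
irs_qbided = 205.8       # IRS-reported 2021 deduction total

# Share of qbi lost in reweighting.
pct_drop = (qbi_original - qbi_reweighted) / qbi_original

# Aggregate ceiling: the deduction cannot exceed 20% of qbi.
ceiling = 0.20 * qbi_reweighted

# Average deduction rate on reweighted qbi needed to hit the IRS total.
share_needed = irs_qbided / qbi_reweighted

print(f"drop in qbi: {pct_drop:.1%}")
print(f"20% ceiling on reweighted qbi: {ceiling:,.1f}")
print(f"average rate needed to hit target: {share_needed:.1%}")
```

The point of the sketch is only that the ceiling sits barely above the target, so any record that gets less than the full 20% pushes the total below it.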
In our last call @nikhilwoodruff said that we were targeting the main input variables that go into qbi (e00900 and e26270), with positive and negative totals targeted separately. A review of tmd's reweight.py appears to confirm this. So this suggests we are hitting the targets with the "optimized" weights. Here is a table showing the s006 (optimized) and s006_original (pre-optimization) weighted values for qbi components and related variables, for data_source==1 records, amounts in $ billions. Interpretation after the table:
Let's focus on e00900 and e26270. As discussed here, for e00900 we need to either separate positive and negative tmd values and target separately, or combine the positive and negative IRS targets and target that. I believe @nikhilwoodruff did the former, which is preferable. For our purpose now, we look at the sum. Here's what the relevant IRS table (21in14ar.xls) shows:
The desired combined value is $517.1 billion - $105.6 billion, or about $411.5 billion. The reweighted value of $422.1 billion is quite consistent with this, and much better than the original-weighted value of $558.8 billion. That's comforting.
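To make the positive/negative split concrete, here is a rough sketch of how weighted positive and negative totals could be computed separately and then netted. The helper `pos_neg_totals` and the toy records are hypothetical, not taken from reweight.py; the dollar figures in the second half are the ones quoted above.

```python
import numpy as np
import pandas as pd

def pos_neg_totals(values, weights):
    """Return weighted (positive total, negative total, net) for one variable."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    pos = float((np.where(values > 0, values, 0.0) * weights).sum())
    neg = float((np.where(values < 0, values, 0.0) * weights).sum())
    return pos, neg, pos + neg

# Toy records (hypothetical values, for illustration only).
toy = pd.DataFrame({
    "e00900": [50_000.0, -20_000.0, 0.0, 10_000.0],
    "s006": [100.0, 50.0, 200.0, 10.0],
})
pos, neg, net = pos_neg_totals(toy["e00900"], toy["s006"])

# Figures quoted in this comment, in $ billions.
irs_net = 517.1 - 105.6    # IRS positive amount less IRS loss amount
reweighted = 422.1         # e00900 total with s006
original = 558.8           # e00900 total with s006_original
rel_err_reweighted = (reweighted - irs_net) / irs_net
rel_err_original = (original - irs_net) / irs_net

print(f"IRS net target: {irs_net:.1f}")
print(f"reweighted miss: {rel_err_reweighted:.1%}, original miss: {rel_err_original:.1%}")
```

Targeting the two sides separately constrains both totals, whereas targeting only the net would let offsetting errors in gains and losses go unnoticed.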
Moving on to e26270, the same doc noted that we need to combine positive and negative values of partnership income and ALSO of S corporation income. The screenshot below shows the calculation:
The total e26270 amount should be $975.7 billion. However, the amount we have in the reweighted tmd file is only $302 billion. Eyeballing the screenshot above, you can see that the partnership net is about $301.6 billion and the S corp net is about $674 billion. Since we're only hitting $302 billion, this suggests we have the wrong target and likely we only included the partnership net in the target and not the S corporation net -- @nikhilwoodruff, can you check?
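A quick way to see why I suspect the target, sketched with the approximate figures above (in $ billions). The dictionary keys are labels I made up for this check, and the component values are eyeballed from the screenshot, so treat this as a plausibility test rather than proof.

```python
# Approximate figures from this comment, in $ billions.
components = {"partnership_net": 301.6, "s_corp_net": 674.0}
full_target = 975.7        # total e26270 amount we should be hitting
reweighted_e26270 = 302.0  # what the reweighted tmd file shows

# Candidate targets the optimizer might actually have been given.
candidates = dict(components)
candidates["partnership_plus_s_corp"] = sum(components.values())

# Which candidate is the reweighted total closest to?
closest = min(candidates, key=lambda k: abs(candidates[k] - reweighted_e26270))
shortfall = full_target - reweighted_e26270

print(f"closest candidate target: {closest}")
print(f"shortfall against full target: {shortfall:.1f}")
```

If the optimizer is hitting whatever target it was given, the close match to the partnership-only figure is what points to the S corporation net being left out.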
I suspect that's the problem, and when we fix it we should not have any problem hitting the qbided amount with our W2-based method, although I think we've learned that may be less than ideal in the long run and we should revisit this if there is work beyond Phase 3.
Recapping to this point:
Regarding SALT and CTC, I'll open separate issues, but I think we'll find that:
@nikhilwoodruff @martinholmer
We've been trying to make sure our tmd (tax-microdata-benchmarking) file has the best possible information for the qualified business income deduction. From the IRS, we know that the total QBI deduction (qbided) was $205.8 billion in 2021, and we've been targeting that. We also know the number of returns with qbided, and the distribution of both qbided and the number of returns by AGI range, but we have not been targeting them. We've been targeting the qbided total by estimating pass-through W2 wages: the greater the wages, the greater the possible qbided, up to certain other limits. We've been assuming a simple proportionate relationship between QBI (i.e., the income) and pass-through wages. This does not consider possibly more complex relationships between QBI and W2 wages, and it does not consider other factors (rules) that can limit qbided.
Using this approach we have been able to solve for a simple pass-through wages proportionate relationship that hits the IRS qbided $205.8 billion with weights that are simply grown from 2015 to 2021 to take into account growth in the number of returns (s006_original in the table below), but that do not reflect reweighting to hit targets.
However, when we reweight (s006 weights in the table below), we have found we cannot hit the $205.8 billion target and so, in the near term, used a simple .33 relationship between QBI and W2 wages (which does not hit the target).
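To illustrate why a ratio that can be solved for under one set of weights may have no solution under another, here is a deliberately stripped-down sketch. It is not the Tax-Calculator logic and not our actual imputation code: it assumes every record's deduction is the lesser of 20% of QBI and 50% of imputed W2 wages, ignores income thresholds, UBIA, SSTB rules, and the taxable-income cap, and uses made-up records and weights. The names `toy_qbided` and `solve_wage_ratio` are hypothetical.

```python
import numpy as np

def toy_qbided(qbi, weights, wage_ratio):
    """Weighted total of a simplified deduction: min(20% of QBI, 50% of wages),
    where imputed W2 wages = wage_ratio * QBI (an assumption of this sketch)."""
    qbi = np.maximum(0.0, np.asarray(qbi, dtype=float))
    w2_wages = wage_ratio * qbi
    deduction = np.minimum(0.20 * qbi, 0.50 * w2_wages)
    return float((deduction * np.asarray(weights, dtype=float)).sum())

def solve_wage_ratio(qbi, weights, target, lo=0.0, hi=1.0, tol=1e-10):
    """Bisect for the smallest wage ratio whose weighted deduction reaches target.
    Returns None when the target exceeds what any ratio up to hi can deliver."""
    if toy_qbided(qbi, weights, hi) < target:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if toy_qbided(qbi, weights, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

# Made-up records: same QBI values, two weight vectors.
qbi = np.array([100.0, 200.0, 0.0, 300.0])
weights_original = np.array([2.0, 2.0, 2.0, 2.0])    # weighted QBI = 1200
weights_reweighted = np.array([1.0, 1.0, 1.0, 1.0])  # weighted QBI = 600
target = 150.0

ratio_original = solve_wage_ratio(qbi, weights_original, target)
ratio_reweighted = solve_wage_ratio(qbi, weights_reweighted, target)
print(ratio_original, ratio_reweighted)
```

In this toy setup the reweighted QBI caps the deduction at 120, below the target of 150, so no wage ratio can reach it. That is the same mechanism I think we are running into, though the real calculation has more moving parts.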
I've looked into this, varying the qbid-related levers that, via Tax-Calculator, affect qbided limits and therefore qbided. The table below summarizes the results. I am racing the clock this morning and can come back to answer questions and edit. The code and table in the next comment should make it clear, for those who want to read that comment. Here's a quick summary:
The main levers are:
The table below shows the results of varying these levers and the weights used.
Here are my key takeaways:
More later but must run.
Code and table of results are in the next section.