PSLmodels / taxdata

The TaxData project prepares microdata for use with the Tax-Calculator microsimulation project.
http://pslmodels.github.io/taxdata/

Itemizers Taking the Medical Expense Deduction-- Discrepancy with IRS SOI Table #147

Open: derrickchoe opened this issue 6 years ago

derrickchoe commented 6 years ago

Hello all,

I'm currently working on a project involving the medical expense deduction, and I'm having trouble matching the number of itemizers taking this deduction as published by the IRS.

As an example, the attached IRS SOI table (13in21id.xlsx) estimates that ~9 million tax filers took this deduction in 2013.

When I run the following code, I get an estimate of ~7.6 million tax filers taking this deduction (please let me know if the issue stems from the way I'm counting these filers).

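```python
import taxcalc as tc
import numpy as np

# Records() with no arguments loads the default puf.csv input file
calc = tc.Calculator(records=tc.Records(), policy=tc.Policy())
calc.calc_all()
# Weighted (s006) count of units that itemize (c04470 > 0)
# and have a positive medical expense deduction (c17000 > 0)
np.where((calc.array('c04470') > 0) & (calc.array('c17000') > 0), calc.array('s006'), 0).sum()
```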

When advancing to subsequent years, I continue getting numbers in the range of 7.1-7.2 million, whereas the IRS estimates stay in the 8.6-9 million range.

I am wondering if this issue arises because of the relatively small number of people taking this deduction. Is it possible that the survey weights used in tax-calc, while effective at matching broader US population characteristics, are not entirely appropriate for estimating the size of this small population of deduction-takers?
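One way to check how much sample information sits behind a small weighted subgroup is to look at the unweighted record count alongside Kish's effective sample size, (Σw)² / Σw². The sketch below assumes you already have the weights (for example, `calc.array('s006')` restricted to the itemizing medical-deduction records); the function name `subgroup_precision` and the toy weights are hypothetical. This diagnostic only measures how thinly the estimate is supported; it does not by itself show whether the weights are biased for this subgroup.

```python
def subgroup_precision(weights):
    """Summarize how many sample records support a weighted subgroup total.

    weights: sampling weights (e.g. s006) of the records in the subgroup.
    Returns (record count, weighted total, Kish effective sample size).
    """
    n = len(weights)
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    # Kish effective sample size: equals n when all weights are equal
    # and shrinks as the weights become more unequal.
    n_eff = total * total / sum_sq if sum_sq > 0 else 0.0
    return n, total, n_eff


# Hypothetical weights for four records in the subgroup.
toy_weights = [100.0, 300.0, 50.0, 50.0]
n, total, n_eff = subgroup_precision(toy_weights)
print(n, total, round(n_eff, 2))
```

Here four records with unequal weights carry roughly the information of 2.4 equally weighted records, so the weighted total rests on less data than the raw record count suggests.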

I'm eager to hear your thoughts; any possible solutions or comments are greatly appreciated. Thanks for your time.

andersonfrailey commented 6 years ago

Are you using a version of Tax-Calculator that uses TCJA policy or 2017 policy as its baseline? The difference of ~2 million could be due to the new standard deduction levels, which make it better for those tax units to take the standard deduction rather than itemize. Their medical deductions are still calculated when we determine how much they would receive from itemizing, so while c04470 might ultimately be zero, c17000 would be above zero.
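The mechanism described above can be sketched as a two-step choice: the medical deduction is computed for every unit, but the claimed itemized total is zero whenever the standard deduction is larger. This is a simplified illustration, not Tax-Calculator's code; the function name `claimed_itemized` and all dollar amounts are hypothetical, and the real calculation also involves other deduction components and phase-outs.

```python
def claimed_itemized(itemized_total, standard_deduction):
    """Itemized deductions actually claimed: zero when the standard deduction is larger."""
    return itemized_total if itemized_total > standard_deduction else 0.0


# Hypothetical unit with a positive medical deduction (analogous to c17000 > 0).
medical = 4000.0
other_itemized = 6000.0  # hypothetical taxes, interest, gifts, etc.
itemized_total = medical + other_itemized

# Hypothetical smaller vs. larger standard deduction.
for std_ded in (6000.0, 24000.0):
    claimed = claimed_itemized(itemized_total, std_ded)  # analogous to c04470
    print(std_ded, medical, claimed)
```

Under the larger standard deduction the medical amount is still positive, but the claimed itemized total drops to zero, which is exactly the case a `c04470 > 0` filter removes.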

When I drop the calc.array('c04470') > 0 condition from your np.where statement, I get ~9.7 million filers with medical expenses deducted (c17000 > 0).
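The two counts being compared differ only in whether the itemizing condition `c04470 > 0` is applied. The sketch below reproduces both `np.where` filters on a handful of made-up records; the array values are hypothetical stand-ins for `calc.array('c04470')`, `calc.array('c17000')`, and `calc.array('s006')`, and `weighted_count` is a helper introduced only for illustration.

```python
import numpy as np

# Hypothetical stand-ins for calc.array('c04470'), calc.array('c17000'), calc.array('s006')
c04470 = np.array([12000.0, 0.0, 0.0, 15000.0, 0.0])  # itemized deductions claimed (0 = standard deduction)
c17000 = np.array([3000.0, 2500.0, 0.0, 0.0, 1800.0])  # computed medical expense deduction
s006 = np.array([100.0, 200.0, 300.0, 150.0, 250.0])   # sampling weights


def weighted_count(mask, weights):
    """Sum of weights over records where mask is True."""
    return np.where(mask, weights, 0.0).sum()


# Units that itemize AND have a positive medical deduction
itemizing_medical = weighted_count((c04470 > 0) & (c17000 > 0), s006)
# Units with a positive computed medical deduction, itemizing or not
any_medical = weighted_count(c17000 > 0, s006)
print(itemizing_medical, any_medical)
```

Records 1 and 4 have a positive computed medical deduction but take the standard deduction, so they appear only in the second count.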

derrickchoe commented 6 years ago

Hi @andersonfrailey,

I'm using current law policy, but I assumed that extrapolating the data to year 2013 would avoid any issues with differences between 2017 law and the TCJA. I know that the IRS data only includes people who end up itemizing in year 2013, which is why I included the calc.array('c04470') > 0 condition.

@MattHJensen also made a good point: he wanted to confirm that the IRS data include only amounts from line 4 of Schedule A (so, I think, people who enter medical expenses but aren't eligible for the deduction are excluded). After poking around the IRS documentation, it seems to me that the only filers reported are those who actually take the deduction and itemize.
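For reference, Schedule A line 4 is the medical and dental expense amount left after the AGI floor. The sketch below follows the 2013 form's structure as I understand it (line 3 is 10% of AGI, or 7.5% if the filer or spouse was 65 or older); the function name `schedule_a_line4` and its arguments are hypothetical, and the form's birth-date test is simplified here to an age flag.

```python
def schedule_a_line4(medical_expenses, agi, age_65_or_older):
    """Simplified 2013 Schedule A, lines 1-4 (medical and dental expenses).

    Line 1: medical and dental expenses
    Line 2: AGI
    Line 3: line 2 times the floor rate
    Line 4: line 1 minus line 3, or zero if line 3 exceeds line 1
    """
    floor_rate = 0.075 if age_65_or_older else 0.10
    line3 = agi * floor_rate
    return max(0.0, medical_expenses - line3)


# Hypothetical filers: expenses below the floor give zero on line 4,
# so a count restricted to line 4 > 0 would exclude them.
print(schedule_a_line4(4000.0, 60000.0, age_65_or_older=False))
print(schedule_a_line4(9000.0, 60000.0, age_65_or_older=False))
```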

Related link: https://www.irs.gov/pub/irs-soi/13insec4.pdf

Thanks for your help looking into this; let me know if I missed something in particular!

andersonfrailey commented 6 years ago

I know that the IRS data only includes people who end up itemizing in year 2013

This isn't necessarily the case. The PUF we're using is from 2009, so reported itemized deductions are included only for those who itemized in 2009. It's possible that when Tax-Calculator runs the numbers for 2013, some of them no longer itemize.
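A deliberately simplified illustration of that point: if base-year itemized amounts are extrapolated and the itemization test is re-run against a target-year standard deduction, some units that itemized in the base year can switch to the standard deduction. The growth factor, standard deduction, and amounts below are all hypothetical, and `itemizes_after_aging` is a helper introduced only for this sketch; Tax-Calculator's actual extrapolation uses its own growth factors and weight adjustments.

```python
def itemizes_after_aging(itemized_base, growth_factor, standard_deduction):
    """Re-check the itemization choice after extrapolating base-year amounts.

    itemized_base: itemized deductions reported in the base (PUF) year
    growth_factor: hypothetical cumulative growth from base year to target year
    standard_deduction: hypothetical standard deduction in the target year
    """
    itemized_target = itemized_base * growth_factor
    return itemized_target > standard_deduction


# Hypothetical units that all itemized in the base year.
base_itemized = [5800.0, 7000.0, 20000.0]
growth = 1.05             # hypothetical: itemized amounts grow 5% overall
std_ded_target = 6200.0   # hypothetical target-year standard deduction
print([itemizes_after_aging(x, growth, std_ded_target) for x in base_itemized])
```

The first unit's extrapolated deductions fall below the target-year standard deduction, so it would drop out of any count that requires itemizing.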

derrickchoe commented 6 years ago

@andersonfrailey

In response to

I know that the IRS data only includes people who end up itemizing in year 2013

Sorry, that last statement was ambiguous. I should have clarified: by "IRS data" in this case I mean the SOI table that I'm trying to match. That table reports only the number of people who have enough qualified medical expenses to take the deduction and who end up itemizing their deductions.

Am I correct in only counting those who meet the conditions c04470 > 0 and c17000 > 0 when trying to match this population?

Thanks for bringing this up; hopefully the explanation above clarifies my reasoning for limiting the population to those we calculate as itemizing in 2013 (and who also take the medical expense deduction).