PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org

Internet-TAXSIM validation when input variable 10 is positive #578

Closed. GoFroggyRun closed this issue 8 years ago.

GoFroggyRun commented 8 years ago

I fed Tax-Calculator and Internet TAXSIM a subset of variables from the 2008 PUF, with slight manipulation: numextra was imputed separately (#577) using the imputation we currently have. It seems that for unusual, complex tax units the agreement between the two models breaks down. Please see the diff output for details:

TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]=  6   2842   2658      0.15 [286578]
      #big_vardiffs_with_big_inctax_diff=              184
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]=  7   6078     10     15.00 [203987]
      #big_vardiffs_with_big_inctax_diff=             6065
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]=  9   3846      0     12.40 [41]
      #big_vardiffs_with_big_inctax_diff=             2875
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 10     22     22     -0.01 [20034]
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 12     26     26     -0.01 [33351]
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 14      3      3      0.01 [186503]
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 15      3      3      0.01 [231388]
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 16   1663   1484      0.24 [377124]
      #big_vardiffs_with_big_inctax_diff=              179
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 17   1917   1675 -1036039.47 [123347]
      #big_vardiffs_with_big_inctax_diff=              235
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 18   2457   2232 1023639.47 [123347]
      #big_vardiffs_with_big_inctax_diff=              218
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 19   3827   3716 405361.23 [123347]
      #big_vardiffs_with_big_inctax_diff=              104
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 22      3      1   -585.10 [402552]
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 23      4      4      0.01 [204922]
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 24      6      1   -478.00 [175220]
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 25    406    406      0.01 [1378]
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 26   1599     22 -870500.00 [397215]
      #big_vardiffs_with_big_inctax_diff=               12
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 27   3132   1186 1318230.07 [368683]
      #big_vardiffs_with_big_inctax_diff=             1946
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]= 28  11530   8477 204727.89 [123347]
      #big_vardiffs_with_big_inctax_diff=             2798
TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]=  4  37463   4998 2943859.93 [296488]
                       #big_inctax_diffs=            32465

Out of 126,359 units, 32,465 show a significant difference in federal income tax liability (output variable 4); the largest difference is 2,943,859.93, for unit 296488. This will be investigated immediately.
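The TAXDIFF summary lines above can be read mechanically. The sketch below parses one line into its five fields. The regular expression is inferred only from the lines posted here, and reading `#1cdiffs` as a count of one-cent differences is an assumption; it is consistent with output variable 4, where 37,463 - 4,998 = 32,465 matches `#big_inctax_diffs`.

```python
import re
from typing import NamedTuple, Optional


class TaxDiff(NamedTuple):
    """One TAXDIFF summary line for a single output variable."""
    ovar: int        # output variable number (4 = federal income tax liability)
    ndiffs: int      # units whose values differ at all
    n1cdiffs: int    # assumed: units whose values differ by one cent
    maxdiff: float   # largest difference, sign preserved
    maxid: int       # id of the unit with the largest difference


# Pattern assumed from the lines posted in this issue.
TAXDIFF_RE = re.compile(
    r"TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff\[id\]=\s*"
    r"(\d+)\s+(\d+)\s+(\d+)\s+(-?[\d.]+)\s+\[(\d+)\]"
)


def parse_taxdiff(line: str) -> Optional[TaxDiff]:
    """Return the parsed fields, or None if the line is not a TAXDIFF line."""
    m = TAXDIFF_RE.search(line)
    if m is None:
        return None
    ovar, nd, n1c, maxd, mid = m.groups()
    return TaxDiff(int(ovar), int(nd), int(n1c), float(maxd), int(mid))


d = parse_taxdiff(
    "TAXDIFF:ovar,#diffs,#1cdiffs,maxdiff[id]=  4  37463   4998 2943859.93 [296488]"
)
print(d.ovar, d.ndiffs - d.n1cdiffs, d.maxid)  # 4 32465 296488
```

The indented `#big_...` lines do not match the pattern and return `None`, so they can be handled separately.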

cc @martinholmer @MattHJensen @feenberg

GoFroggyRun commented 8 years ago

OK. For one particular record (unit 296488), Internet TAXSIM yields the following output (truncated from the 29th variable onward):

[TAXSIM] 2014 0 37951170.62 .00 175651.99 39.60 .00 3.80 100992000.00 .00 .00 6200.00 .00 3950.00 .00 .00 100985800.00 39947422.55 .00 .00 .00 .00 .00 .00 100992000.00 .00 37220278.64 

And the output from tax-calculator:

[PYTHON] 2014 0 40895030.55 0.00 175652.00 39.60 0.00 3.80 100992000.00 0.00 0.00 0.00 0.00 3950.00 0.00 0.00 100985800.00 39947422.55 0.00 0.00 0.00 0.00 0.00 0.00 100992000.00 0.00 37220278.55

That unit has tax before credits (v28) of 37,220,278.64 in TAXSIM, while Tax-Calculator reports 37,220,278.55. I'm not sure why this happens, but it looks like a rounding difference.
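To see exactly where the two output records disagree, they can be lined up variable by variable. In this sketch the `[TAXSIM]`/`[PYTHON]` labels are dropped, both records are assumed to start at output variable 2 (the year), and `field_diffs` is a hypothetical helper, not part of Tax-Calculator.

```python
# Output records for unit 296488 as posted above, labels removed.
# Both begin at output variable 2 (year); variable 1 (the id) is omitted.
TAXSIM = ("2014 0 37951170.62 .00 175651.99 39.60 .00 3.80 100992000.00 .00 .00 "
          "6200.00 .00 3950.00 .00 .00 100985800.00 39947422.55 .00 .00 .00 .00 "
          ".00 .00 100992000.00 .00 37220278.64")
PYTHON = ("2014 0 40895030.55 0.00 175652.00 39.60 0.00 3.80 100992000.00 0.00 0.00 "
          "0.00 0.00 3950.00 0.00 0.00 100985800.00 39947422.55 0.00 0.00 0.00 0.00 "
          "0.00 0.00 100992000.00 0.00 37220278.55")


def field_diffs(a: str, b: str, first_var: int = 2, tol: float = 0.0) -> dict:
    """Map variable number -> (a_value, b_value, b - a) wherever |b - a| > tol."""
    xs = [float(tok) for tok in a.split()]
    ys = [float(tok) for tok in b.split()]
    if len(xs) != len(ys):
        raise ValueError("records have different numbers of fields")
    diffs = {}
    for offset, (x, y) in enumerate(zip(xs, ys)):
        d = round(y - x, 2)  # both records are reported to the cent
        if abs(d) > tol:
            diffs[first_var + offset] = (x, y, d)
    return diffs


for var, (x, y, d) in field_diffs(TAXSIM, PYTHON).items():
    print(f"v{var}: TAXSIM={x:,.2f}  Tax-Calculator={y:,.2f}  diff={d:,.2f}")
```

On these records it flags v4 (2,943,859.93), v6 (0.01), v13 (-6,200.00) and v28 (-0.09). The v13 gap does not carry through to taxable income: v18 is 100,985,800.00 in both records.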

The real issue is the difference in income tax liability (iit). The difference between tax before credits and iit ought to be the NIIT (Net Investment Income Tax). According to Form 8960, this unit should have an NIIT of 3,674,752. Specifically,

3,674,752 = 0.038 × (77,470,000 [taxable interest] + 12,540,000 [ordinary dividends] + 6,694,000 [capital gains]),

which coincides with the difference between iit and tax before credits in Tax-Calculator. I'm not sure what I'm missing here.
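The Form 8960 arithmetic above can be written as a small function. This is a simplified sketch, not Tax-Calculator's implementation: it applies the 3.8% rate to the lesser of net investment income and the excess of MAGI over the statutory threshold, assumes MAGI equals AGI (v10), and ignores deductions allocable to investment income.

```python
NIIT_RATE = 0.038
# Statutory MAGI thresholds (not indexed for inflation).
NIIT_THRESHOLD = {"single": 200_000, "joint": 250_000, "separate": 125_000}


def niit(net_investment_income: float, magi: float, status: str = "single") -> float:
    """3.8% of the lesser of NII or MAGI above the filing-status threshold."""
    excess_magi = max(0.0, magi - NIIT_THRESHOLD[status])
    return NIIT_RATE * min(max(0.0, net_investment_income), excess_magi)


# Unit 296488 (MARS = 1, single), amounts from the taxpayer table in this comment.
interest = 77_470_000                   # e00300
dividends = 12_540_000                  # e00600
capital_gains = 5_320_000 + 1_374_000   # p22250 + p23250
agi = 100_992_000                       # v10; MAGI assumed equal to AGI
print(round(niit(interest + dividends + capital_gains, agi), 2))  # 3674752.0
```

Because MAGI exceeds the $200,000 single-filer threshold by more than the NII amount, the binding limit here is NII itself.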

PS: Taxpayer info:

Year  MARS  e00200   e00600    e00650    e00300    p22250   p23250
2014  1     4288000  12540000  12540000  77470000  5320000  1374000
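The taxpayer info can be cross-checked against the output records. This sketch assumes AGI here is simply wages plus interest, ordinary dividends and the two capital-gain amounts, and that v6 combines employee and employer payroll tax using 2014 parameters (12.4% Social Security up to $117,000, 2.9% Medicare, 0.9% Additional Medicare above $200,000 for a single filer); both are assumptions about how the outputs are built.

```python
# Taxpayer info posted above (2014, MARS = 1).
e00200 = 4_288_000    # wages and salaries
e00300 = 77_470_000   # taxable interest
e00600 = 12_540_000   # ordinary dividends (includes qualified dividends e00650)
p22250 = 5_320_000    # net short-term capital gain
p23250 = 1_374_000    # net long-term capital gain

agi = e00200 + e00300 + e00600 + p22250 + p23250
print(agi)  # 100992000, matching v10 in both output records

# 2014 payroll-tax parameters (employee plus employer shares).
SS_RATE, SS_WAGE_BASE = 0.124, 117_000
MEDICARE_RATE = 0.029
ADDL_MEDICARE_RATE, ADDL_MEDICARE_THRESHOLD = 0.009, 200_000  # single filer

fica = (SS_RATE * min(e00200, SS_WAGE_BASE)
        + MEDICARE_RATE * e00200
        + ADDL_MEDICARE_RATE * max(0, e00200 - ADDL_MEDICARE_THRESHOLD))
print(round(fica, 2))  # 175652.0, matching v6 to within a cent
```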
martinholmer commented 8 years ago

Sean (@GoFroggyRun), can you post the unedited Internet-TAXSIM input record for this case?

GoFroggyRun commented 8 years ago

@martinholmer Here you go:

296488. 2014 0 37951170.62 .00 175651.99 39.60 .00 3.80 100992000.00 .00 .00 6200.00 .00 3950.00 .00 .00 100985800.00 39947422.55 .00 .00 .00 .00 .00 .00 100992000.00 .00 37220278.64 175651.99 94298001.01 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
martinholmer commented 8 years ago

Martin said:

Sean (@GoFroggyRun), can you post the unedited Internet-TAXSIM input record for this case?

Sean, Thanks for the output record, but I also would like to see the input record.

GoFroggyRun commented 8 years ago

@martinholmer The input uses the 8 variables described in the table above. I have just sent you the input record.

feenberg commented 8 years ago

If you want me to look at the taxsim calculation, you need to provide the input record.

dan

(In reply to Sean.Wang, Fri, 5 Feb 2016.)

GoFroggyRun commented 8 years ago

@feenberg Thanks for looking into this issue. I have just emailed you the input record in the TAXSIM format.

feenberg commented 8 years ago

And the concern is that taxsim seems not to include the Medicare Tax on Unearned Income anywhere? Is that right?

dan

(In reply to Sean.Wang, Fri, 5 Feb 2016.)

GoFroggyRun commented 8 years ago

@feenberg I think the major issue in this case would be something involved with Net Investment Income tax, which is Form 8960.

MattHJensen commented 8 years ago

@feenberg and @GoFroggyRun, I believe the "Unearned Income Medicare Contribution" and the "Net Investment Income Tax" are synonymous.

feenberg commented 8 years ago

On Fri, 5 Feb 2016, Sean.Wang wrote:

@feenberg I think the major issue in this case would be something involved with Net Investment Income tax, which is Form 8960.

Is that a vague way of saying "The Net Investment Income Tax should be included in v4 (FIT) of taxsim but is not"?

In general, when I get queries about taxsim I insist on three things.

1) The input data
2) The output data from Taxsim
3) What the user thinks the output data should have been.

Otherwise I have a lot of guessing to do.

dan


GoFroggyRun commented 8 years ago

@feenberg I expect the difference between v28, Federal Income Tax Before Credits, and v4, Federal income tax liability, to be the NIIT, which is the case for Tax-Calculator. For TAXSIM, the difference between v28 and v4 is not zero, so I don't think NIIT was simply left out.

Attached is the TAXSIM output you requested:

296488. 2014 0 37951170.62 .00 175651.99 39.60 .00 3.80 100992000.00 .00 .00 6200.00 .00 3950.00 .00 .00 100985800.00 39947422.55 .00 .00 .00 .00 .00 .00 100992000.00 .00 37220278.64 175651.99 94298001.01 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
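Sean's expectation can be checked directly from the two output records. Assuming there are no credits and no AMT (v21 through v25 and v27 are zero in both records), the v4 - v28 gap should be the NIIT, and dividing it by 3.8% gives the base each model used. The constants are copied from this thread; `implied_base` is a hypothetical helper.

```python
NIIT_RATE = 0.038

# (v4, v28) pairs copied from the two output records for unit 296488.
outputs = {
    "TAXSIM": (37_951_170.62, 37_220_278.64),
    "Tax-Calculator": (40_895_030.55, 37_220_278.55),
}

# Candidate base components from the taxpayer table (e00300, e00600, p22250 + p23250).
INTEREST, DIVIDENDS, GAINS = 77_470_000, 12_540_000, 6_694_000


def implied_base(v4: float, v28: float) -> float:
    """NIIT base implied if the whole v4 - v28 gap is NIIT."""
    return (v4 - v28) / NIIT_RATE


for model, (v4, v28) in outputs.items():
    print(f"{model}: gap = {v4 - v28:,.2f}, implied base = {implied_base(v4, v28):,.0f}")
print(f"dividends + gains            = {DIVIDENDS + GAINS:,}")
print(f"interest + dividends + gains = {INTEREST + DIVIDENDS + GAINS:,}")
```

TAXSIM's gap (730,891.98) implies a base within a dollar of dividends plus capital gains (19,234,000), while Tax-Calculator's gap (3,674,752.00) implies interest plus dividends plus gains (96,704,000).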
feenberg commented 8 years ago

On Fri, 5 Feb 2016, Sean.Wang wrote:

@feenberg I expect the difference between v28, Federal Income Tax Before Credits, and v4, Federal income tax liability, to be the NIIT, which is the case for tax-calculator. For Taxsim, the difference between v28 and v4 is not zero, so I guess it's not that NIIT wasn't included.

Thank you. That is clear.

dan


martinholmer commented 8 years ago

Dan said:

On Fri, 5 Feb 2016, Sean.Wang wrote:

@feenberg I expect the difference between v28, Federal Income Tax Before Credits, and v4, Federal income tax liability, to be the NIIT, which is the case for tax-calculator. For TAXSIM, the difference between v28 and v4 is not zero, so I guess it's not that NIIT wasn't included.

Thank you. That is clear.

To make the situation even clearer, here is my understanding of the difference between Internet-TAXSIM and Tax-Calculator (via simtax.py) results for this very-high-income single individual (whose Internet-TAXSIM 22-variable input record we have all shared in private because Sean suggests that this person is in the confidential IRS PUF). In general, this person in 2014 has somewhat more than four million dollars in earnings (ivar[7]), over ten million in qualified dividends (ivar[9]), over seventy million in other property income (ivar[10]), roughly five million in short-term capital gains (ivar[21]), and roughly one million in long-term capital gains (ivar[22]).

When this input record is fed into Tax-Calculator via simtax.py, the total federal income tax liability is $2,943,859.93 more than what Internet-TAXSIM generates from the same input record. This difference is within pennies of 0.038 times this person's other property income amount (the seventy-some million).

So, it would seem that for this case the difference between the two models is that Tax-Calculator is including ivar[10], which simtax.py maps into e00300, interest income, in the base of the NIIT; while Internet-TAXSIM is not including ivar[10] in the base of the NIIT.
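This attribution can be reproduced with a two-line model. The `ivar` dictionary below is a hypothetical stand-in for the privately shared record, filled with the amounts from Sean's taxpayer table, and it assumes MAGI is far enough above the threshold that the NIIT is simply 3.8% of the base.

```python
NIIT_RATE = 0.038

# Hypothetical stand-in for the input record, keyed by TAXSIM input variable number.
ivar = {
    9: 12_540_000,    # qualified dividends
    10: 77_470_000,   # other property income (simtax.py maps this to e00300)
    21: 5_320_000,    # short-term capital gains
    22: 1_374_000,    # long-term capital gains
}


def nii_base(include_var10: bool) -> int:
    """NIIT base with or without input variable 10."""
    base = ivar[9] + ivar[21] + ivar[22]
    if include_var10:
        base += ivar[10]
    return base


predicted = NIIT_RATE * (nii_base(True) - nii_base(False))
observed = 40_895_030.55 - 37_951_170.62  # v4: Tax-Calculator minus TAXSIM
print(f"predicted {predicted:,.2f}  observed {observed:,.2f}")
```

The predicted difference (2,943,860.00) and the observed v4 difference (2,943,859.93) agree to within seven cents.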

Can we all agree on this being the source of the tax liability difference?

If so, then is simtax.py doing a reasonable thing in mapping Internet-TAXSIM ivar[10] into e00300, interest income? If that is not a reasonable mapping, what makes more sense? If it is a reasonable mapping, why is Internet-TAXSIM apparently not including ivar[10] in the calculation of NIIT?

cc @feenberg @GoFroggyRun @MattHJensen

feenberg commented 8 years ago

On Sat, 6 Feb 2016, Martin Holmer wrote:


So, it would seem that the difference for this case between the two models is that Tax-Calculator is including ivar[10], which simtax.py maps into e00300, interest income, in the base of the NIIT; while Internet-TAXSIM is not including ivar[10] in the base of the NIIT.

Can we all agree on this being the source of the tax liability difference?

If so, then is simtax.py doing a reasonable thing in mapping Internet-TAXSIM ivar[10] into e00300, interest income? If that is not a reasonable mapping, what makes more sense? If it is a reasonable mapping, why is Internet-TAXSIM apparently not including ivar[10] in the calculation of NIIT?

If this analysis is correct that would be a bug in taxsim as e00300 is interest income. I'll know more on Monday.

dan


martinholmer commented 8 years ago

Martin said:

So, it would seem that the difference for this case between the two models is that Tax-Calculator is including ivar[10], which simtax.py maps into e00300, interest income, in the base of the NIIT; while Internet-TAXSIM is not including ivar[10] in the base of the NIIT.

And then Dan said:

If this analysis is correct that would be a bug in Internet TAXSIM as e00300 is interest income. I'll know more on Monday.

But then I thought to myself: Why has this difference between the Tax-Calculator and Internet-TAXSIM not been found before as part of the earlier validation work?

After reviewing the code in the taxcalc/validation/make-in.tcl script that generates the random samples, I found the answer to my question. In all three types of samples (aYY.in, bYY.in, and cYY.in), ivar[10], other property income, is set to zero for every member of the random sample. Why was that done? Probably because I was unsure at the beginning of the validation work about how TAXSIM would treat positive amounts of input variable 10, given this documentation:

10. Other property income, including
         interest
         unearned partnership and S-corp income
         rent
         alimony
         fellowships
         non-qualified dividends
         state income tax refunds (itemizers only)
         taxable IRA distributions
         capital gains distributions on form 1040
         other income or loss not otherwise enumerated here

Adjustments and items such as
         alimony paid
         Keogh and IRA contributions
         foreign income exclusion
         NOLs
can be entered here as negative income.(+/-)

This is quite a broad range of income types. Certainly many of these types should be included in the base of the NIIT (like interest and non-qualified dividends), but others seem inappropriate for the NIIT (like alimony and fellowships). Maybe it was once the case that all of these income types could be safely aggregated into a single amount, but with the recent introduction of the NIIT perhaps things are now a bit more complicated.
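A toy example shows why a single aggregate is awkward for the NIIT. The split of input variable 10 below is invented for illustration, the classification uses only the examples named above (interest and non-qualified dividends in, alimony and fellowships out), and MAGI is assumed to be far above the threshold so the NIIT is 3.8% of the base.

```python
NIIT_RATE = 0.038

# Hypothetical composition of one unit's input variable 10 (invented amounts).
var10_parts = {
    "interest": 50_000,
    "non-qualified dividends": 10_000,
    "alimony": 30_000,
    "fellowships": 5_000,
}
# Classification limited to the examples named in the comment above.
IN_NII = {"interest": True, "non-qualified dividends": True,
          "alimony": False, "fellowships": False}

aggregate = sum(var10_parts.values())  # what TAXSIM receives as one number
true_nii = sum(v for k, v in var10_parts.items() if IN_NII[k])
print(f"NIIT if all of var 10 counts: {NIIT_RATE * aggregate:,.2f}")
print(f"NIIT on the NII parts only:   {NIIT_RATE * true_nii:,.2f}")
```

Treating the whole aggregate as investment income overstates this unit's NIIT by 3.8% of the alimony and fellowship amounts (1,330.00 here).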

@feenberg @GoFroggyRun @MattHJensen

feenberg commented 8 years ago

On Sat, 6 Feb 2016, Martin Holmer wrote:


This is quite a broad range of income types. Certainly many of these types should be included in the base of the NIIT (like interest and non-qualified dividends), but others seem inappropriate for the NIIT (like alimony and fellowships). Maybe it was the case that all of these income types could be safely aggregated into a single amount, but with the recent introduction of the NIIT perhaps things are a bit more complicated.

This is a quandary. I hate to change the input format to add a variable. All of the items that shouldn't be in the NIIT are rare, and I am inclined to just include item 10 in NII.

When taxsim10 comes along, it will have a variable number of inputs, and I can make the adjustments an additional variable.

dan


martinholmer commented 8 years ago

Dan said:

This is a quandary. I hate to change the input format to add a variable. All of the items that shouldn't be in the NIIT are rare, and I am inclined to just include item 10 in NIIT.

I agree. You could consider making these two changes to TAXSIM 9:

(a) Add to the end of input variable 10 documentation something like the following sentence:

For years 2013+ include only income subject to the Net Investment Income Tax (Form 8960).

(b) Change the logic of TAXSIM so that input variable 10 is included in the base of the NIIT beginning in 2013.

It seems as if this approach would maintain backward (pre-2013) compatibility while including the most common items in "other property income" in the base of the NIIT for years 2013+. And you would get all this without changing the 22-variable input format.
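Proposal (b) amounts to a year-gated change in how the NIIT base is assembled. The sketch below is hypothetical and is not TAXSIM code: it adds input variable 10 to dividends and capital gains only for 2013 and later, uses the single-filer threshold of $200,000, and treats MAGI as equal to AGI.

```python
NIIT_RATE = 0.038
NIIT_FIRST_YEAR = 2013        # NIIT applies beginning with tax year 2013
SINGLE_THRESHOLD = 200_000    # statutory MAGI threshold for single filers


def niit_base(year: int, dividends: float, cap_gains: float, var10: float) -> float:
    """NIIT base under proposal (b): for 2013+ it includes input variable 10."""
    if year < NIIT_FIRST_YEAR:
        return 0.0  # no NIIT before 2013, so pre-2013 results are unchanged
    return dividends + cap_gains + var10


def niit_single(year: int, magi: float, dividends: float,
                cap_gains: float, var10: float) -> float:
    """3.8% of the lesser of the base and MAGI above the single threshold."""
    base = max(0.0, niit_base(year, dividends, cap_gains, var10))
    return NIIT_RATE * min(base, max(0.0, magi - SINGLE_THRESHOLD))


# Unit 296488's amounts, evaluated in 2014 and (counterfactually) in 2012.
print(f"{niit_single(2014, 100_992_000, 12_540_000, 6_694_000, 77_470_000):,.2f}")
print(f"{niit_single(2012, 100_992_000, 12_540_000, 6_694_000, 77_470_000):,.2f}")
```

Gating on the year keeps every pre-2013 result identical, which is the backward-compatibility property described above.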

cc @feenberg @GoFroggyRun @MattHJensen

martinholmer commented 8 years ago

Issue #578 is receiving more attention now that the EITC validation differences have been eliminated as of 25-Feb-2016.

@GoFroggyRun @feenberg @MattHJensen

martinholmer commented 8 years ago

Issue #578 has been resolved because NBER has developed Internet TAXSIM version 10.0, an enhancement of the prior version 9.3. Version 10.0 includes input variable 10 (other property income) in the base of the Net Investment Income Tax, which was introduced into the federal individual income tax system beginning in 2013. Although the results have not yet been added to the Tax-Calculator repository, Internet TAXSIM and Tax-Calculator have passed cross-validation tests on three samples, d13.in, d14.in, and d15.in (for 2013, 2014, and 2015, respectively), each containing 100,000 randomly generated filing units, about 95% of which have positive values for input variable 10. There are no differences of more than one cent in FICA or income tax liabilities for these 300,000 filing units.

This development resolves the issue originally raised by Sean Wang in #578.
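The acceptance criterion above (no FICA or income tax difference larger than one cent) can be expressed as a simple comparison over filing-unit ids. This is a minimal sketch that assumes each model's results are already loaded into a dict keyed by unit id; it is not the repository's actual validation script, and the small epsilon guards against binary floating-point noise at exactly one cent.

```python
from typing import Dict, List, Tuple


def big_diffs(a: Dict[int, float], b: Dict[int, float],
              tol: float = 0.01) -> Tuple[int, List[int]]:
    """Return (count, ids) of units whose values differ by more than tol."""
    if a.keys() != b.keys():
        raise ValueError("the two result sets cover different filing units")
    eps = 1e-9  # absorb float noise so an exact one-cent gap is not flagged
    ids = sorted(uid for uid in a if abs(a[uid] - b[uid]) > tol + eps)
    return len(ids), ids


# Hypothetical results for three filing units.
taxsim = {1: 1000.00, 2: 2500.50, 3: 0.00}
taxcalc = {1: 1000.01, 2: 2500.50, 3: 12.34}
print(big_diffs(taxsim, taxcalc))  # (1, [3])
```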

@GoFroggyRun @MattHJensen @feenberg @Amy-Xu

MattHJensen commented 8 years ago

Issue #578 has been resolved because NBER has developed Internet TAXSIM version 10.0, which is an enhancement to the prior version 9.3. Version 10.0 includes input variable 10 (other property income) as part of the base of the Net Investment Income Tax, which was introduced into the federal individual income tax system beginning in 2013.

nice! Thanks @martinholmer, @GoFroggyRun, @feenberg.