PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org

Very high marginal tax rates on long-term capital gains #503

Closed martinholmer closed 8 years ago

martinholmer commented 8 years ago

The testmtr.py program was recently added to the Tax-Calculator repository. This program simply computes 2013 marginal tax rates (mtr) with respect to each income type for records in the puf.csv file and then summarizes those mtr into two histograms (one for FICA and one for income taxes) for each income type. The top edge of the highest mtr histogram bin is 1.0 (that is, a 100 percent marginal tax rate). This issue identifies which puf.csv records have mtr values above 1.0 (or 100 percent).
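For readers who have not looked at testmtr.py, here is a minimal sketch of the idea in Python, not the actual program: compute a finite-difference marginal tax rate for each record and then bin the rates with numpy. The toy_tax function, the bin edges, and all of the numbers here are purely illustrative.

import numpy as np

def finite_diff_mtr(calc_tax, income, other_income, finite_diff=0.01):
    # Finite-difference mtr: extra tax from a small income increment, divided by the increment.
    base = calc_tax(income, other_income)
    bumped = calc_tax(income + finite_diff, other_income)
    return (bumped - base) / finite_diff

# Stand-in "calculator": 15 percent on long-term gains, 25 percent on other income.
toy_tax = lambda ltcg, other: 0.15 * ltcg + 0.25 * other

mtrs = np.array([finite_diff_mtr(toy_tax, g, 50000.0) for g in (0.0, 1000.0, 2500.0)])

# Summarize the rates in a histogram (these bin edges are illustrative, not testmtr.py's).
edges = [-1.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]
counts, _ = np.histogram(mtrs, bins=edges)
print(counts)  # rates above the top edge (1.0) fall outside the bins entirely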

Before pull request #484 was merged into the master branch, there was only one puf.csv record (out of 219,814) that had mtr above one: recid 126452, which had a mtr with respect to interest income of 12612.69 and also had a mtr with respect to long-term capital gains of 12612.69. Here is this record's 2013 Tax-Calculator output (in Internet-TAXSIM format) where the mtr is computed with respect to long-term capital gains (rather than taxpayer earnings):

> cp ../tax-calculator-data/puf.csv .
> python inctax.py --blowup --mtrinc e23250 puf.csv 2013
You loaded data for 2009.
Your data have been extrapolated to 2013.
> awk '$1==126452' puf-13.out-inctax
126452. 2013 0 433.87 0.00 0.00 1261268.73 0.00 0.00 18027.30 0.00 0.00 0.00 19500.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1000.00 0.00 126.13 18027.30 0.00 0.00

Notice that output variable 25 (EITC) is $126.13 before the one cent is added to long-term capital gains. The EITC has an unearned income eligibility limit, beyond which all EITC payments are lost. The extra one cent of long-term capital gains apparently pushed this record over that limit, causing a complete loss of EITC payments. So, the mtr would be 126.13/0.01, which equals 12613, within rounding error of the 12612.6873 calculated by Tax-Calculator. This inference is supported by the fact that this record has the exact same mtr with respect to taxable interest income, e00300.
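A toy numerical check of that arithmetic (the eligibility limit below is invented for illustration; only the $126.13 credit and the one-cent increment come from the record above):

# Hypothetical benefit cliff: a $126.13 credit that disappears once unearned
# income exceeds some eligibility limit (the limit value here is made up).
LIMIT = 3400.00
CREDIT = 126.13

def tax(unearned_income):
    # Zero tax everywhere, except that crossing the limit forfeits the credit.
    return 0.0 if unearned_income <= LIMIT else CREDIT

mtr = (tax(LIMIT + 0.01) - tax(LIMIT)) / 0.01
print(mtr)  # ~12613: the whole credit is lost over a one-cent change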

So, in summary, the very high marginal tax rates calculated for recid 126452 are accurate representations of how the income tax works under current law.

After pull request #484 was merged into the master branch, there are now nine additional records that have a marginal tax rate with respect to long-term capital gains (ltcg) above one (that is, above 100 percent), but none of them receive an EITC payment before the one cent is added to ltcg. There are no very high mtr values with respect to other income types (except for the one for recid 126452 discussed above). Here are those records (the listing also includes recid 126452):

> awk '$7>100&&$25==0{printf("%.2f\t%d\n",$7/100,$1);n++}END{print n}' puf-13.out-inctax
2779750.00  322958
13285000.19 295708
12612.69    126452
34248.95    48609
77609.22    341136
2142221.71  296832
31175.26    338146
102975.26   205040
730419.94   234412
723720.15   363198
9

The first column above contains the decimal mtr and the second column contains the recid.

I looked briefly at some of the puf.csv input variables for these nine records, and the only things I noticed that the nine have in common are: (a) they all have Schedule E income, (b) none of them have tax-exempt interest income, and (c) they all have positive long-term capital gains.

I don't have enough familiarity with how the income tax code works to know whether or not it contains rules (other than the EITC unearned income eligibility limit mentioned above) that would generate these very high marginal tax rates on long-term capital gains. If there are no other rules that generate notches in income tax liability, then pull request #484 may have introduced one or more bugs into Tax-Calculator.

The only thing that seems certain to me is that the income tax calculations for these nine records need to be checked. If the checking finds these mtr are legitimate, then the features of the tax code that generate these notches need to be documented. If the checking finds logic flaws, the bugs need to be eliminated.

cc @MattHJensen @feenberg @GoFroggyRun @Amy-Xu @jdebacker

feenberg commented 8 years ago


The income tax code for Schedule D is so complicated that no one involved with the project will be sufficiently familiar with it to see the problem just by looking at the data. The way we deal with problems like this is to get a Schedule D and fill it out line by line, comparing the results with the C values from the calculator. Usually it becomes clear where the error is. If you send me the 22 variables of input and the calculated C values, I can do this today.

A decimal marginal tax rate of 13 million (as in the list above) implies that the one-cent increment changes tax by roughly $130,000, which should show up on the Schedule D quite clearly!

dan


Amy-Xu commented 8 years ago

I agree with Dan - that seems to be the best way to go for Schedule D related variables. For the EITC part, a quick method of checking I imagine would be to erase the EITC of that taxpayer beforehand and then run the calculator mtr function again. If the outlier MTR result doesn't show up, then that would justify the extreme MTR of this taxpayer.

martinholmer commented 8 years ago

Dan @feenberg said:

The way we deal with problems like this is to get a Schedule D and fill it out line by line, comparing the results with the C values from the calculator. Usually it becomes clear where the error is. If you send me the 22 variables of input and the calculated C values, I can do this today.

Thanks for the generous offer Dan. The ten records with very high mtr are in the puf.csv file and each record has over 200 input variables (not just 22). I'm going to be out of the office today, but I'm sure that Sean or Amy can extract the input variables and calculated values for these records and send them to you.

cc @MattHJensen @GoFroggyRun @Amy-Xu

feenberg commented 8 years ago

On Mon, 14 Dec 2015, Amy Xu wrote:

I agree with Dan - that seems to be the best way to go for Schedule D related variables. For the EITC part, a quick method of checking I imagine would be to erase the EITC of that taxpayer beforehand and then run the calculator mtr function again. If the outlier MTR result doesn't show up, then that would justify the extreme MTR of this taxpayer.

I am not sure what this means. What happened to testing MTRs by subtracting a penny and comparing to adding a penny?

dan


Amy-Xu commented 8 years ago

@feenberg My bad I misread Martin's post. There's no problem on that side anymore.

GoFroggyRun commented 8 years ago

@martinholmer @feenberg I'll take care of the issue, and will have the results posted when available.

martinholmer commented 8 years ago

Dan @ said:

What happened to testing MTRs by subtracting a penny and comparing to adding a penny?

In an attempt to confirm the EITC story I told about recid 126452, I changed the finite difference from plus one cent to minus one cent. The results seem to confirm the EITC story for one of the ten high mtr records, but the rest of the results are confusing to me. I'm posting them here in hopes the results will make more sense to others on the development team.

Here is what I did:

(1) Temporarily edit calculate.py so that finite_diff = 0.01 becomes finite_diff = -0.01.

(2) cp ../tax-calculator-data/puf.csv .

(3) python inctax.py --blowup --mtrinc e23250 puf.csv 2013 which produces this:

You loaded data for 2009.
Your data have been extrapolated to 2013.

(4) awk '$7>100{printf("%.2f\t%d\n",$7/100,$1);n++}END{print n}' puf-13.out-inctax which produces no results, meaning there are no mtr values greater than 100 percent.

(5) python testmtr.py > results_mtr.txt which produces results that differ from the plus-one-cent results as follows:

*** PLUS 0.01 ***
FICA and IIT mtr histogram bin counts for e00300:
219814 : 219814      0      0      0      0      0      0      0      0      0
219813 :      0      0      0      0  60034  63690  43192  26429  25764    704
WARNING: sum of bin counts is too low
         max(mtr)=12612.69
         mtr=12612.69 for recid=126452

becomes

*** MINUS 0.01 ***
FICA and IIT mtr histogram bin counts for e00300:
219814 : 219814      0      0      0      0      0      0      0      0      0
219814 :      0      0      0      0  55666  72336  40164  25071  25808    769

and

*** PLUS 0.01 ***
FICA and IIT mtr histogram bin counts for e23250:
219814 : 219814      0      0      0      0      0      0      0      0      0
219804 :      0      0      0  17251 125369  38109  33939   2146   2943     47
WARNING: sum of bin counts is too low
         max(mtr)=13285000.19
         mtr=2779750.00 for recid=322958
         mtr=13285000.19 for recid=295708
         mtr=12612.69 for recid=126452
         mtr=34248.95 for recid=48609
         mtr=77609.22 for recid=341136
         mtr=2142221.71 for recid=296832
         mtr=31175.26 for recid=338146
         mtr=102975.26 for recid=205040
         mtr=730419.95 for recid=234412
         mtr=723720.15 for recid=363198

becomes

*** MINUS 0.01 ***
FICA and IIT mtr histogram bin counts for e23250:
219814 : 219814      0      0      0      0      0      0      0      0      0
219814 :      0      0      0    424  86702  65822  47336  11794   7524    212

Summarizing, it appears (unless I did something wrong) that when we subtract one cent (instead of adding one cent), all ten records that had very high mtr values no longer have them. Here are their results when subtracting one cent:

awk '$1==126452{print $1,$4,$7,$25}' puf-13.out-inctax 
126452. 433.87 7.65 126.13

So the one record of the ten that has an EITC benefit does not lose it when unearned income is reduced by one cent (instead of increased by one cent). The other nine have the following results (where the extracted output is record id, income tax liability, and marginal income tax rate on ltcg):

awk '$1==RECID{print $1,$4,$7}' puf-13.out-inctax 
322958. 1198005.05 -0.00
295708. 139197.91 3.80
48609. 4783.80 20.00
341136. 101405.02 31.11
296832. 64554.94 34.01
338146. 108166.29 10.80
205040. 119987.55 10.80
234412. 144931.12 28.80
363198. 265831.68 29.99
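To see why the sign of the finite difference matters so much here, consider this toy sketch (all numbers invented) of a record sitting exactly at a notch: adding a penny crosses the notch and produces an enormous one-sided mtr, while subtracting a penny stays on the same side of the notch and produces an ordinary rate.

NOTCH = 3400.00   # hypothetical income level at which a benefit is lost
RATE = 0.20       # ordinary marginal rate away from the notch

def tax(income):
    # 20 percent everywhere, plus a $126.13 jump in liability past the notch.
    return RATE * income + (126.13 if income > NOTCH else 0.0)

h = 0.01
mtr_plus = (tax(NOTCH + h) - tax(NOTCH)) / h    # roughly 12613: crosses the notch
mtr_minus = (tax(NOTCH) - tax(NOTCH - h)) / h   # roughly 0.20: stays below the notch
print(mtr_plus, mtr_minus)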

cc @MattHJensen @GoFroggyRun @Amy-Xu

martinholmer commented 8 years ago

Dan @feenberg, Because of my typo in addressing you, I think it is likely that you didn't get (via email) my posting on results generated when finite_diff = -0.01 instead of finite_diff = +0.01. You can print out my Monday PM comment from the conversation on issue #503. Sorry about that. Your thoughts on those results would be greatly appreciated.

feenberg commented 8 years ago

I did get it. My only thought is to repeat that if I had the 22 variables I would fill out a Schedule D and that would reveal the source of the problem on the penny increment. Note that the EITC marginal rate is "correct" but I don't believe the oddball rates on capital gains are.

dan


martinholmer commented 8 years ago

Dan @feenberg said:

My only thought is to repeat that if I had the 22 variables I would fill out a Schedule D and that would reveal the source of the problem on the penny increment. Note that the EITC marginal rate is "correct" but I don't believe the oddball rates on capital gains are.

I think your assessment is very likely to turn out to be true, but there is a problem in checking the other nine records. They are not Internet-TAXSIM input records (with just 22 input variables), but rather Tax-Calculator puf.csv records. And these PUF records have approximately 220 input variables (many of which are zero). My understanding is that Sean (@GoFroggyRun) Wang is in the process of extracting the non-zero variables (with their 2013 blown-up values) for those nine PUF records and will post them as part of this issue #503 conversation.

cc @MattHJensen @Amy-Xu

MattHJensen commented 8 years ago

My understanding is that Sean (@GoFroggyRun) Wang is in the process of extracting the non-zero variables (with their 2013 blown-up values) for those nine PUF records and will post them as part of this issue #503 conversation.

@GoFroggyRun, could you send @feenberg the data via email rather than in this issue? I believe @GoFroggyRun will run through the forms as well so we'll have two pairs of eyes on this.

We should not post the raw data to GitHub for disclosure avoidance reasons, even if it has been blown up to 2013.

As an aside, there might be some issue working off of RECID since RECID is not currently unique for CPS units. See https://github.com/open-source-economics/taxdata/issues/10. @Amy-Xu, do you have any suggestions about this?

martinholmer commented 8 years ago

@MattHJensen said:

As an aside, there might be some issue working off of RECID since RECID is not currently unique for CPS units. See open-source-economics/taxdata#10. @Amy-Xu, do you have any suggestions about this?

How can a development team work with a dataset that does not have a unique record key? This problem needs to be fixed as soon as possible. John O'Hare promised @Amy-Xu he would fix (in the next few days) the inconsistent personal exemption variable values in the CPS records, so have him fix this recid problem as part of the same update to a new puf.csv file. I don't see any option other than using the recid variable as the dataset key. That is the key used in the IRS SOI PUF, so I don't understand why that wasn't done originally for the merged CPS records.

cc @GoFroggyRun

Amy-Xu commented 8 years ago

@martinholmer It was my bad. John has one variable uniquely identifying all the records in cps-puf, which is just row number minus one. I didn't import that variable when creating our csv file. Will replace current file with a new one by COB today.

feenberg commented 8 years ago

Can we keep the PUF RECID also?

dan


Amy-Xu commented 8 years ago

@feenberg sure!

martinholmer commented 8 years ago

@Amy-Xu said:

John has one variable uniquely identifying all the records in cps-puf, which is just row number minus one. I didn't import that variable when creating our csv file. Will replace current file with a new one by COB today.

Dan @feenberg said:

Can we keep the PUF RECID also?

I think adding yet another variable to puf.csv is a terrible idea.

The CPS records are being merged into the SOI PUF which has a unique key called RECID. The added CPS records should have a RECID variable whose value is unique among all the puf.csv records. We don't need or want another variable. Please ask John O'Hare to fix this oversight in a sensible way.

cc @MattHJensen @GoFroggyRun

feenberg commented 8 years ago


Can we make up suitable RECIDs for the records that come from the CPS and have no RECID assigned by the IRS? Perhaps just start numbering them above the largest RECID from the IRS sample.

dan


martinholmer commented 8 years ago

Dan @feenberg said:

Can we make up suitable RECIDs for records that come from the CPS? Perhaps just start numbering them above the largest RECID from the IRS sample.

This would be a sensible solution that would not involve creating another unneeded variable in the puf.csv file.

cc @MattHJensen @Amy-Xu @GoFroggyRun

GoFroggyRun commented 8 years ago

Dealt with in PR #506.

Amy-Xu commented 8 years ago

@martinholmer @feenberg Sorry for the delayed response on this. I'll email John and see how to fix this issue. Maybe we need to set up a phone call sometime, since I'm not quite sure how easy it is to build everything on RECID. The reason is that some PUF records got split during the statistical matching. So even before all non-filers were added in, RECID was not unique for each record anymore. Let me confirm this with John and try to solve this issue as soon as possible.

martinholmer commented 8 years ago

@Amy-Xu said:

... some PUF records got split during the statistical matching. So even before all non-filers [from CPS] were added in, RECID was not unique for each record anymore.

Consider this three-step process:

(1) For split SOI PUF records, set the RECID to the SOI RECID with an extra digit at the end. So, for example, if RECID 456237 was split in three, then the three split records in puf.csv would have RECID values of 4562371, 4562372, and 4562373.

(2) For the 5693 CPS records added to the puf.csv file, simply set the RECID equal to something like 4000001 through 4005693.

(3) Check that all 219814 RECID values are unique. If not, revise the above steps to get uniqueness.
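A rough sketch of how steps (1) through (3) might look in Python; the column names (recid, split_seq, from_cps) are assumptions made for illustration, not names in the actual file:

import numpy as np
import pandas as pd

def respecify_recid(df):
    # Expects an integer recid column, a split_seq column that is 0 for unsplit
    # SOI records and 1, 2, 3, ... for the pieces of a split record, and a
    # boolean from_cps column marking the added CPS records.
    df = df.copy()
    # (1) split SOI records: append the split-sequence digit to the SOI RECID
    split = (df['split_seq'] > 0) & ~df['from_cps']
    df.loc[split, 'recid'] = df.loc[split, 'recid'] * 10 + df.loc[split, 'split_seq']
    # (2) added CPS records: number them 4000001, 4000002, ...
    cps = df['from_cps']
    df.loc[cps, 'recid'] = 4000000 + np.arange(1, cps.sum() + 1)
    # (3) every RECID must now be unique; if not, the numbering scheme needs revision
    assert df['recid'].is_unique, 'RECID values are not unique'
    return df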

It seems to me that you (rather than John) should do this in a Python program that you develop in the taxdata repository. The role of that program would be to transform the SAS-generated puf.csv file into a more sensibly structured puf.csv file that would be used by the team in development work and also be used by TaxBrain. The respecification of RECID values is just the first of several changes to the SAS-generated puf.csv file that are going to be needed to resolve issue #425. It doesn't make any sense to ask John to do the things that you can do. (The one exception to this is that John needs to make the personal exemption X* variables consistent.) I will be happy to provide advice and review of the Python program (in the taxdata repo) you develop for these tasks. What do you think?

cc @MattHJensen @feenberg @GoFroggyRun

Amy-Xu commented 8 years ago

@martinholmer I do have a Python script on hand that converts all the variable names he used in SAS to our format. I can probably just add this step to that script. Your suggestion seems very realistic and feasible to me. I'll give it a try as soon as I can.

feenberg commented 8 years ago


Perhaps .5 could be added to RECID of the duplicate? Double precision gives us 15 digits of precision.

dan


martinholmer commented 8 years ago

Dan @feenberg said:

Perhaps .5 could be added to RECID of the duplicate? Double precision gives us 15 digits of precision.

Far better to have integer RECID values because float equality comparisons are not always reliable. Some decimal numbers cannot be exactly represented as a binary number even with double precision.
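For the record, a short Python illustration of both points (nothing here is specific to puf.csv):

# Most decimal fractions have no exact binary representation ...
print(0.1 + 0.2 == 0.3)              # False: the sum is stored as 0.30000000000000004
# ... although .5 itself is exactly representable, so this particular comparison works.
print(100000.0 + 0.5 == 100000.5)    # True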

@MattHJensen @Amy-Xu @GoFroggyRun

martinholmer commented 8 years ago

@Amy-Xu said:

I do have a Python script on hand that converts all the variable names he used in SAS to our format. I can probably just add this step to that script. Your suggestion seems very realistic and feasible to me. I'll give it a try as soon as I can.

Great! Just move that python program into the taxdata repo giving it a meaningful name. Then add the unique-integer-RECID logic to that program. Let me know if you have any questions or need some advice.

cc @MattHJensen @feenberg @GoFroggyRun

feenberg commented 8 years ago


Yes, but .5, .25, .75 do have exact binary representations.

If fractions are disallowed, one must consider that adding a digit to a 5-digit RECID may conflict with an already existing 6-digit RECID. Is the largest RECID 6 digits? Then sufficient digits must be added to make 7 digits.

dan


martinholmer commented 8 years ago

Dan @feenberg said:

Yes, but .5, .25, .75 do have exact binary representations.

Perhaps, but I already have enough to worry about in resolving issue #425, so I don't want to have to worry about the following kind of problem (quoting the SQLite FAQ):

(16) Why does ROUND(9.95,1) return 9.9 instead of 10.0? Shouldn't 9.95 round up?

SQLite uses binary arithmetic and in binary, there is no way to write 9.95 in a finite number of bits. The closest you can get to 9.95 in a 64-bit IEEE float (which is what SQLite uses) is 9.949999999999999289457264239899814128875732421875. So when you type "9.95", SQLite really understands the number to be the much longer value shown above. And that value rounds down.

This kind of problem comes up all the time when dealing with floating point binary numbers. The general rule to remember is that most fractional numbers that have a finite representation in decimal (a.k.a "base-10") do not have a finite representation in binary (a.k.a "base-2"). And so they are approximated using the closest binary number available. That approximation is usually very close, but it will be slightly off and in some cases can cause your results to be a little different from what you might expect.
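The same surprise is easy to reproduce in Python, which uses the same 64-bit IEEE floats:

from decimal import Decimal

print(round(9.95, 1))   # 9.9, not 10.0, for exactly the reason quoted above
print(Decimal(9.95))    # the value the float actually stores: 9.9499999999999992894...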

martinholmer commented 8 years ago

Dan @feenberg said:

If fractions are disallowed one must consider that adding a digit to a 5 digit RECID may conflict with an already existing 6 digit RECID. Is the largest RECID 6 digits? Then sufficient digits must be added to make 7 digits.

You are correct. I expect that the three-step procedure of specifying unique integer RECID values in the puf.csv that I suggested (above) to @Amy-Xu will catch this sort of problem. Thanks for pointing this out. You will be able to follow Amy's work on this issue as she develops a pull request in the taxdata repository.

@MattHJensen @GoFroggyRun

martinholmer commented 8 years ago

The issue of the nine very high marginal tax rates (mtr) for those without an EITC payment has been resolved by the bug fix in pull request #506. After pull request #505 is merged into the master branch, the new mtr histograms generated by testmtr.py will be posted here for the record and this issue will be closed.

cc @MattHJensen @feenberg @GoFroggyRun @Amy-Xu

martinholmer commented 8 years ago

After adding more income types to the Calculator mtr() method in pull request #505 and after fixing a bug in pull request #506, the marginal tax rate histograms generated by the testmtr.py program are as follows:

MTR computed using POSITIVE finite_diff.
You loaded data for 2009.
Your data have been extrapolated to 2013.
Total number of data records = 219814
FICA mtr histogram bin edges:
     [0.0, 0.02, 0.04, 0.06, 0.08, 0.1, 0.12, 0.14, 0.16, 0.18, 1.0]
IIT mtr histogram bin edges:
     [-1.0, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]
FICA and IIT mtr histogram bin counts for e00200p:
219814 :      0  31950      0      0      0      0      0 183680   4184      0
219814 :   4585    105   1760  14232  39128  62570  47348  27138  22252    696
FICA and IIT mtr histogram bin counts for e00900p:
219814 :  14116  29235      0      0      0   1343      0 175120      0      0
219814 :   4583    107   1760  14145  43985  60019  52748  24103  18028    336
FICA and IIT mtr histogram bin counts for e00300:
219814 : 219814      0      0      0      0      0      0      0      0      0
219813 :      0      0      0      0  60034  63689  43191  26432  25763    704
WARNING: sum of bin counts is too low
         max(mtr)=12612.69
         mtr=12612.69 for recid=126452
FICA and IIT mtr histogram bin counts for e00400:
219814 : 219814      0      0      0      0      0      0      0      0      0
219813 :      0      0      0      0 206416   7622   5639     97     33      6
WARNING: sum of bin counts is too low
         max(mtr)=12612.69
         mtr=12612.69 for recid=126452
FICA and IIT mtr histogram bin counts for e00600:
219814 : 219814      0      0      0      0      0      0      0      0      0
219813 :      0      0      0      0  60044  63679  43230  26393  25763    704
WARNING: sum of bin counts is too low
         max(mtr)=12612.69
         mtr=12612.69 for recid=126452
FICA and IIT mtr histogram bin counts for e00650:
219814 : 219814      0      0      0      0      0      0      0      0      0
219813 :      0      0      0  17790 106998  43515  50455    625    408     22
WARNING: sum of bin counts is too low
         max(mtr)=12612.69
         mtr=12612.69 for recid=126452
FICA and IIT mtr histogram bin counts for e01400:
219814 : 219814      0      0      0      0      0      0      0      0      0
219814 :      0      0      0      0  60074  63670  46904  26780  21695    691
FICA and IIT mtr histogram bin counts for e01700:
219814 : 219814      0      0      0      0      0      0      0      0      0
219814 :      0      0      0      0  60084  63660  46912  26770  21697    691
FICA and IIT mtr histogram bin counts for e02000:
219814 : 219814      0      0      0      0      0      0      0      0      0
219814 :      0      0      0      0  60120  63625  43739  26157  25473    700
FICA and IIT mtr histogram bin counts for e02400:
219814 : 219814      0      0      0      0      0      0      0      0      0
219814 :      0      0      0      0  99289  39973  50784  28905    756    107
FICA and IIT mtr histogram bin counts for e22250:
219814 : 219814      0      0      0      0      0      0      0      0      0
219813 :      0      0      0     22  82267  61538  41843  19109  14608    426
WARNING: sum of bin counts is too low
         max(mtr)=12612.69
         mtr=12612.69 for recid=126452
FICA and IIT mtr histogram bin counts for e23250:
219814 : 219814      0      0      0      0      0      0      0      0      0
219813 :      0      0      0  17251 125371  38110  33944   2147   2943     47
WARNING: sum of bin counts is too low
         max(mtr)=12612.69
         mtr=12612.69 for recid=126452

Notice that RECID 126452 becomes EITC ineligible whenever capital income rises and loses all of its $126.13 EITC payment. As shown in an earlier comment in this conversation, when a negative finite_diff is used, RECID 126452 has marginal tax rates that are much less than 100 percent.

This resolves the Very-High-Marginal-Tax-Rates issue and so it is being closed.

cc @MattHJensen @feenberg @Amy-Xu @GoFroggyRun