PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org

How to determine source of TaxBrain vs taxcalc differences? #655

Closed martinholmer closed 8 years ago

martinholmer commented 8 years ago

As reported in pull request #654, the TaxBrain webapp and the taxcalc package running on the local computer generally produce the same results (within rounding error) for a reform. But when the reform consists of both increases in the level of parameters and turning off the CPI-indexing of those parameters, the results from TaxBrain and taxcalc are somewhat different. The differences are explained in detail in pull request #654.

The goal of this issue is to generate a discussion that produces ideas about how to determine the source of these differences. Can anybody suggest an approach for resolving this puzzle?

@MattHJensen @talumbau @feenberg @Amy-Xu @GoFroggyRun

talumbau commented 8 years ago

This seems like a good issue to bring up. First, I want to make sure I understand what we are discussing. We know that we won't get exactly the same answer between running taxcalc on a local machine with a particular release and running TaxBrain with that same release of taxcalc because of the disclosure avoidance algorithm. The source of that difference comes from the fact that we drop 3 taxpayer records at "random" from each reporting group (either income deciles or income bins). So, depending on the weight associated with those "random" (actually pseudorandom and deterministic) choices, the difference between the pure taxcalc computation and its TaxBrain equivalent could be a large number in terms of absolute dollars (although I would expect that they would typically be small with respect to the overall reform).

So, given that we know these differences occur, how can we be sure that a difference in a given calculation is due to the disclosure avoidance algorithm or a bug in TaxBrain? I wonder if a good place to start here would be to provide some instructions on running the dropq package locally. That way, one could produce the exact output of TaxBrain on one's local machine. That seems like a good place to start. What do you think?
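For intuition, here is a minimal sketch of that disclosure-avoidance idea: within each reporting group, drop a few records chosen by a deterministic pseudorandom draw. The grouping, seed, and number dropped below are illustrative, not the actual dropq code.

import numpy as np
import pandas as pd

def drop_records(df, group_col, seed, n_drop=3):
    # drop n_drop records from each group, chosen pseudorandomly but deterministically
    rng = np.random.RandomState(seed)
    kept = []
    for _, grp in df.groupby(group_col):
        drop_idx = rng.choice(grp.index, size=min(n_drop, len(grp)), replace=False)
        kept.append(grp.drop(drop_idx))
    return pd.concat(kept)

# example: ten "income decile" groups of 100 records each
data = pd.DataFrame({'decile': np.repeat(range(10), 100), 'itax': 1.0})
trimmed = drop_records(data, 'decile', seed=12345)
print(len(data) - len(trimmed))   # 30 records dropped in total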

martinholmer commented 8 years ago

T.J. said:

This seems like a good issue to bring up. First, I want to make sure I understand what we are discussing. We know that we won't get exactly the same answer between running taxcalc on a local machine with a particular release and running TaxBrain with that same release of taxcalc because of the disclosure avoidance algorithm. The source of that difference comes from the fact that we drop 3 taxpayer records at "random" from each reporting group (either income deciles or income bins). So, depending on the weight associated with those "random" (actually pseudorandom and deterministic) choices, the difference between the pure taxcalc computation and its TaxBrain equivalent could be a large number in terms of absolute dollars (although I would expect that they would typically be small with respect to the overall reform).

Everything you say above is correct, but I doubt this has anything to do with the differences under discussion here. Remember a complex reform that changes just the levels of many parameters produces essentially the same results from TaxBrain and taxcalc, and a second complex reform that changes just the indexing status of many parameters also produces essentially the same results. There are differences between TaxBrain and taxcalc only when these two types of reform provisions (changing the level and changing the indexing status) are combined in a single reform. And the exact same differences are generated in different TaxBrain runs, which suggests (if I understand the dropQ algorithm correctly) that the differences have nothing to do with random subsampling in TaxBrain.

Then T.J. went on to say:

So, given that we know these differences occur, how can we be sure that a difference in a given calculation is due to the disclosure avoidance algorithm or a bug in TaxBrain? I wonder if a good place to start here would be to provide some instructions on running the dropq package locally. That way, one could produce the exact output of TaxBrain on one's local machine. That seems like a good place to start. What do you think?

Actually, I think the best place to start is for you to answer the questions I posed in pull request #654. I'll repeat them here for convenience:

The biggest problem in my investigation is that I (and other users) have no idea what really happens in TaxBrain when a CPI button is clicked off. When does that switch in indexing status occur? What happens when you raise the level of a parameter and turn off indexing in the same year? So far, I have not come up with the answers to any of these questions about how TaxBrain handles a CPI-indexing reform that also involves a change in the level of that parameter.

talumbau commented 8 years ago

Ok, sounds good. Let me know if this answers your question:

If you click a CPI button on TaxBrain, a corresponding _cpi flag is set in the reform dictionary for the year corresponding to the start year on the TaxBrain input page. So for example, if you change the Personal and dependent exemption amount from the default 2016 value of 4050 to 4450 and click the 'CPI' button next to it (so that it is "off"), you get the following reform dictionary:

{2016: {'_II_em_cpi': False, '_II_em': [4450.0]}}

This is passed to a Policy object via the implement_reform method through the mechanism laid out in the run_nth_year function in the dropq package:

https://github.com/OpenSourcePolicyCenter/dropQ/blob/master/dropq/dropq.py#L361
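For concreteness, a minimal sketch of how such a dictionary is applied to a Policy object (this assumes the taxcalc API of that era; the parameter attribute printed at the end is illustrative):

from taxcalc import Policy

reform = {2016: {'_II_em_cpi': False, '_II_em': [4450.0]}}
policy = Policy()
policy.implement_reform(reform)
policy.set_year(2020)
# with CPI indexing turned off in 2016, the 2020 exemption amount should stay at 4450
print(policy.II_em)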

martinholmer commented 8 years ago

T.J. said:

Ok, sounds good. Let me know if this answers your question:

If you click a CPI button on TaxBrain, a corresponding _cpi flag is set in the reform dictionary for the year corresponding to the start year on the TaxBrain input page. So for example, if you change the Personal and dependent exemption amount from the default 2016 value of 4050 to 4450 and click the 'CPI' button next to it (so that it is "off"), you get the following reform dictionary:

{2016: {'_II_em_cpi': False, '_II_em': [4450.0]}}

This is passed to a Policy object via the implement_reform method through the mechanism laid out in the run_nth_year function in the dropq package:

https://github.com/OpenSourcePolicyCenter/dropQ/blob/master/dropq/dropq.py#L361

Thanks for the explanation. There is quite a bit going on in the run_nth_year function before the implement_reform method is called, but things seem sensible as far as I could see after a quick glance.

I noticed above in your answer that the order of the _cpi reform provision and the change-in-the-level reform provisions is different from what I'm using in the taxcalc/taxbrain tests. I just added a test to test_policy.py that confirms the order makes no difference to the post-reform parameter values (see closed pull request #658).

So, I'm running out of ideas about what is causing the differences being discussed in this issue #655.

Is it possible that the difference, which grows to $1.1 billion in 2025, is an accumulation of rounding errors? There are 21 different parameters that are having their levels increased by ten percent and their CPI-indexing turned off in 2016, so a difference of less than $0.1 billion for each of the 21 parameters could add up to $1.1 billion. Does this seem plausible? Are there other explanations for the TaxBrain-vs-taxcalc differences being discussed in issue #655?

@MattHJensen @talumbau @feenberg @Amy-Xu @GoFroggyRun

talumbau commented 8 years ago

The best strategy that I could think of to determine the source of the difference would be to run the dropq package locally with the given reform dictionary and see if the answer is different when running taxcalc by itself. At the highest level, you could call the run_models function as is done in example.py:

https://github.com/OpenSourcePolicyCenter/dropQ/blob/master/example.py

This returns aggregated data already, so maybe it would make sense to output intermediate results. Or, you could just manually run the 10th year of the reform, via a single call to run_nth_year, and just output the results of that calculation. Let me know if I can provide assistance here.

martinholmer commented 8 years ago

T.J. said:

The best strategy that I could think of to determine the source of the difference would be to run the dropq package locally with the given reform dictionary and see if the answer is different when running taxcalc by itself. At the highest level, you could call the run_models function as is done in example.py:

https://github.com/OpenSourcePolicyCenter/dropQ/blob/master/example.py

This returns aggregated data already, so maybe it would make sense to output intermediate results. Or, you could just manually run the 10th year of the reform, via a single call to run_nth_year, and just output the results of that calculation. Let me know if I can provide assistance here.

Thanks for the pointers and offer of assistance. I'll try this tomorrow.

feenberg commented 8 years ago

On Mon, 21 Mar 2016, Martin Holmer wrote:

As reported in pull request #654, the TaxBrain webapp and the taxcalc package running on the local computer generally produce the same results (within rounding error) for a reform. But when the reform consists of both increases in the level of parameters and turning off the CPI-indexing of those parameters, the results from TaxBrain and taxcalc are somewhat different. The differences are explained in detail in pull request #654.

The goal of this issue is to generate a discussion that produces ideas about how to determine the source of these differences. Can anybody suggest an approach for resolving this puzzle?

I would reduce the dataset to a single record, and look at all the intermediate results.

dan

@MattHJensen @talumbau @feenberg @Amy-Xu @GoFroggyRun
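For concreteness, a minimal sketch of this single-record approach (assuming the taxcalc Records/Calculator API of that era; the file path and the variable inspected at the end are illustrative):

import pandas as pd
from taxcalc import Policy, Records, Calculator

# keep a single filing unit from the input file
one_rec = pd.read_csv('puf.csv').iloc[[0]]
calc = Calculator(policy=Policy(), records=Records(data=one_rec))
calc.calc_all()
# inspect intermediate results for that one record, e.g. taxable income
print(calc.records.c04800)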


feenberg commented 8 years ago

On Mon, 21 Mar 2016, T.J. Alumbaugh wrote:

This seems like a good issue to bring up. First, I want to make sure I understand what we are discussing. We know that we won't get exactly the same answer between running taxcalc on a local machine with a particular release and running TaxBrain with that same release of taxcalc because of the disclosure avoidance algorithm. The source of that difference comes from the fact that we drop 3 taxpayer records at "random" from each reporting group (either income deciles or income bin). So, depending on the weight associated with those "random" (actually pseudorandom and deterministic) choices, the difference between the pure taxcalc computation and its TaxBrain equivalent could be a large number in terms of

This might well be the explanation, but when designing the system we assumed that the seed for the random number generator would be composed of a hash of the parameters. So if the parameters are the same in both cases, the seed should be the same, and the random numbers the same also, unless the generator is system-specific. If that is the case, we should supply our own portable generator.

However, I take it that dropq doesn't apply to a local dataset. That is probably a reasonable default.

absolute dollars (although I would expect that they would typically be small with respect to the overall reform).

So, given that we know these differences occur, how can we be sure that a difference in a given calculation is due to the disclosure avoidance algorithm or a bug in TaxBrain? I wonder if a good place to start here would be to provide some instructions on running the dropq package locally. That way, one could produce the exact output of TaxBrain on one's local machine. That seems like a good place to start. What do you think?

Yes, an option to apply dropq would be necessary.

If this is the source of the difference, wouldn't the results always be different? Perhaps they are?

dan
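A hedged sketch of the seeding scheme Dan describes above (not the actual dropq code): derive a deterministic seed from the reform parameters so that the record drops are reproducible across runs and machines.

import hashlib
import json
import numpy as np

reform = {2016: {'_II_em_cpi': False, '_II_em': [4450.0]}}
# hash the reform dictionary into a 32-bit seed
reform_text = json.dumps(reform, sort_keys=True)
seed = int(hashlib.sha256(reform_text.encode('utf-8')).hexdigest(), 16) % (2 ** 32)
rng = np.random.RandomState(seed)                    # same seed, same draws, on any platform
drop_idx = rng.choice(100, size=3, replace=False)    # e.g. which 3 of 100 records in a group to drop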


feenberg commented 8 years ago

On Mon, 21 Mar 2016, T.J. Alumbaugh wrote:

The best strategy that I could think of to determine the source of the difference would be to run the dropq package locally with the given reform dictionary and see if the answer is different when running taxcalc by itself. At the highest level, you could call the run_models function as is done in example.py:

https://github.com/OpenSourcePolicyCenter/dropQ/blob/master/example.py

Is there a way for us insiders to run taxbrain without dropq? That wouldn't bother SOI, provided it was protected by a login/password.

dan

This returns aggregated data already, so maybe it would make sense to output intermediate results. Or, you could just manually run the 10th year of the reform, via a single call to run_nth_year, and just output the results of that calculation. Let me know if I can provide assistance here.


talumbau commented 8 years ago

I'm not sure if this adds clarity to the conversation, but let me add this to make it clear why I'm advocating for running dropq locally. A reasonable approximation of what TaxBrain is would be as follows:

  1. an input page
  2. some code to translate user input from the input page into a reform dictionary
  3. the calling of dropq.run_models with the reform dictionary from step 2

It's usually been pretty easy to track down problems with input issues that arise in step 2. So, if the output of TaxBrain is questionable, it's probably best to manually do step 3, which is basically what is shown in example.py at the top of the dropQ repo. To answer Dan's question, which I interpret as "is there a way to run TaxBrain w/o dropping records?", the answer is 'no', there is no flag or anything, but we could comment out those portions of the code to ensure there is no "noise" introduced into the calculation. At that point, we could do a record-by-record investigation of any particular calculation, comparing the dropq calculation with the "raw" taxcalc calculation that Martin is doing.

martinholmer commented 8 years ago

T.J., I can't use the current version of the dropQ package with the version of puf.csv I've been using without problems in the Tax-Calculator repo for more than a week. The error I get is below. Can you fix this, so I can do the testing you suggested in issue #655? Thanks.

First the script that generates the error:

$ cat dropQ-test.py
import os
import sys
import json
import pandas as pd
CUR_PATH = os.path.abspath(os.path.dirname(__file__))
sys.path.append(os.path.join(CUR_PATH, '..', 'tax-calculator'))
import taxcalc
import dropq

# specify reform
myvars = {}
myvars['_II_em_cpi'] = False
first_year = 2016
user_mods = {first_year: myvars}

# run models using puf.csv data
tax_dta = pd.read_csv('../tax-calculator-data/puf.csv')
mY_dec, mX_dec, df_dec, pdf_dec, cdf_dec, mY_bin, mX_bin, df_bin, pdf_bin, cdf_bin, fiscal_tots = dropq.run_models(tax_dta, num_years=3, start_year=first_year, user_mods=user_mods)

exit(0)

And second, the error message itself:

$ python dropQ-test.py
Traceback (most recent call last):
  File "dropQ-test.py", line 18, in <module>
    mY_dec, mX_dec, df_dec, pdf_dec, cdf_dec, mY_bin, mX_bin, df_bin, pdf_bin, cdf_bin, fiscal_tots = dropq.run_models(tax_dta, num_years=3, start_year=first_year, user_mods=user_mods)
  File "/Users/mrh/work/OSPC/dropQ-master/dropq/dropq.py", line 531, in run_models
    cast_to_double(tax_dta)
  File "/Users/mrh/work/OSPC/dropQ-master/dropq/utils.py", line 36, in cast_to_double
    df[cols] = df[cols] * 1.0
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1791, in __getitem__
    return self._getitem_array(key)
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1835, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)
  File "/Users/mrh/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1112, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['e01500' 'e04250' 'e04600' 'e04800' 'e05200' 'e06000' 'e06200' 'e06300'\n 'e06500' 'e07150' 'e07180' 'e07220' 'e07230' 'e09600' 'e10300' 'e10605'\n 'e10700' 'e10900' 'e10950' 'e10960' 'e11100' 'e11300' 'e11400' 'e11570'\n 'e11581' 'e11582' 'e11583' 'e11900' 'e12000' 'e12200' 'e15100' 'e15210'\n 'e15250' 'e18600' 'e22320' 'e22370' 'e24535' 'e24560' 'e24570' 'e24598'\n 'e24615' 'e25820' 'e25850' 'e25860' 'e25920' 'e25940' 'e25960' 'e25980'\n 'e26100' 'e26110' 'e26160' 'e26170' 'e26180' 'e26190' 'e26390' 'e26400'\n 'e30400' 'e30500' 'e33000' 'e53240' 'e53280' 'e53300' 'e53317' 'e53410'\n 'e53458' 'e58950' 'e59680' 'e59700' 'e59720' 'e68000' 'e82200' 'e87550'\n 'e87870' 'e87875' 'e87880' 'p25350' 'p25380' 'p25700' 'p27895' 'p61850'\n 'p65300' 'p65400'] not in index"
$ 
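A quick diagnostic along these lines can show which of the columns named in the error are absent from a local puf.csv (the short column list below is just an illustrative subset of the KeyError above):

import pandas as pd

tax_dta = pd.read_csv('../tax-calculator-data/puf.csv')
expected = ['e01500', 'e04600', 'e04800', 'e59680', 'p65400']   # illustrative subset
missing = [col for col in expected if col not in tax_dta.columns]
print(missing)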

@talumbau @MattHJensen

talumbau commented 8 years ago

I thought you wanted to compare to current TaxBrain - is that correct? The current TaxBrain runs off taxcalc version 0.6.2, dropq version 0.6.8 (which is master of the dropq repo) and a PUF from January 7 with the following characteristics:

total byte size of puf.csv.gz: 17435790
SHA1 sum of puf.csv.gz: ebb17cde9eb92651d72d143602c177a217e86c7e
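A simple way to check whether a local copy matches the file described above is to recompute the size and SHA1 sum (the path below is illustrative):

import hashlib
import os

path = 'puf.csv.gz'                                        # illustrative path to the local copy
print(os.path.getsize(path))                               # expect 17435790
print(hashlib.sha1(open(path, 'rb').read()).hexdigest())   # expect ebb17cde9eb92651d72d143602c177a217e86c7e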

In terms of upgrading the PUF on TaxBrain, it looked like the right answer was to wait for the conclusion of the discussion and activities in this PR:

https://github.com/open-source-economics/Tax-Calculator/pull/638

When we get a new version from Amy, I will tag and release a new taxcalc and upgrade the nodes to use the new PUF file. Right now, we are using the one from earlier in the year.

martinholmer commented 8 years ago

T.J. said:

I thought you wanted to compare to current TaxBrain - is that correct? The current TaxBrain runs off taxcalc version 0.6.2, dropq version 0.6.8 (which is master of the dropq repo) and a PUF from January 7 with the following characteristics:

total byte size of puf.csv.gz: 17435790
SHA1 sum of puf.csv.gz: ebb17cde9eb92651d72d143602c177a217e86c7e

In terms of upgrading the PUF on TaxBrain, it looked like the right answer was to wait for the conclusion of the discussion and activities in this PR:

https://github.com/open-source-economics/Tax-Calculator/pull/638

When we get a new version from Amy, I will tag and release a new taxcalc and upgrade the nodes to use the new PUF file. Right now, we are using the one from earlier in the year.

Amy, can you give me access to the puf.csv file being used by TaxBrain? I need the version T.J. describes above. Thanks.

@talumbau @Amy-Xu

talumbau commented 8 years ago

@martinholmer just to make sure it's clear, this was the PUF that everyone was using a couple of months ago. It was not a special PUF given to me by Amy, so if you have your old version around you should be good to go.

Amy-Xu commented 8 years ago

@talumbau I just pointed Martin to the right one. No worries.

martinholmer commented 8 years ago

After acquiring the old puf.csv file still being used by TaxBrain, I was able to conduct the test suggested in issue #655 by T.J. Below I show the script I wrote, which is based heavily on the dropQ/example.py script, and I show the results of that script and a comparison with the results generated by TaxBrain. These results show that dropQ generates the same results as the taxcalc package, indicating that the TaxBrain-vs-taxcalc differences being discussed in issue #655 are caused by some problem in TaxBrain.

Here is the dropQ-test.py script:

import os
import sys
import json
import pandas as pd
CUR_PATH = os.path.abspath(os.path.dirname(__file__))
sys.path.append(os.path.join(CUR_PATH, '..', 'tax-calculator'))
import taxcalc
import dropq

# specify number of years to simulate starting with first_year
first_year = 2016
nyrs = 10

# specify reform
reform = {first_year: {
'_ACTC_ChildNum': [ 3.],
'_ACTC_rt': [ 0.165],
'_ALD_Alimony_HC': [ 0.],
'_ALD_EarlyWithdraw_HC': [ 0.],
'_ALD_KEOGH_SEP_HC': [ 0.],
'_ALD_SelfEmp_HealthIns_HC': [ 0.],
'_ALD_SelfEmploymentTax_HC': [ 0.],
'_ALD_StudentLoan_HC': [ 0.],
'_AMED_thd': [[ 220000., 275000., 137500., 220000., 220000., 137500.]],
'_AMED_trt': [ 0.0099],
'_AMT_CG_rt1': [ 0.],
'_AMT_CG_rt2': [ 0.165],
'_AMT_CG_rt3': [ 0.22],
'_AMT_CG_thd1': [[ 41415., 82830., 41415., 55440., 82830., 41415.]],
'_AMT_CG_thd1_cpi': False,
'_AMT_CG_thd2': [[ 456555. , 513645. , 256822.5, 485100. , 513645. , 256822.5]],
'_AMT_CG_thd2_cpi': False,
'_AMT_em': [[ 59290., 92180., 46090., 59290., 92180., 46090.]],
'_AMT_em_cpi': False,
'_AMT_em_ps': [[ 131670., 175670.,  87835., 131670., 175670.,  87835.]],
'_AMT_em_ps_cpi': False,
'_AMT_prt': [ 0.275],
'_AMT_trt1': [ 0.286],
'_AMT_trt2': [ 0.022],
'_AMT_tthd': [ 204930.],
'_AMT_tthd_cpi': False,
'_CG_rt1': [ 0.],
'_CG_rt2': [ 0.165],
'_CG_rt3': [ 0.22],
'_CG_thd1': [[ 41415., 82830., 41415., 55440., 82830., 41415.]],
'_CG_thd1_cpi': False,
'_CG_thd2': [[ 456555. , 513645. , 256822.5, 485100. , 513645. , 256822.5]],
'_CG_thd2_cpi': False,
'_CTC_c': [ 1100.],
'_CTC_prt': [ 0.055],
'_CTC_ps': [[  82500., 121000.,  60500.,  82500.,  82500.,  60500.]],
'_EITC_c': [[  556.6, 3710.3, 6129.2, 6895.9]],
'_EITC_c_cpi': False,
'_EITC_prt': [[ 0.08415, 0.17578, 0.23166, 0.23166]],
'_EITC_ps': [[  9097., 20009., 20009., 20009.]],
'_EITC_ps_cpi': False,
'_EITC_rt': [[ 0.08415, 0.374  , 0.44   , 0.495  ]],
'_FICA_mc_trt': [ 0.0319],
'_FICA_ss_trt': [ 0.1364],
'_ID_BenefitSurtax_Switch': [[ 1., 1., 1., 1., 1., 1., 1.]],
'_ID_BenefitSurtax_crt': [ 1.],
'_ID_BenefitSurtax_trt': [ 1.],
'_ID_Casualty_frt': [ 0.11],
'_ID_Charity_crt_Asset': [ 0.3],
'_ID_Charity_crt_Cash': [ 0.5],
'_ID_Charity_frt': [ 0.],
'_ID_Medical_frt': [ 0.11],
'_ID_Miscellaneous_frt': [ 0.022],
'_ID_RealEstate_HC': [ 0.],
'_ID_StateLocalTax_HC': [ 0.],
'_ID_crt': [ 0.88],
'_ID_prt': [ 0.033],
'_ID_ps': [[ 285340., 342430., 171215., 313885., 342430., 171215.]],
'_ID_ps_cpi': False,
'_II_brk1': [[ 10202.5, 20405. , 10202.5, 14575. , 20405. , 10202.5]],
'_II_brk1_cpi': False,
'_II_brk2': [[ 41415., 82830., 41415., 55440., 82830., 41415.]],
'_II_brk2_cpi': False,
'_II_brk3': [[ 100265., 167090.,  83545., 143165., 167090.,  83545.]],
'_II_brk3_cpi': False,
'_II_brk4': [[ 209165. , 254595. , 127297.5, 231880. , 254595. , 127297.5]],
'_II_brk4_cpi': False,
'_II_brk5': [[ 454685. , 454685. , 227342.5, 454685. , 454685. , 227342.5]],
'_II_brk5_cpi': False,
'_II_brk6': [[ 456555. , 513645. , 256822.5, 485100. , 513645. , 256822.5]],
'_II_brk6_cpi': False,
'_II_credit': [[ 0., 0., 0., 0., 0., 0.]],
'_II_credit_prt': [ 0.],
'_II_credit_ps': [[ 0., 0., 0., 0., 0., 0.]],
'_II_em': [ 4455.],
'_II_em_cpi': False,
'_II_em_ps': [[ 285340., 342430., 171215., 313885., 342430., 171215.]],
'_II_em_ps_cpi': False,
'_II_prt': [ 0.022],
'_II_rt1': [ 0.11],
'_II_rt2': [ 0.165],
'_II_rt3': [ 0.275],
'_II_rt4': [ 0.308],
'_II_rt5': [ 0.363],
'_II_rt6': [ 0.385],
'_II_rt7': [ 0.4356],
'_NIIT_thd': [[ 220000., 275000., 137500., 220000., 275000., 137500.]],
'_NIIT_trt': [ 0.0418],
'_SS_Earnings_c': [ 130350.],
'_SS_Earnings_c_cpi': False,
'_SS_percentage1': [ 0.55],
'_SS_percentage2': [ 0.935],
'_SS_thd50': [[ 27500., 35200., 0., 27500., 27500., 0.]],
'_SS_thd85': [[ 37400., 48400., 0., 37400., 37400., 0.]],
'_STD': [[  6930., 13860., 6930., 10230., 13860., 6930., 1155.]],
'_STD_Aged': [[ 1705., 1375., 1375., 1705., 1705., 1375.]],
'_STD_Aged_cpi': False,
'_STD_cpi': False
}}

# run models using puf-taxbrain.csv data
tax_dta = pd.read_csv('../tax-calculator-data/puf-taxbrain.csv')
_, _, _, _, _, _, _, _, _, _, rev = dropq.run_models(tax_dta, num_years=nyrs,
                                                     start_year=first_year,
                                                     user_mods=reform)
billion = 1.0E-9
for idx in range(nyrs):
    year = first_year + idx
    amt = float(rev[idx]['ind_tax_{}'.format(idx)]) * billion
    print '{} ITAX {:.1f}'.format(year, amt)
for idx in range(nyrs):
    year = first_year + idx
    amt = float(rev[idx]['payroll_tax_{}'.format(idx)]) * billion
    print '{} FICA {:.1f}'.format(year, amt)

Here are the contents of the dropQ-test.results file produced by the above script:

2016 ITAX 74.1
2017 ITAX 91.4
2018 ITAX 118.1
2019 ITAX 146.5
2020 ITAX 177.2
2021 ITAX 211.9
2022 ITAX 249.3
2023 ITAX 289.7
2024 ITAX 333.3
2025 ITAX 380.4
2016 FICA 121.3
2017 FICA 117.8
2018 FICA 113.9
2019 FICA 109.3
2020 FICA 104.6
2021 FICA 99.3
2022 FICA 92.7
2023 FICA 84.7
2024 FICA 75.4
2025 FICA 64.7

And here are the contents of the taxbrain.results file, which was produced by TaxBrain:

2016 ITAX 74.1
2017 ITAX 91.3
2018 ITAX 117.9
2019 ITAX 146.2
2020 ITAX 176.8
2021 ITAX 211.4
2022 ITAX 248.7
2023 ITAX 288.9
2024 ITAX 332.4
2025 ITAX 379.3
2016 FICA 121.3
2017 FICA 117.8
2018 FICA 113.9
2019 FICA 109.3
2020 FICA 104.6
2021 FICA 99.3
2022 FICA 92.7
2023 FICA 84.7
2024 FICA 75.4
2025 FICA 64.7

These TaxBrain results differ from the dropQ results by exactly the same amounts (for example, -$1.1 billion in 2025 for ITAX) as do the TaxBrain-vs-taxcalc results in the taxcalc/taxbrain/all-16-lvlnoi.results file.

So, in conclusion, these results appear to be strong evidence that TaxBrain is, for some reason, not correctly preparing the reform dictionary from the user-specified webapp form. Is there some way to have TaxBrain dump out the reform dictionary it has prepared before using it to execute the run?

@talumbau @MattHJensen @Amy-Xu @feenberg @GoFroggyRun

talumbau commented 8 years ago

@martinholmer do you have a URL for a completed TaxBrain run that is producing the bad results? If you give me the URL in the form of www.ospc.org/taxbrain/NNNN, I can see what reform dictionary TaxBrain formed and compare it to the above to see what went wrong. Looks like we are getting closer!

martinholmer commented 8 years ago

T.J. said:

Do you have a URL for a completed TaxBrain run that is producing the bad results? If you give me the URL in the form of www.ospc.org/taxbrain/NNNN, I can see what reform dictionary TaxBrain formed and compare it to the above to see what went wrong. Looks like we are getting closer!

As I said, it is in the all-16-lvlnoi.results file on the Tax-Calculator master branch in the taxcalc/taxbrain directory. Here are the contents of that file:

STARTING WITH TAXBRAINTEST : Mon Mar 21 11:07:00 EDT 2016
1-000   ITAX    2017    -0.1    -0.11   www.ospc.org/taxbrain/2057/
1-000   ITAX    2018    -0.2    -0.17   www.ospc.org/taxbrain/2057/
1-000   ITAX    2019    -0.3    -0.20   www.ospc.org/taxbrain/2057/
1-000   ITAX    2020    -0.4    -0.23   www.ospc.org/taxbrain/2057/
1-000   ITAX    2021    -0.5    -0.24   www.ospc.org/taxbrain/2057/
1-000   ITAX    2022    -0.6    -0.24   www.ospc.org/taxbrain/2057/
1-000   ITAX    2023    -0.8    -0.28   www.ospc.org/taxbrain/2057/
1-000   ITAX    2024    -0.9    -0.27   www.ospc.org/taxbrain/2057/
1-000   ITAX    2025    -1.1    -0.29   www.ospc.org/taxbrain/2057/
FINISHED WITH TAXBRAINTEST : Mon Mar 21 11:10:06 EDT 2016

@talumbau @MattHJensen @feenberg @Amy-Xu @GoFroggyRun

martinholmer commented 8 years ago

T.J. said:

[using] www.ospc.org/taxbrain/NNNN/, I can see what reform dictionary TaxBrain formed.

How can a TaxBrain user do that? I want to be able to do that.

@talumbau @MattHJensen @feenberg @Amy-Xu @GoFroggyRun

talumbau commented 8 years ago

TaxBrain users can't currently do that. This feature is coming in the next week or so. Here is what I discovered from looking at the run:

TaxBrain only collects parameter values for four filing statuses: Single, Married filing Jointly, Married filing Separately, and Head of Household.

So, for the multi-valued parameters listed below, the reform dictionary that TaxBrain forms just uses the default data for the last two parameter entries. For example, for _AMT_em, here is what TaxBrain takes as the reform parameter:

[[59290, 92180, 46090, 59290, 83800.0, 41900.0]] 

and here is what is specified in the reform dictionary in the test:

[[59290.0, 92180.0, 46090.0, 59290.0, 92180.0, 46090.0]]

The last two entries are different because there is no way to enter non-default values there. The parameters in the reform where this is an issue are:

_AMED_thd, _AMT_CG_thd1, _AMT_CG_thd2, _AMT_em, _AMT_em_ps, _CG_thd1, _CG_thd2, _CTC_ps, _ID_ps, _II_brk1, _II_brk2, _II_brk3, _II_brk4, _II_brk5, _II_brk6, _II_em_ps, _NIIT_thd, _STD_Aged

There is also a rounding issue, related to the decision adopted in this comment:

https://github.com/open-source-economics/Tax-Calculator/issues/589#issuecomment-183045015

which is to round inflated default parameter values to the nearest dollar. However, the user should be able to enter non-whole dollar reform parameter values. This is a bug.
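For illustration only (this is not the TaxBrain code): if reform values are forced to whole dollars, a non-whole entry such as the first _EITC_c value in the test reform would be perturbed slightly, and small perturbations across many parameters could accumulate.

value_in_test_reform = 556.6                                       # first _EITC_c entry in the test reform
value_after_whole_dollar_rounding = round(value_in_test_reform)    # 557.0
print(value_after_whole_dollar_rounding - value_in_test_reform)    # about 0.4 dollars per affected cell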

In addition, TaxBrain submits _ID_BenefitSurtax_Switch as booleans, while the test reform specifies floats:

[[True, True, True, True, True, True, True]] vs. [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]

Probably makes no difference but I would re-run just to see.

_ACTC_ChildNum [3.0]
_ALD_Alimony_HC [0.0]
_ALD_EarlyWithdraw_HC [0.0]
_ALD_KEOGH_SEP_HC [0.0]
_ALD_SelfEmp_HealthIns_HC [0.0]
_ALD_SelfEmploymentTax_HC [0.0]
_ALD_StudentLoan_HC [0.0]
_AMT_CG_rt1 [0.0]
_AMT_CG_thd2_cpi False
_CG_rt1 [0.0]
_ID_BenefitSurtax_crt [1.0]
_ID_BenefitSurtax_trt [1.0]
_ID_Charity_crt_Asset [0.3]
_ID_Charity_crt_Cash [0.5]
_ID_Charity_frt [0.0]
_ID_RealEstate_HC [0.0]
_ID_StateLocalTax_HC [0.0]
_II_credit [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
_II_credit_prt [0.0]
_II_credit_ps [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]

So, based on this examination, @martinholmer could you do as follows to see if this is the source of the problem? Re-run your test with the following changes to the reform dictionary:

  1. For the multi-valued parameters I listed above, only specify values for the first four filing statuses. This would match up with TaxBrain input.
  2. Specify _ID_BenefitSurtax_Switch as all True instead of all 1.0.
  3. Specify all reform entries in whole dollars.
  4. Remove _AMT_CG_thd2_cpi from the reform dictionary in the test.

If you do these 4 things and re-run your local test, I am hopeful you will get the same answer on TaxBrain and with running locally. Meanwhile, I will fix the two bugs (allowing reform entries to be non-whole dollar amounts and fixing the CPI flag issue). Once those fixes are up, you can re-run the test only doing items 1 and 2 in my list and we should get the same answer.

martinholmer commented 8 years ago

T.J. said:

Re-run your test with the following changes to the reform dictionary:

(1) For the multi-valued parameters I listed above, only specify values for the first four filing statuses. This would match up with TaxBrain input.

(2) Specify _ID_BenefitSurtax_Switch as all True instead of all 1.0.

(3) Specify all reform entries in whole dollars.

(4) Remove _AMT_CG_thd2_cpi from the reform dictionary in the test.

If you do these 4 things and re-run your local test, I am hopeful you will get the same answer on TaxBrain and with running locally. Meanwhile, I will fix the two bugs (allowing reform entries to be non-whole dollar amounts and fixing the CPI flag issue). Once those fixes are up, you can re-run the test only doing items 1 and 2 in my list and we should get the same answer.

I did only item (4) and the test ran perfectly --- that is, the TaxBrain webapp results generated in the cloud were exactly equal to the taxcalc package results generated on my local computer on the master branch.

So, make whatever improvements you like in the webapp-public repo (including the _AMT_CG_thd2_cpi bug fix). Then when the new version of TaxBrain is publicly available, I'll rerun the test without dropping the reform provision in item (4) and I should get perfect agreement between TaxBrain results and taxcalc results.

@talumbau @MattHJensen @feenberg @Amy-Xu @GoFroggyRun

talumbau commented 8 years ago

@martinholmer Looks like I spoke too soon. I was convinced that there was a problem in the interface with registering the _AMT_CG_thd2_cpi flag because I believed I was going to the interface, clicking the CPI button to "off", and then seeing that _AMT_CG_thd2_cpi was not set to "off" in the response back from the browser. However, I was mistaken. Actually, I was clicking the CPI button for _CG_thd2_cpi, which is a similar looking parameter in the Regular Taxes section, not the Alternative Minimum Tax section. When I click the CPI button for _AMT_CG_thd2_cpi I see the proper response come back in the interface. However, when I go to ospc.org/taxbrain/2057 and click "Edit Parameters" I see that the CPI button for _AMT_CG_thd2_cpi is not clicked off, so the default value of 'True' would be used. When I then click that CPI button to "off" and run, I get the result here:

http://www.ospc.org/taxbrain/2137/

which seems like the right answer (see, for example, Individual Income Tax for 2025). So, the conclusion that I have right now is that taxbrain #2057 was not run with the CPI button for _AMT_CG_thd2_cpi clicked to "off". Was this done with an automated tool? Could we reproduce it? At this point I don't have evidence that there is something wrong in the interface, so I'm looking for a way to reproduce the issue.

martinholmer commented 8 years ago

T.J. said:

When I click the CPI button for _AMT_CG_thd2_cpi I see the proper response come back in the interface. However, when I go to ospc.org/taxbrain/2057/ and click "Edit Parameters" I see that the CPI button for _AMT_CG_thd2_cpi is not clicked off, so the default value of 'True' would be used. [snip]

So, the conclusion that I have right now is that taxbrain #2057 was not run with the CPI button for _AMT_CG_thd2_cpi clicked to "off". Was this done with an automated tool? Could we reproduce it? At this point I don't have evidence that there is something wrong in the interface, so I'm looking for a way to reproduce the issue.

Two things, first an answer to your question and second another TaxBrain bug report.

(1) Yes, all the TaxBrain-vs-taxcalc testing being discussed here in issue #655 is being "done with an automated tool". You can see the nature of that automation in the Tax-Calculator repo in the taxcalc/taxbrain directory on the master branch. The discussion in this issue is about the all-16-lvlnoi.json reform and that reform's test results, which are generated by the reforms.py script and are in the all-16-lvlnoi.results file.

(2) It seems to me that your conclusion about TaxBrain run 2057 is incorrect because you have been bitten by yet another (until now unreported) TaxBrain bug. When I went back to www.ospc.org/taxbrain/2057/ and clicked on the Edit Parameters button, I see all the ten-percent-higher parameter values in the input boxes, but none of the 21 CPI-indexing icons involved in the reform are off (they are all incorrectly on). So, before your suggestion that the automated test is in error can have any credibility, you need to fix TaxBrain so it will remember a run's CPI-indexing status for each policy parameter. Right now TaxBrain seems to forget the reform-changed value of a parameter's CPI-indexing status.

@talumbau @MattHJensen @feenberg @Amy-Xu @GoFroggyRun

talumbau commented 8 years ago

Ah, yes I see. Actually the situation is perhaps even worse than it appears. By looking at the backend log entries, I see that when you click "Edit Parameters" and go back to the input page, and then submit again, all of the reform parameters you previously entered are "remembered" including any CPI flag settings. However, the input page does not reflect this. So, for example, if you view taxbrain/2057 and click Edit Parameters, and then re-submit, the _II_brk4_cpi flag is still set to False, as it is with taxbrain/2057, but the input page appears to have the flag set to 'True', which is the default. So, the fix would be to have the page reflect the underlying state. I will take this on.

talumbau commented 8 years ago

I can confirm that running all-16-lvlnoi.json with reforms.py does not result in a TaxBrain simulation where _AMT_CG_thd2_cpi is clicked to False. This is impossible for the regular user to see; I can only tell because I am watching the logs that are generated immediately after the job is submitted. This was quite surprising to me, as the script does call the taxbrain_cpi_button_click function for that parameter. When I modified all-16-lvlnoi.json to only have parameters for CPI settings, all the CPI flags ended up in the submission. I tried to adjust the poll time for the WebDriverWait class, but that didn't fix things. So, I would say we cannot proceed until TaxBrain has one (or both) of these two features:

(1) the proper setting of CPI flags for previously completed runs when one clicks "Edit Parameters"

(2) the ability to download or at least display the generated reform dictionary used for a run on the results page

If either or both of those were available, you would see that your desired reform was not making it into TaxBrain. So, I would suggest we pause the comparisons of TaxBrain to taxcalc until one or both of these features is available.
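For reference, a generic sketch of the kind of explicit wait the automation relies on before clicking a CPI button (the element id and timing below are illustrative, not the actual reforms.py code):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get('http://www.ospc.org/taxbrain/')
# wait up to 10 seconds, polling every half second, until the CPI button can be clicked
wait = WebDriverWait(driver, 10, poll_frequency=0.5)
button = wait.until(EC.element_to_be_clickable((By.ID, 'id_AMT_CG_thd2_cpi')))   # illustrative element id
button.click()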

cc @MattHJensen

martinholmer commented 8 years ago

T.J. said:

So, I would say we can not proceed until TaxBrain has one (or both) of these two features:

(1) the proper setting of CPI flags for previously completed runs when one clicks "Edit Parameters"

(2) the ability to download or at least display the generated reform dictionary used for a run on the results page

If either or both of those were available, you would see that your desired reform was not making it in to TaxBrain. So, I would suggest we pause the comparisons of TaxBrain to taxcalc until one or both of these features is available.

So, when will you have these features available as part of the publicly-available TaxBrain webapp?

@talumbau @MattHJensen @feenberg @Amy-Xu @GoFroggyRun

martinholmer commented 8 years ago

This issue has been successfully resolved by Tax-Calculator pull request #662 (which works around a shortcoming in the selenium package) and pull request #663 (which updates taxcalc/taxbrain results).

talumbau commented 8 years ago

@martinholmer the TaxBrain fix is now live on the site (that is, edited CPI flags on microsims are correct on the Edit Parameters page).

martinholmer commented 8 years ago

@talumbau said:

The TaxBrain fix is now live on the site (that is, edited CPI flags on microsims are correct on the Edit Parameters page).

T.J., Thanks for all the help on this!

martinholmer commented 8 years ago

@talumbau said on Saturday:

The TaxBrain fix is now live on the site (that is, edited CPI flags on microsims are correct on the Edit Parameters page).

Then I said:

T.J., Thanks for all the help on this!

But then on Monday, I get this:

Distribution and Revenue Tables for Federal Individual Income Taxes

These results were generated on Mon, Mar 28th 2016 at 2:42PM using version 0.6.2.80ff43 TaxBrain. (ID: 10045)

Aren't we supposed to be at version 0.6.3? Version 0.6.2 was the one that didn't work correctly with the CPI parameters.

talumbau commented 8 years ago

The problem was on the site itself, not taxcalc. So the fix did not involve updating taxcalc. Oh, now that I read the message closely, it is associating the version number for taxcalc with "TaxBrain" and not taxcalc itself. I think we ended up with that language so that the casual user would not be confused (so they only have to think of the site as one thing, TaxBrain, instead of two things, TaxBrain and taxcalc). But in the strictest sense, this message is not accurate, because TaxBrain changed (through modification of the Django app) but taxcalc did not.