PSLmodels / Business-Taxation

USA Corporate and Pass-Through Business Tax Model
11 stars 10 forks source link

Add tests to new Business-Taxation/biztax/tests directory #61

Closed martinholmer closed 5 years ago

martinholmer commented 5 years ago

The objective of this pull request is to translate the tests included in the Business-Taxation/test_brc.py file into pytest tests located in the Business-Taxation/biztax/tests directory. So the changes in this pull request are non-substantive in that they do not change business tax logic. The pull request does some minor file name changes (in order to enhance clarity for newcomers) when copying the six asset-related expected results files in the Business-Taxation/test_results directory to the Business-Taxation/biztax/tests directory. For example, the asset2_1.csv file was renamed asset_ref2_corp_expect.csv when it was copied.

When doing this initial phase of the work to create a comprehensive test suite, I expected all the tests to pass.
However, one of the six asset tests fails. I've double-checked reform2, but for some reason beginning in 2017 the amount of taxDep is somewhat less in my test results than in the asset_ref2_nonc_expect.csv file.

NOTE: this test failure has been resolved as described in this later comment.

Here is what the test results look like and then after that I show the actual-versus-expected differences.

iMac:Business-Taxation mrh$ make pytest
============================= test session starts ==============================
platform darwin -- Python 3.6.8, pytest-4.3.0, py-1.7.0, pluggy-0.8.1
rootdir: /Users/mrh/work/PSL/Business-Taxation, inifile:
plugins: xdist-1.26.1, remotedata-0.3.1, openfiles-0.3.2, forked-0.2, doctestplus-0.1.3, arraydiff-0.2
gw0 [8] / gw1 [8] / gw2 [8] / gw3 [8]
.......F                                                                 [100%]
=================================== FAILURES ===================================
_______________________ test_asset_capital_path[2-False] _______________________
[gw3] darwin -- Python 3.6.8 /Users/mrh/anaconda3/bin/python

reform_num = 2, corporate = False
reform = {0:     year  tau_c  ...  newIntPaid_noncorp_hc  newIntPaid_noncorp_hcyear
0   2014  0.347  ...                      0...                   2017
13  2027  0.347  ...                      1                       2017

[14 rows x 41 columns]}
actual_vs_expect = <function actual_vs_expect.<locals>.act_vs_exp at 0x11b78f488>

    @pytest.mark.parametrize('reform_num, corporate',
                             [(0, True),
                              (0, False),
                              (1, True),
                              (1, False),
                              (2, True),
                              (2, False)])
    def test_asset_capital_path(reform_num, corporate,
                                reform, actual_vs_expect):
        """
        Test corp/non-corp capital_path results under different reforms.
        """
        asset = Asset(reform[reform_num], corp=corporate)
        asset.calc_all()
        decimals = 2
        capital_path = copy.deepcopy(asset.capital_path).round(decimals)
        fname = 'asset_ref{}_{}_expect.csv'.format(reform_num,
                                                   'corp' if corporate else 'nonc')
>       actual_vs_expect(capital_path, fname, precision=decimals)

tests/test_asset.py:32: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

act_df =     year    Kstock  Investment  FixedInv  trueDep  taxDep   FixedK
0   2014   6743.52      306.60    271.56   159.49  ...6.04      448.35    427.68   297.19  378.15  4743.82
13  2027  10543.02      466.18    444.69   310.31  393.77  4971.36
exp_csv_filename = 'asset_ref2_nonc_expect.csv', precision = 2

    def act_vs_exp(act_df, exp_csv_filename, precision=0):
        """
        The function returned by the actual_vs_expect fixture.
        """
        tests_path = os.path.abspath(os.path.dirname(__file__))
        exp_path = os.path.join(tests_path, exp_csv_filename)
        exp_df = pd.read_csv(exp_path, index_col=False)
        assert list(act_df.columns.values) == list(exp_df.columns.values)
        diffs = False
        for icol in act_df.columns.values:
            if not numpy.allclose(act_df[icol], exp_df[icol]):
                diffs = True
        if diffs:
            act_csv_filename = '{}{}'.format(exp_path[:-10], 'actual.csv')
            frmt = '%.{}f'.format(precision)
            act_df.to_csv(act_csv_filename, float_format=frmt, index=False)
            act_fname = os.path.basename(act_csv_filename)
            exp_fname = exp_csv_filename
            msg = '\n'
            msg += 'NEW RESULTS IN FILE {}\n'.format(act_fname)
            msg += 'DIFFER FROM IN FILE {}\n'.format(exp_fname)
            msg += 'If diffs are OK, move ACTUAL file to EXPECT file and '
            msg += 'rerun test.'
>           raise ValueError(msg)
E           ValueError: 
E           NEW RESULTS IN FILE asset_ref2_nonc_actual.csv
E           DIFFER FROM IN FILE asset_ref2_nonc_expect.csv
E           If diffs are OK, move ACTUAL file to EXPECT file and rerun test.

tests/conftest.py:128: ValueError
====================== 1 failed, 7 passed in 9.80 seconds ======================
make: *** [pytest] Error 1

And now the actual test results differences shown in a graphical diff tool (with the actual results generated on my computer on the left and the expected results from the GitHub repository on the right):

screen shot 2019-03-07 at 6 04 10 pm

@codykallen, Can you see anything I've done wrong in translating the asset tests?

martinholmer commented 5 years ago

The likely cause of the test failure reported in this comment is the following code:

class Investor():
    """
    Constructor for the Investor class.

    This class provides information on effective marginal tax rates on 
    investment income and distributes changes in corporate after-tax income
    and pass-through business net income.

    Parameters:
        refdict: reform dict for the Calculator

    """

    def __init__(self, refdict):
        if refdict is not None:
            if isinstance(refdict, dict):
                self.refdict = refdict
            else:
                raise ValueError('refdict must be a dictionary')
        else:
            self.refdict = {}
        # Specify path to PUF
        self.records_url = 'puf.csv'                    <---------------------------------
        # MTRs needed for calculating tax rates on business equity
        self.needed_mtr_list = ['e00900p', 'e26270', 'e02000', 'e01700',
                                'e00650', 'p22250', 'p23250']

The marked statement will work only when the code is executed in the top-level Business-Taxation directory.

I'd be willing to prepare a separate pull request that makes a few changes to generalize the logic of reading in the investor data from the puf.csv file. @codykallen, would you be willing to consider a few changes along those lines?

martinholmer commented 5 years ago

The idea about what is causing the test failure outlined in this comment seems dubious because the test failure still occurs when a copy of the puf.csv file is placed in the biztax directory and in the biztax/tests directory. @codykallen, do you have any thoughts?

martinholmer commented 5 years ago

After commit 9b46391 all the non-empty tests in the Business-Taxation/test_brc.py file have been translated into tests in the Business-Taxation/biztax/tests directory. And all 27 of the files in the Business-Taxation/test_results directory that are associated with Business-Taxation/test_brc.py have been copied (with clarifying name changes) to the Business-Taxation/biztax/tests directory.

All but one of the new tests pass. See this comment for the nature of the one test failure.

martinholmer commented 5 years ago

After commits 9af1ec9 and ee7012b in pull request #61, the code currently in Business-Taxation/example.py has been transferred to the new biztax/tests/test_bm_corp.py test file. That new test fails because the Investor class tries to read data from the puf.csv file, which does not exist in the tests directory. Then it tries to look in the taxcalc package, but that fails because the puf.csv file is not included in the taxcalc package for privacy reasons. The details of the error trace are at the end of this comment.

What is the solution? Only two small changes:

(1) It would seem that the BusinessModel class constructor needs another argument called investor_data that could be a string (in fact, its default value would be the string puf.csv) or could be a Pandas DataFrame containing the investor data. This would enable passing in any kind of investor data and in the pytest suite passing in the fixture puf_fullsample, which is defined in the biztax/tests/conftest.py file.

(2) It would seem that the Investor class constructor needs another argument called data that could be a string (in fact, its default value would be the string puf.csv) or could be a Pandas DataFrame containing the investor data. This would allow the BusinessModel class to pass an investor_data` DataFrame object to the Investor class constructor as an option. But the default value means the Investor class would by default work exactly like it does now.

The next commit in this pull request will make these two changes with the expectation that the test failure will turn into a test pass.

Here are the details of the error that is being generated by the current BusinessModel and Investor class code:

    def test_bm_corp(actual_vs_expect):
        """
        Test BusinessModel corporate results under a corporate-income-tax reform.
        """
        # specify corporate-income-tax reform dictionary with these provisions:
        # - apply a 28% corporate tax rate
        # - eliminate bonus depreciation
        # - establish 50% haircut on the deductibility of interest on new debt
        citax_refdict = {
            2018: {
                'tau_c': 0.28,
                'depr_3yr_bonus': 0.0,
                'depr_5yr_bonus': 0.0,
                'depr_7yr_bonus': 0.0,
                'depr_10yr_bonus': 0.0,
                'depr_15yr_bonus': 0.0,
                'depr_20yr_bonus': 0.0,
                'depr_25yr_bonus': 0.0,
                'depr_275yr_bonus': 0.0,
                'depr_39yr_bonus': 0.0,
                'pymtc_status': 1,
                'newIntPaid_corp_hc': 1.0,
                'newIntPaid_corp_hcyear': 2018,
                'oldIntPaid_corp_hc': 1.0,
                'oldIntPaid_corp_hcyear': 2018,
                'newIntPaid_noncorp_hc': 1.0,
                'newIntPaid_noncorp_hcyear': 2018,
                'oldIntPaid_noncorp_hc': 1.0,
                'oldIntPaid_noncorp_hcyear': 2018
            }
        }
        # specify individual-income-tax reform dictionary with no reform provisions
        iitax_refdict = {}
        # calculate results assuming no responses
        bizmod = BusinessModel(citax_refdict, iitax_refdict)
>       bizmod.calc_noresponse()

tests/test_bm_corp.py:45: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
businessmodel.py:149: in calc_noresponse
    self.investor_ref.distribute_results(self.multipliers)
investor.py:175: in distribute_results
    calc1 = self.initiate_calculator()
investor.py:39: in initiate_calculator
    records1 = Records(self.records_url)
/Users/mrh/anaconda3/lib/python3.6/site-packages/taxcalc/records.py:119: in __init__
    self._read_data(data, exact_calculations)
/Users/mrh/anaconda3/lib/python3.6/site-packages/taxcalc/records.py:450: in _read_data
    taxdf = read_egg_csv(data)  # pragma: no cover
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

fname = 'puf.csv', index_col = None

    def read_egg_csv(fname, index_col=None):
        """
        Read from egg the file named fname that contains CSV data and
        return pandas DataFrame containing the data.
        """
        try:
            path_in_egg = os.path.join('taxcalc', fname)
            vdf = pd.read_csv(
                pkg_resources.resource_stream(
                    pkg_resources.Requirement.parse('taxcalc'),
                    path_in_egg),
                index_col=index_col
            )
        except Exception:
>           raise ValueError('could not read {} data from egg'.format(fname))
E           ValueError: could not read puf.csv data from egg
codykallen commented 5 years ago

@martinholmer, I've found the problem to the original test failure. It's an error in the original test, not in the new one. If you look in the test_asset2() function in test_brc.py, it reads

    asset1 = Asset(BM2.btax_params_ref)
    asset1.calc_all()
    path1 = copy.deepcopy(asset1.capital_path).round(2)
    # Noncorporate
    asset2 = Asset(BM1.btax_params_ref, corp=False)
    asset2.calc_all()
    path2 = copy.deepcopy(asset2.capital_path).round(2)

This calls the test for the noncorporate asset using the wrong reform dictionary (the one from test_asset1).

martinholmer commented 5 years ago

@codykallen said in this PR #61 comment:

I've found the problem to the original test failure. It's an error in the original test, not in the new one. If you look in the test_asset2() function in test_brc.py, it reads

   asset1 = Asset(BM2.btax_params_ref)
   asset1.calc_all()
   path1 = copy.deepcopy(asset1.capital_path).round(2)
   # Noncorporate
   asset2 = Asset(BM1.btax_params_ref, corp=False)
   asset2.calc_all()
   path2 = copy.deepcopy(asset2.capital_path).round(2)

This calls the test for the noncorporate asset using the wrong reform dictionary (the one from test_asset1).

Thanks for catching this! I'll fix the typo in test_brc.py as part of pull request #61.

martinholmer commented 5 years ago

Commit 168bb76 in pull request #61 fixes the typo in Business-Taxation/test_brc.py identified by @codykallen in this comment. Now the only remaining test failure is the one reported in this comment.

martinholmer commented 5 years ago

After commit 3f15811 in pull request #61, all the tests pass and the Business-Taxation/test_brc.py tests all pass without changing any code in the test_brc.py file. Here are the details:

iMac:Business-Taxation mrh$ cd biztax/ ; pytest -n4 ; cd ..
============================= test session starts ==============================
platform darwin -- Python 3.6.8, pytest-4.3.0, py-1.7.0, pluggy-0.8.1
rootdir: /Users/mrh/work/PSL/Business-Taxation, inifile:
plugins: xdist-1.26.1, remotedata-0.3.1, openfiles-0.3.2, forked-0.2, doctestplus-0.1.3, arraydiff-0.2
gw0 [30] / gw1 [30] / gw2 [30] / gw3 [30]
..............................                                           [100%]
========================== 30 passed in 62.96 seconds ==========================
iMac:Business-Taxation mrh$ python test_brc.py 
Test Asset 0: PASS
Test Asset 1: PASS
Test Asset 2: PASS
Test Debt 0: PASS
Test Debt 1: PASS
Test Debt 2: PASS
Test Corporation 0: PASS
Test Corporation 1: PASS
Test Corporation 2: PASS
Test PassThrough 0: PASS
Test PassThrough 1: PASS
Test PassThrough 2: PASS
iMac:Business-Taxation mrh$ 
martinholmer commented 5 years ago

@codykallen, should I expand the new test_bm_corp.py test to generate results with responses in this pull request?

codykallen commented 5 years ago

@martinholmer asked:

should I expand the new test_bm_corp.py test to generate results with responses in this pull request?

I have avoided writing tests for the responses for 2 reasons.

  1. I'm not satisfied with the legal response.
  2. The responses rely on weighted average effective marginal tax rates calculated using taxcalc. Therefore, these results will change whenever taxcalc changes or we update to a new PUF.
martinholmer commented 5 years ago

@codykallen, The changes in pull request #61 are ready for review.

codykallen commented 5 years ago

@martinholmer, it looks like you've added some code to allow for testing with responses in the future. Does that functionality currently work? As in, would running the tests with with_responses set to both true and false (in the list) repeat all of the tests with and without responses? If not, which tests would be repeated?

martinholmer commented 5 years ago

@codykallen said in pull request #61:

I have avoided writing tests for the responses for 2 reasons:

  1. I'm not satisfied with the legal response [logic].
  2. The responses rely on weighted average effective marginal tax rates calculated using taxcalc; therefore, these results will change whenever taxcalc changes or we update to a new puf.csv file.

These are good reasons not to add right now a test that uses non-zero response elasticities. But, on the other hand, the goal of the tests is complete code coverage (that is, testing every statement in the biztax files).

I think we can respect your two points and add right now a special with_responses test that will increase code coverage. This additional test would specify all the response elasticities to be zero, call the BusinessModel.update_elasticities(elasticities) method and the BusinessModel.calc_withresponse() method, and then check that the results are identical to the results when with_responses is False. This is the same as tests in other PSL repositories where we check that dynamic analysis with zero elasticities produces the same results as static analysis.

I'm assuming that calling update_elasticities({}) will imply that all the elasticities are zero, right?

@codykallen also asked:

would running the tests with with_responses set to both True and False (in the parametrize list) repeat all of the tests with and without responses? If not, which tests would be repeated?

No. The only test that would be repeated is the test with the with_responses parametrize list. I'll add a commit to PR #61 that illustrates the points I'm making in this comment.

martinholmer commented 5 years ago

After commit 02d4efd, there are 31 tests, all of which pass except for the new one, which does not fail but rather generates a Python error (details below).

@codykallen, if you consider the premise of the new test correct, then you can merge this pull request and undertake some code revisions that eliminate the error. Does that make sense?

Here is what happens on my computer:

iMac:Business-Taxation mrh$ cd biztax ; pytest -n4 ; cd ..
============================= test session starts ==============================
platform darwin -- Python 3.6.8, pytest-4.3.0, py-1.7.0, pluggy-0.8.1
rootdir: /Users/mrh/work/PSL/Business-Taxation, inifile:
plugins: xdist-1.26.1, remotedata-0.3.1, openfiles-0.3.2, forked-0.2, doctestplus-0.1.3, arraydiff-0.2
gw0 [31] / gw1 [31] / gw2 [31] / gw3 [31]
.............................F.                                          [100%]
=================================== FAILURES ===================================
_____________________________ test_bm_corp0[True] ______________________________
[gw0] darwin -- Python 3.6.8 /Users/mrh/anaconda3/bin/python

with_response = True
actual_vs_expect = <function actual_vs_expect.<locals>.act_vs_exp at 0x1158797b8>
puf_fullsample =         elderly_dependents  age_head  age_spouse  ...  g20500  pencon_p  pencon_s
0                        0        18...        0
248590                   0        85           0  ...       0         0         0

[248591 rows x 89 columns]
tests_path = '/Users/mrh/work/PSL/Business-Taxation/biztax/tests'

    @pytest.mark.parametrize('with_response', [(False), (True)])
    def test_bm_corp0(with_response, actual_vs_expect,
                      puf_fullsample, tests_path):
        """
        Test BusinessModel corporate results under a corporate-income-tax reform
        using calc_norespone() and calc_withresponse() with zero elasticities,
        checking that the two sets of results are the same.
        """
        # ensure that expected results in the two with_response cases are the same
        assert filecmp.cmp(os.path.join(tests_path,
                                        'bm_corp0_base_nresp_expect.csv'),
                           os.path.join(tests_path,
                                        'bm_corp0_base_wresp_expect.csv'),
                           shallow=False)
        assert filecmp.cmp(os.path.join(tests_path,
                                        'bm_corp0_refm_nresp_expect.csv'),
                           os.path.join(tests_path,
                                        'bm_corp0_refm_wresp_expect.csv'),
                           shallow=False)
        # specify corporate-income-tax reform dictionary with these provisions:
        # - apply a 28% corporate tax rate
        # - eliminate bonus depreciation
        # - establish 50% haircut on the deductibility of interest on new debt
        citax_refdict = {
            2018: {
                'tau_c': 0.28,
                'depr_3yr_bonus': 0.0,
                'depr_5yr_bonus': 0.0,
                'depr_7yr_bonus': 0.0,
                'depr_10yr_bonus': 0.0,
                'depr_15yr_bonus': 0.0,
                'depr_20yr_bonus': 0.0,
                'depr_25yr_bonus': 0.0,
                'depr_275yr_bonus': 0.0,
                'depr_39yr_bonus': 0.0,
                'pymtc_status': 1,
                'newIntPaid_corp_hc': 1.0,
                'newIntPaid_corp_hcyear': 2018,
                'oldIntPaid_corp_hc': 1.0,
                'oldIntPaid_corp_hcyear': 2018,
                'newIntPaid_noncorp_hc': 1.0,
                'newIntPaid_noncorp_hcyear': 2018,
                'oldIntPaid_noncorp_hc': 1.0,
                'oldIntPaid_noncorp_hcyear': 2018
            }
        }
        # specify individual-income-tax reform dictionary with no reform provisions
        iitax_refdict = {}
        bizmod = BusinessModel(citax_refdict, iitax_refdict,
                               investor_data=puf_fullsample)
        # calculate results in different ways depending on value of with_response
        if with_response:
            bizmod.update_elasticities({})  # all elasticities are zero
>           bizmod.calc_withresponse()

tests/test_bm_corp.py:68: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
businessmodel.py:215: in calc_withresponse
    self.corp_ref.apply_responses(self.response)
corporation.py:148: in apply_responses
    self.update_legal(responses)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <biztax.corporation.Corporation object at 0x102cf7cc0>
responses = <biztax.response.Response object at 0x115abc438>

    def update_legal(self, responses):
        """
        Updates the rescale_corp and rescale_noncorp associated with each
        Data associated with each object.
        """
        self.data.update_rescaling(responses.rescale_corp, responses.rescale_noncorp)
>       self.asset.data.update_rescaling(responses.rescale_corp, responses.rescale_noncorp)
E       AttributeError: 'Corporation' object has no attribute 'asset'

corporation.py:103: AttributeError
==================== 1 failed, 30 passed in 248.25 seconds =====================
codykallen commented 5 years ago

@martinholmer described a test using zero elasticities and checking that it doesn't change the results. Such a test seems good to me.

@martinholmer then encountered an AttributeError, which found that a Corporation object has no asset attribute. This is because the creation of the Corporation's Asset, Debt and CorpTaxReturn objects occur when Corporation.calc_all() is called.

To avoid this error, we can either change when the Asset, Debt, and similar objects and created and calculated in the Corporation and PassThrough classes, or we change test_bm_corp.py lines 66-70 to say

bizmod.calc_noresponse()
if with_response:
    bizmod.calc_withresponse()

@martinholmer also asked:

I'm assuming that calling update_elasticities({}) will imply that all the elasticities are zero, right?

That should be the case, but it wouldn't be necessary to call that in the first place. If you don't call update_elasticities(), then it uses the default elasticities of zero.

martinholmer commented 5 years ago

@codykallen said in pull request #61:

To avoid this error, we can either change when the Asset, Debt, and similar objects are created and calculated in the Corporation and PassThrough classes, or we change test_bm_corp.py lines 66-70 to say:

bizmod.calc_noresponse()
if with_response:
bizmod.calc_withresponse()

I think the first or your two approaches is the superior one for at least two reasons. First, most (all?) users would expect to be able to call the BusinessModel.calc_withresponse() method directly (that is, without first calling the BusinessModel.calc_noresponse() method). And second, the second approach unnecessarily increases model run times.

If you merge this pull request as it now stands, you can then prepare another pull request that implements your first approach. Does that make sense?

@codykallen also answered:

I'm assuming that calling update_elasticities({}) will imply that all the elasticities are zero, right?

That should be the case, but it wouldn't be necessary to call that in the first place. If you don't call update_elasticities(), then it uses the default elasticities of zero.

OK, I understand. But calling it with an empty dictionary makes the point of the test more clear. So I'm in favor of leaving in the redundant statement for pedagogical reasons.

codykallen commented 5 years ago

@martinholmer suggested:

If you merge this pull request as it now stands, you can then prepare another pull request that implements your first approach.

I agree with this approach. Are there any other changes you would like to make before merging?

martinholmer commented 5 years ago

@codykallen said in pull request #61:

I agree with this approach.

Good.

Are there any other changes you would like to make before merging?

No. If you're comfortable with the changes, go right ahead and merge them into the master branch.