PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org

Proposed redesign of the Parameters class API #336

Closed martinholmer closed 9 years ago

martinholmer commented 9 years ago

Background

The classes in taxcalc were initially developed to support the kind of tax policy analysis provided by the web application TaxBrain. The considerable power of these classes has been packaged in an API (that is, a set of public class methods) that supports the ten-budget-year revenue estimation and distributional analysis produced by the TaxBrain app. This is a considerable accomplishment, and it has attracted interest in using the taxcalc capabilities in other ways. One of these new uses is described in issue #291, which calls for the development of a TAXSIM-like capability. See issue #319 for a broader discussion of other new ways of using the taxcalc classes.

The development of these other ways of using taxcalc classes is likely to require additions, and possibly revisions, in the current taxcalc API. This is part of what any successful project goes through as it gains popularity: the original API that was focused on the original use of the classes needs to be generalized to support a wider variety of uses. This process often causes tensions between the newcomers who want a broader API to support their new applications, and the veterans who designed the original API and have a responsibility to ensure that any changes to the API do not undermine its original use.

Some API requirements for a TAXSIM-like capability

Users of this new taxcalc application will specify a text file of income tax filing units (one unit per row, including a unit id and the unit's tax year) and get in return a text output file containing a row for each input unit that includes the unit's tax liability and intermediate results such as adjusted gross income and taxable income. Users will also have two options. They may specify a policy reform file containing JSON; if no policy reform file is specified, the application uses the current-law policy parameters from the params.json file. And they may specify an inflation rate file containing JSON; if no inflation rate file is specified, the application uses the default inflation rates specified in the taxcalc Parameters class.
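One way the application might read the two optional JSON files is sketched below. The helpers read_optional_json and int_year_keys, and the parameter name in the sample reform, are hypothetical and not part of taxcalc. The key conversion matters because JSON object keys are always strings, so a reform year arrives as "2016" and has to become the integer 2016 before it can be compared with a Parameters object's start and end years.

```python
import json


def read_optional_json(path=None):
    """Return the dictionary stored in a JSON file, or None if no path.

    None means "use the defaults": current-law policy for the reform
    file, or the default inflation rates for the inflation rate file.
    """
    if path is None:
        return None
    with open(path) as json_file:
        return json.load(json_file)


def int_year_keys(year_dict):
    """Convert JSON's string year keys (e.g. "2016") into int years."""
    return {int(year): value for year, value in year_dict.items()}


reform_text = '{"2016": {"_toy_param": [1.0]}}'  # hypothetical reform
print(int_year_keys(json.loads(reform_text)))  # {2016: {'_toy_param': [1.0]}}
print(read_optional_json(None))  # None
```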

To clarify the simple input and output style of the initial version of this TAXSIM-like capability, consider an example 22-variable input file that looks like this:

11 2013 0 1 0 0 58000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37 2015 0 3 1 1 46000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 2014 0 1 0 0 18000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 2013 0 2 0 1 49000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27 2015 0 3 2 0 32000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
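A minimal parsing sketch for rows like these follows. It assumes whitespace-delimited rows where, as described above, the first field is the unit id and the second is the tax year. FilingUnit and parse_input_line are hypothetical names, and the remaining 20 fields are kept as an unlabelled list rather than guessing at their TAXSIM names.

```python
from typing import List, NamedTuple


class FilingUnit(NamedTuple):
    unit_id: int
    tax_year: int
    other_fields: List[float]  # the remaining 20 input variables


def parse_input_line(line, expected_fields=22):
    """Split one whitespace-delimited input row into a FilingUnit."""
    fields = line.split()
    if len(fields) != expected_fields:
        raise ValueError(
            f"expected {expected_fields} fields, found {len(fields)}: {line!r}")
    return FilingUnit(unit_id=int(fields[0]),
                      tax_year=int(fields[1]),
                      other_fields=[float(value) for value in fields[2:]])


ZEROS = " ".join(["0"] * 15)  # the 15 trailing all-zero fields in each row
sample_rows = [f"11 2013 0 1 0 0 58000 {ZEROS}",
               f"37 2015 0 3 1 1 46000 {ZEROS}"]
units = [parse_input_line(row) for row in sample_rows]
print([(unit.unit_id, unit.tax_year) for unit in units])  # [(11, 2013), (37, 2015)]
```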

This input file would produce a 29-variable output file like this (except that we would probably add a zero before every .00 value):

11. 2013 0 7928.75 .00 8874.00 25.00 .00 15.30 58000.00 .00 .00 6100.00 3900.00 .00 .00 .00 48000.00 7928.75 .00 .00 .00 .00 .00 .00 58000.00 .00 7928.75 8874.00
37. 2015 0 3422.50 .00 7038.00 15.00 .00 15.30 46000.00 .00 .00 10800.00 8000.00 .00 .00 .00 27200.00 3422.50 .00 .00 .00 .00 .00 .00 46000.00 .00 3422.50 7038.00
21. 2014 0 785.00 .00 2754.00 10.00 .00 15.30 18000.00 .00 .00 6200.00 3950.00 .00 .00 .00 7850.00 785.00 .00 .00 .00 .00 .00 .00 18000.00 .00 785.00 2754.00
19. 2013 0 3277.50 .00 7497.00 15.00 .00 15.30 49000.00 .00 .00 13400.00 7800.00 .00 .00 .00 27800.00 3277.50 .00 .00 .00 .00 .00 .00 49000.00 .00 3277.50 7497.00
27. 2015 0 -1488.80 .00 4896.00 31.06 .00 15.30 32000.00 .00 .00 9250.00 12000.00 .00 .00 .00 10750.00 1075.00 .00 .00 .00 .00 .00 2563.80 32000.00 .00 1075.00 4896.00
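The leading zero mentioned above comes for free with Python's fixed-point formatting. In this sketch, format_output_row is a hypothetical helper: following the layout of the sample rows, it prints the unit id with a trailing period, then the integer columns as-is, then every amount or rate column with two decimals.

```python
def format_output_row(unit_id, int_fields, amounts):
    """Format one output row; f"{x:.2f}" writes 0.0 as "0.00", not ".00"."""
    columns = [f"{unit_id}."]
    columns.extend(str(value) for value in int_fields)
    columns.extend(f"{value:.2f}" for value in amounts)
    return " ".join(columns)


print(format_output_row(11, [2013, 0], [7928.75, 0.0, 8874.0]))
# 11. 2013 0 7928.75 0.00 8874.00
```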

The definitions of the 22 input variables and the 29 non-state output variables are available at the NBER Internet TAXSIM website.

Even this brief description highlights several requirements that the current taxcalc API does not support. First, the filing units need to be included in a Records class object without being subject to any weighting or "blowup" (that is, each unit's input variables must be left unchanged). This should be easy to do, but it is likely to require a redesign of the taxcalc API. Second, the Parameters object containing the user-specified policy must be able to move from any year to any other year in response to the next tax unit's tax year (rather than moving sequentially through the budget years as in TaxBrain), and the Calculator object needs to be able to handle this kind of Parameters operation. Third, because users will be specifying policy reforms and inflation rates, the Parameters class will probably need more checking of how it is being used. A little more thought will produce other issues, but hopefully this short list explains why the taxcalc API will need to be redesigned to support these new applications while continuing to support the TaxBrain application.

Proposed redesign of the Parameters class API

In order to support a new TAXSIM-like application, the following redesign of the public methods of the Parameters class is proposed. The goal of this redesign is to continue to support the needs of the TaxBrain application with (hopefully) minimal code changes, and at the same time, support the needs of newer applications. This proposed redesign description indicates which methods are being revised, dropped, or added to the Parameters API. The description also explains how existing code can be revised to get the same functionality as provided by the current version of the Parameters class.

This is being published as a GitHub Issue so that everybody on the OSPC team can make suggestions about how to make the new Parameters class better. In particular, I need to hear from those who think this redesign will not support part of the existing code base. This proposed redesign will be modified in response to your suggestions and concerns.

Here is the initial version of the new Parameters class:

*** Parameters.__init__(parameter_dict=None,
                        start_year=2013,
                        num_years=12,
                        inflation_rates=None)

  parameter_dict : None implies read default parameters from params.json;
                   !None implies assign parameters from specified dictionary.

  start_year : as now (the default value will become an earlier year when
               more historical data are added)

  num_years : as now, except renamed to correct the misleading "budget_years" name.

  inflation_rates : None implies use default inflation rates;
                    !None implies assign rates from specified dictionary.

  NOTE: drop constant inflation_rate parameter because it is rarely used
        and an inflation_rates dictionary is easy to make.

  NOTE: drop data parameter because it appears to never be used in the
        code or in the tests.

*** NOTE: drop from_file() function because its logic is now in __init__().

*** Parameters.default_meta_data(start_year)

  NOTE: add new static class function to replace the global default_data()
        function in the parameters.py file.  Using this new method like this
        Parameters.default_meta_data(2015) would return exactly the same
        dictionary as does default_data(metadata=True, start_year=2015)
        in the current version of the parameters.py file.

*** Parameters.default_inflation_rate()

    Returns dictionary of default inflation rates.

*** Parameters.end_year property

  NOTE: add new property that is set in __init__() and used in set_year().

*** Parameters.set_year(year)

  NOTE: same as in current version except for two internal changes:
        (a) checks that specified year is in [start_year, end_year] range;
        (b) updates current_year as follows: self._current_year = year
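The set_year behavior described above, including the arbitrary year-to-year moves a TAXSIM-like application needs, can be illustrated with a toy class. ToyParameters and its precomputed values_by_year table are hypothetical simplifications (the real class's internals are not shown in this proposal), and end_year = start_year + num_years - 1 is an assumed definition. The last lines also show the ppo.set_year(ppo.current_year + 1) idiom that replaces increment().

```python
class ToyParameters:
    """Hypothetical stand-in for the proposed Parameters year handling.

    Each year's parameter value is precomputed, so set_year() can jump to
    any year in [start_year, end_year] in any order.
    """

    def __init__(self, values_by_year, start_year=2013, num_years=12):
        self._start_year = start_year
        # assumption: end_year is the last year covered by num_years
        self._end_year = start_year + num_years - 1
        self._values_by_year = dict(values_by_year)
        self._current_year = start_year

    @property
    def start_year(self):
        return self._start_year

    @property
    def end_year(self):
        return self._end_year

    @property
    def current_year(self):
        return self._current_year

    def set_year(self, year):
        # (a) check that year is in the [start_year, end_year] range
        if not self.start_year <= year <= self.end_year:
            raise ValueError(
                f"year {year} not in [{self.start_year}, {self.end_year}]")
        # (b) update the current year
        self._current_year = year

    @property
    def value(self):
        """Parameter value in the current year."""
        return self._values_by_year[self._current_year]


# hypothetical values: 100 in 2013, 200 in 2014, ..., 1200 in 2024
ppo = ToyParameters({year: 100 * (year - 2012) for year in range(2013, 2025)})
for tax_year in [2013, 2015, 2014, 2013, 2015]:  # years of the sample units
    ppo.set_year(tax_year)
    print(tax_year, ppo.value)
ppo.set_year(ppo.current_year + 1)  # replaces the dropped increment() method
print(ppo.current_year)  # 2016
```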

*** Parameters._update(year_mods)

  NOTE: method signature is the same as now, but method code is
        simplified, and method is made private because its only role
        now is helping the new public implement_reform method.

*** Parameters.implement_reform(reform)

  reform : dictionary contains one or more YEAR:MODS pairs, where
           the MODS describe reform provisions that are implemented
           in YEAR.

  NOTE: add this new method that would allow a one-time implementation
        of a multi-year reform in policy parameters.  This new method
        leaves the post-reform parameters object with current_year
        equal to start_year, but all the reforms over the [start_year,
        end_year] range will have been applied to the object.

  NOTE: the code for this new method will look something like this:

        if not reform:
            return  # no reform to implement
        reform_years = sorted(reform.keys())
        if reform_years[0] < self.start_year:
            raise ValueError('reform provision in year < Parameters.start_year')
        if reform_years[-1] > self.end_year:
            raise ValueError('reform provision in year > Parameters.end_year')
        # visit every year from start_year through the last reform year,
        # applying each year's provisions before moving to the next year
        for year in range(self.start_year, reform_years[-1] + 1):
            self.set_year(year)
            if year in reform:
                self._update({year: reform[year]})
        self.set_year(self.start_year)
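A self-contained toy version of implement_reform can be exercised end to end. ToyPolicy, its single parameter "_toy_param", and its _update rule (a provision takes effect in its year and carries forward unchanged, with no inflation indexing) are all hypothetical simplifications, not the taxcalc code. The loop checks each year, starting with start_year itself, so a provision dated start_year is applied.

```python
class ToyPolicy:
    """Hypothetical policy object with one parameter and no indexing."""

    def __init__(self, start_year=2013, num_years=12, base_value=0.0):
        self.start_year = start_year
        self.end_year = start_year + num_years - 1
        self.current_year = start_year
        # value of the hypothetical "_toy_param" parameter in each year
        self.values = {year: base_value
                       for year in range(start_year, self.end_year + 1)}

    def set_year(self, year):
        if not self.start_year <= year <= self.end_year:
            raise ValueError(f"year {year} outside "
                             f"[{self.start_year}, {self.end_year}]")
        self.current_year = year

    def _update(self, year_mods):
        # assumption: a new value starts in its year and carries forward
        for year, mods in year_mods.items():
            for later_year in range(year, self.end_year + 1):
                self.values[later_year] = mods["_toy_param"]

    def implement_reform(self, reform):
        if not reform:
            return  # no reform to implement
        reform_years = sorted(reform)
        if reform_years[0] < self.start_year:
            raise ValueError("reform provision in year < start_year")
        if reform_years[-1] > self.end_year:
            raise ValueError("reform provision in year > end_year")
        for year in range(self.start_year, reform_years[-1] + 1):
            self.set_year(year)
            if year in reform:
                self._update({year: reform[year]})
        self.set_year(self.start_year)


policy = ToyPolicy()
policy.implement_reform({2013: {"_toy_param": 1.0},
                         2016: {"_toy_param": 2.0}})
print(policy.current_year, policy.values[2015], policy.values[2016])
# 2013 1.0 2.0
```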

*** NOTE: drop increment() method in favor of following usage (where ppo
        is a policy Parameters object): ppo.set_year(ppo.current_year + 1)

*** Parameters.num_years property

  NOTE: rename property from budget_years to new correct name.

*** NOTE: all other Parameters class properties are unchanged.

*** NOTE: drop global default_data() function from the parameters.py
          file because it is no longer needed; its logic has been
          moved to the __init__() and default_meta_data() methods.
feenberg commented 9 years ago

On Wed, 5 Aug 2015, Martin Holmer wrote:

Background

ways. One of the new ways of using the taxcalc capabilities is described in issue #291, which calls for the development of a TAXSIM-like capability. See issue #319 for a broader discussion of other new ways of using the taxcalc classes.

I can certainly see the attraction of allowing a user to supply a dataset and get a tax calculation - that is what taxsim does. But taxsim does this for visitors who need after-tax prices and incomes for their regressions. If they want an accurate 10-year revenue forecast, they will want to use the OSPC-provided data. In fact, I have always thought that one of the primary benefits of the OSPC calculator was that it came with data for out years. If the visitor has CEX or PSID or RHS data, that won't provide a realistic estimate for revenue, even as it provides valid data for learning the effects on individual taxpayers of tax parameters in the past, or forecasting the revenue effect of tweaking a tax parameter. So I don't really endorse spending a great deal of effort on providing the full range of facilities contemplated here. Perhaps something less elaborate might be desirable, but surely we should have experience with public use before we go in this direction.

...

Some API requirements for a TAXSIM-like capability

Users of this new taxcalc application will specify a text file of income tax filing units (one unit per row that includes a unit id and the unit's tax year) and get in return a text output file containing a row for each input unit that includes the unit's tax liability and intermediate results such as adjusted gross income and taxable income. Users of this new application will have the option to specify a policy reform file containing JSON (if no policy reform file is specified, then the application uses current-law policy parameters from the params.json file) and the option to specify an inflation rate file containing JSON (if no inflation rate file is specified, then the application uses the default inflation rates specified in the taxcalc Parameters class.

If the user supplies a data record with a specified tax year, we would inflate the tax law parameters to that year? But leave the data alone? Do we need 2 years (data year and law year)? It looks like the proposed API takes a data year and a number of years forward. But one doesn't always want to simulate the current year.

The definitions of the 22 input variables and the 29 non-state output variables are available at the NBER Internet TAXSIM website.

The 22 variables are well chosen, except for ignoring educational credits, which are significant and require lots of data to calculate, and filing status, where head of household should be inferred from the existence of dependents, rather than supplied by the user.

Even this brief description highlights a number of requirements that are not supported in the current taxcalc API. First of all, the filing units need to be somehow included in a Records class object without being subject to any weighting or "blowup" (that is, each unit's input variables must be left unchanged). This should be easy

So we are intending to return a file of micro-data, rather than revenue distribution table? If so, how does this differ from taxsim at all? What is the point? I take it that users will be allowed to submit one record of data and get back 10 records of tax liability. Is that right?

I think we should put this off till we have a successful tax calculator in wide use.

dan

MattHJensen commented 9 years ago

Leaving aside technical issues for now and providing some background that is relevant to this comment.

I think we should put this off till we have a successful tax calculator in wide use.

Here are the four primary reasons that I've been thinking such a project would be useful.

  1. It would force a rethink of how flexible our API and architecture are before taxcalc becomes widely used (and therefore harder to modify).
  2. It would make it easier to compare results across TAXSIM, OSPC's taxcalc, and Martin's calculator. Therefore, existing TAXSIM users, for instance, could gain confidence in the tax logic in taxcalc before beginning to use it for policy analysis.
  3. It allows us to provide files of "example households" to those users who do not have access to the PUF but want to interact with the calculator code (not just TaxBrain). I think these example household files would be interesting in their own right, but if nothing else, they would serve as a stop gap until we have a chance to pursue the more difficult projects discussed in #319.
  4. It provides an opportunity for a relatively new contributor, Martin, to get started contributing to the code base through a project that is interesting to him and useful for reasons 1-3 above.

Interestingly, @feenberg mentions both (1) and (2) as reasons not to do the project. I agree that a key consideration should be how much time and effort this will involve, and that we should view these features as secondary rather than primary. In other words, trade-offs between TAXSIM-like capabilities and revenue forecasting capabilities should always be decided in favor of revenue-forecasting.

mmessick commented 9 years ago

First of all, the filing units need to be somehow included in a Records class object without being subject to any weighting or "blowup" (that is, each unit's input variables must be left unchanged). This should be easy to do, but is likely to cause a redesign of the taxcalc API.

@martinholmer, this could be more easily implemented with the changes I proposed to the Records class object in pr #309. For the example you gave, you would just initialize a 5-dimensional Records class object with all of the values separately stored for each year. This way, "blowup" would not affect the values. Similarly, you could just set all blowup factors to one, effectively eliminating this worry. What were you thinking of specifically for changes to the Records class object?
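The second suggestion (blowup factors of one) can be illustrated with a toy sketch. It assumes, hypothetically, that blowup multiplies each input variable by a per-variable factor; apply_blowup and the variable names are illustrative only and are not the Records class API. Because multiplying any finite float by 1.0 returns the same value, factors of one leave each unit's inputs exactly as supplied.

```python
def apply_blowup(record, factors):
    """Scale each variable by its blowup factor (missing factors mean 1.0)."""
    return {name: value * factors.get(name, 1.0)
            for name, value in record.items()}


unit = {"wages": 58000.0, "dividends": 0.0}  # hypothetical variable names
unchanged = apply_blowup(unit, {"wages": 1.0, "dividends": 1.0})
print(unchanged == unit)  # True
```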

martinholmer commented 9 years ago

@mmessick, thanks for pointing out your pull request #309. I'll have to study it to see exactly how the Records class would work if it were merged into the master branch. I'm curious about the timing of that merge; it seems to have been a while since the last discussion of your pull request.

MattHJensen commented 9 years ago

@mmessick, yes, thanks for bringing this up. @martinholmer, we were holding off on merging #309 until we could get your feedback on whether it would help with this project. It was originally inspired by the idea of integrating with Luca, but since this project and the Luca integration have some similar requirements for records.py and we anticipated this project moving sooner, we thought it better to make sure it was helpful for this project before merging.

martinholmer commented 9 years ago

Sorry to be the bottleneck on pull request #309, but I haven't been able to get beyond issues related to the Parameters class that have come up as I began working on the TAXSIM-like capability outlined in issue #291.

MattHJensen commented 9 years ago

No problem, Martin. No depreciation! (well, very little)

martinholmer commented 9 years ago

Dan @feenberg said in part about the proposed TAXSIM-like application:

how does this differ from TAXSIM at all? What is the point?

Well, even the initial version of this application would be different in at least a few ways.

The most important difference would be that this application would allow users to specify a wide range of reforms to current-law tax policy parameters, something that TAXSIM does not allow. And, users would be able to specify their own inflation rates if they want to index tax parameters by something different than the TaxBrain default inflation rates.

And the initial version of this application provides people who want to contribute to the source code something they can do with the downloaded source code other than run the test suite.

And in a subsequent version of this application, there are plans to try to implement the excellent suggestion that Dan @feenberg made in response to issue #319:

There isn't much in the CPS that goes beyond the 22 TAXSIM fields, but other surveys do, so it would be good to allow for additional fields.

If this effort were to be successful, it would add another important difference between this application and the NBER Internet TAXSIM application.

feenberg commented 9 years ago

On Sun, 9 Aug 2015, Martin Holmer wrote:

Dan @feenberg said in part about the proposed TAXSIM-like application:

  how does this differ from TAXSIM at all? What is the point?

Well, even the initial version of this application would be different in at least a few ways.

The most important difference would be that this application would allow users to specify a wide range of reforms to current-law tax policy parameters, something that TAXSIM does not allow. And, users would be able to specify their own inflation rates if they want to index tax parameters by something different than the TaxBrain default inflation rates.

And the initial version of this application provides people who want to contribute to the source code something they can do with the downloaded source code other than run the test suite.

And in a subsequent version of this application, there are plans to try to implement the excellent suggestion that Dan @feenberg made in response to issue #319:

  There isn't much in the CPS that goes beyond the 22 TAXSIM fields,
  but other surveys do, so it would be good to allow for additional
  fields.

If this effort were to be successful, it would add another important difference between this application and the NBER Internet TAXSIM application.

I surrender.

Actually, as long as OSPC doesn't do the past, and taxsim doesn't do the future, there isn't any real overlap. But I do wonder about the usefulness of user-supplied data about the future. It will be a limited set of users that prefer their own data to Ohara's, and those will mostly be insiders with access to the JCT files. There will be a group of users who want a higher or lower growth rate than JCT predicts, and it might be reasonable to allow some user specified variation in real growth.

dan

MattHJensen commented 9 years ago

@martinholmer, is this issue ready to be closed?

martinholmer commented 9 years ago

This proposed redesign of the Parameters class API was implemented in pull request #343, which was committed to the master branch on September 8, 2015.