PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org
Other
263 stars 156 forks source link

Function to get reform json from installed version of taxcalc #2445

Closed MaxGhenis closed 3 years ago

MaxGhenis commented 4 years ago

Currently, there are two ways to access json files in taxcalc/reforms:

  1. Navigate to the directory of a local installation
  2. Download it from the web

(2) seems to be more common, since it's platform-agnostic and also works if taxcalc is only installed through a conda environment.

This has caused two issues though:

There may be workarounds for each of these, but adding a function to load reforms from the currently installed taxcalc package would address them more cleanly IMO. @hdoupe's comments in https://github.com/PSLmodels/OG-USA/issues/607 also suggest it could save a step or two by applying reforms directly via paramtools.

Something like this interface could work, but maybe an enum or something in paramtools would be more pythonic.

import taxcalc as tc
reform = tc.taxcalc_reform('2017_law')  # Loads 2017_law.json from the currently-installed taxcalc.
tc.Calculator.read_json_param_objects(reform, None)
jdebacker commented 4 years ago

Could the reform JSON files just be included in the conda package? This seems like it would solve the problem in many use cases and avoid version conflicts, This is something I had pushed for sometime ago, but there didn't seem to be much support for the idea, hence using the URL in OG-USA.

Another alternative is to go back to the more frequent release schedule that TC was on (though I understand wanting to wrap several things into a major version bump). The ParamTools upgrade was a major refactoring and has been in the master branch for several weeks, leaving an unusually long time with significant differences between the master branch and the conda package.

MaxGhenis commented 4 years ago

Could the reform JSON files just be included in the conda package?

Yes, this is what I was hoping could be the solution.

MattHJensen commented 4 years ago

Including the JSON file in the conda package makes sense to me.

@jdebacker, do you remember what the concerns against doing so were when you advocated for it in the past?

jdebacker commented 4 years ago

@MattHJensen Yes, @martinholmer expressed concerns in Issue #1644, though I believe we've now added additional benefit to this change (conflicts between versions) - and I believe many fo the costs highlighted in that conversation are not very high or relevant given recent developments to Tax-Calculator.

MattHJensen commented 4 years ago

I've linked issue #2455 here to note that it would be nice if the resolution of this issue also makes it easier to pull the reforms used in the recipes. (To accomplish that, it may make things easier to move the recipe reforms to the same location as the other reforms.)

MattHJensen commented 4 years ago

@jdebacker, thanks for linking to #1644. I agree with you that the cost benefit is still on the side of adding the capabilities discussed in this issue.

MattHJensen commented 4 years ago

I am curious what others think about moving all reform jsons to a separate repo of community-sourced reforms, which would also be packaged separately.

Beyond requiring some tests to pass, there would not be a centralized review process in this repo -- i.e., authors of reforms would take full responsibility for their reforms. This would mean that we could automate deployments for the new package and the deployment of reforms to users would not be held up by Tax-Calculator's release schedule, resolving the primary concern raised in #1644.

Separating reform jsons from the Tax-Calculator repo would also reduce a significant reputational risk to the project (also mentioned here: https://github.com/PSLmodels/Tax-Calculator/issues/2070#issuecomment-427105887).

jdebacker commented 4 years ago

Would this package be a dependency of Tax-Calc?

Two of the main issues raised are:

  1. JSON files not available locally with install of Tax-Calc
  2. Incompatibility of JSON files with versions of Tax-Calc

A separate repo/package does not help with this.

Although I do like the idea of a central repository where different users can upload reforms they want to share easily.

MattHJensen commented 4 years ago

Would this package be a dependency of Tax-Calc?

Yes, that's my thinking. (sorry, should have mentioned that above). As a dependency, the two issues you've highlighted would be resolved, and also you could get the latest updates from the new reforms package without needing a new release of taxcalc, so the primary concern from #1644 would be resolved as well.

We would still need the new Tax-Calculator function to grab the reform files, (the subject of this issue), we'd just be grabbing them from the dependency instead of from Tax-Calculator.

donboyd5 commented 4 years ago

I think this is a great idea. It would be a big boon for people getting started with tax-calculator or who want a library of reforms to use for evaluating data sets, or programs, or for other purposes. The only thing I would add is that it would be nice for people to have some way of evaluating whether a particular JSON file is likely to be trustworthy. There is a range of ways to do that, some of which might involve work for PSL, and some of which might not. I suspect there is a continuum that might range from something like the following (most work to least work):

MOST

On Fri, Aug 14, 2020 at 1:54 PM Matt Jensen notifications@github.com wrote:

Would this package be a dependency of Tax-Calc?

Yes, that's my thinking. (sorry, should have mentioned that above). As a dependency, the two issues you've highlighted would be resolved, and also you could get the latest updates from the new reform package without needing a new release of taxcalc, so the primary concern from #1644 https://github.com/PSLmodels/Tax-Calculator/issues/1644 would be resolved as well.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/PSLmodels/Tax-Calculator/issues/2445#issuecomment-674192240, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABR4JGFROU7KRZFFWHEHRULSAV25LANCNFSM4PSWPNZA .

MattHJensen commented 4 years ago

The only thing I would add is that it would be nice for people to have some way of evaluating whether a particular JSON file is likely to be trustworthy. There is a range of ways to do that, some of which might involve work for PSL, and some of which might not. I suspect there is a continuum that might range from something like the following (most work to least work):

MOST

  1. approval process ...
  2. must be shown to run without error ...
  3. must meet certain documentation requirements ...
  4. must give the name of the person or organization from which the file comes from, let the user decide if person or organization is trustworthy ...
  5. anybody can submit anything LEAST

(I added numbers to Don's list)

My inclination is prioritize reform authorship (4) as a source of trust but to also have automated tests to ensure that everything runs with Tax-Calculator (2) and automated tests to ensure a template/style is followed for documentation (3). This would mean that the release/deploy process could be automated using GH Actions and a new release could be issued with each PR merge. No one would need to review or issue manual releases.

The project would still need a maintainer to set up and maintain the test suite, GH actions for release automation, and project (but not reform) documentation.

MaxGhenis commented 4 years ago

Just to throw out another idea: What if taxcalc had an interface to Compute Studio, and you could load the reform by simulation number (from the URL)? This would require us to make Compute Studio simulations for each of the reforms in the reforms folder, and for the ones used in recipes, but that'd be useful regardless IMO. It'd avoid requiring GitHub actions to check the validity of json files, filename conflicts, and creating a new package, and invests more in the PSL's de facto social network. @hdoupe

hdoupe commented 4 years ago

@MaxGhenis I like the idea of making it easier to grab reforms on C/S and running them locally, but the compute-studio-kit repo may be a better home for that interface:

import cs_kit
import taxcalc

# https://github.com/compute-tooling/compute-studio-kit/blob/master/cs_kit/api.py#L12
cs = cs_kit.ComputeStudio("PSLmodels", "Tax-Brain")

inputs = cs.inputs(model_pk=1234)

pol = taxcalc.Policy()
pol.adjust(inputs["adjustment"])
MaxGhenis commented 4 years ago

@hdoupe does that code work, or is it how you'd suggest implementing it? That seems like a great solution to me.

hdoupe commented 4 years ago

@MaxGhenis It doesn't work now, but it wouldn't be hard to get it there! Just need to add the inputs method on the ComputeStudio class, and it should work.

MaxGhenis commented 4 years ago

Could we discuss how to implement this in Tuesday's PSL meeting?

hdoupe commented 4 years ago

@MaxGhenis I've started working on this, but still need to get a better more done before I'm ready to share. I'm taking the afternoon off and won't be on the PSL call. But I will have something to demo tomorrow. Sorry for the delay. Been jammed up with some other projects, but I am excited about making it easier to use C/S to store/retrieve reform files.

hdoupe commented 4 years ago

I put together an initial implementation using fsspec so that reforms can easily be read from the C/S API into Tax-Calculator. Right now, it's set up like this:

import taxcalc

pol = taxcalc.Policy()
pol.adjust("cs://PSLmodels:Tax-Brain@47410/inputs/adjustment/policy")
# OrderedDict([('II_rt1', [{'year': 2019, 'value': 0.25}]),
#              ('II_rt2', [{'year': 2019, 'value': 0.25}]),
#              ('II_rt3', [{'year': 2019, 'value': 0.25}]),
#              ('II_rt4', [{'year': 2019, 'value': 0.25}]),
#              ('II_rt5', [{'year': 2019, 'value': 0.25}]),
#              ('II_rt6', [{'year': 2019, 'value': 0.25}]),
#              ('II_brk6',
#               [{'year': 2019, 'value': 2000000.0, 'MARS': 'widow'},
#                {'year': 2019, 'value': 2000000.0, 'MARS': 'headhh'},
#                {'year': 2019, 'value': 2000000.0, 'MARS': 'mseparate'},
#                {'year': 2019, 'value': 2000000.0, 'MARS': 'mjoint'},
#                {'year': 2019, 'value': 2000000.0, 'MARS': 'single'}]),
#              ('II_rt7', [{'year': 2019, 'value': 0.7}])])

But, you can do more things like read results from the simulation:

import io

import pandas as pd
import paramtools
from cs_kit.filespec import CSFileSystem

data = paramtools.read_json("cs://PSLmodels:Tax-Brain@47410/outputs")

results = []
for output in data:
    results.append(
        pd.read_csv(io.StringIO(output["data"]))
    )

data[0]["title"]
# 'Total Liabilities Change by Calendar Year (Billions).csv'

results[0]
#                                           Unnamed: 0     2019     2020     2021     2022     2023     2024     2025     2026     2027     2028     2029
# 0             Individual Income Tax Liability Change  $651.31  $677.74  $707.50  $735.71  $764.94  $793.97  $824.71  $609.33  $631.98  $654.94  $678.08
# 1                       Payroll Tax Liability Change    $0.00    $0.00    $0.00    $0.00    $0.00    $0.00    $0.00    $0.00    $0.00    $0.00    $0.00
# 2  Combined Payroll and Individual Income Tax Lia...  $651.31  $677.74  $707.50  $735.71  $764.94  $793.97  $824.71  $609.33  $631.98  $654.94  $678.08

Or get the owner of the simulation:

paramtools.read_json("cs://PSLmodels:Tax-Brain@47410/owner")
# OrderedDict([('owner', 'MaxGhenis')])

The PR implementing this is here: https://github.com/compute-tooling/compute-studio-kit/pull/16

And I also set up a mybinder link: Binder

hdoupe commented 4 years ago

Hi all, cs-kit has the capabilities demo-ed in my comment above in version 1.12.0 now. I'm looking forward to hearing what folks think:

pip install -U cs-kit
MaxGhenis commented 4 years ago

Awesome @hdoupe, this worked for reproducing recipe00 in this colab notebook, pulling from this CS version of reformA.json. It produced some of the same results as what's currently in the docs (other stuff broke, unrelated to this I think, see #2493).

Should we revise the recipes to use this?

hdoupe commented 4 years ago

@MaxGhenis That sounds good to me! The -indexed flags are handled a little differently by C/S, but you can use cs2tc to convert the parameters to the right format.

MaxGhenis commented 4 years ago

@MaxGhenis That sounds good to me! The -indexed flags are handled a little differently by C/S, but you can use cs2tc to convert the parameters to the right format.

Could you give an example?

hdoupe commented 4 years ago

@MaxGhenis it wouldn't be quite as clean, but for reforms that change the indexed status of a variable it would look like this:

Screenshot from 2020-09-25 15-40-18

Updating the read_params method like I did in the second part of the notebook keeps the transition smooth, but would introduce an optional dependency.

Here's the code for the modified Policy class:

import taxcalc

class Policy(taxcalc.Policy):
    def read_params(self, params_or_path):
        try:
            import cs2tc
        except ImportErorr:
            cs2tc = None
        params = super().read_params(params_or_path)
        if (
            cs2tc is not None 
            and isinstance(params_or_path, str) 
            and params_or_path.startswith("cs://")
        ):
            return cs2tc.convert_policy_adjustment(params)
        else:
            return params
jdebacker commented 3 years ago

@MaxGhenis How are you on this issue?

Note that we're moving all the reform json files over to PSLmodels/examples.

jdebacker commented 3 years ago

Closing issue due to the move of reform files over to PSLmodels/examples.