Closed MaxGhenis closed 3 years ago
Could the reform JSON files just be included in the conda package? This seems like it would solve the problem in many use cases and avoid version conflicts, This is something I had pushed for sometime ago, but there didn't seem to be much support for the idea, hence using the URL in OG-USA.
Another alternative is to go back to the more frequent release schedule that TC was on (though I understand wanting to wrap several things into a major version bump). The ParamTools upgrade was a major refactoring and has been in the master branch for several weeks, leaving an unusually long time with significant differences between the master branch and the conda package.
Could the reform JSON files just be included in the conda package?
Yes, this is what I was hoping could be the solution.
Including the JSON file in the conda package makes sense to me.
@jdebacker, do you remember what the concerns against doing so were when you advocated for it in the past?
@MattHJensen Yes, @martinholmer expressed concerns in Issue #1644, though I believe we've now added additional benefit to this change (conflicts between versions) - and I believe many fo the costs highlighted in that conversation are not very high or relevant given recent developments to Tax-Calculator.
I've linked issue #2455 here to note that it would be nice if the resolution of this issue also makes it easier to pull the reforms used in the recipes. (To accomplish that, it may make things easier to move the recipe reforms to the same location as the other reforms.)
@jdebacker, thanks for linking to #1644. I agree with you that the cost benefit is still on the side of adding the capabilities discussed in this issue.
I am curious what others think about moving all reform jsons to a separate repo of community-sourced reforms, which would also be packaged separately.
Beyond requiring some tests to pass, there would not be a centralized review process in this repo -- i.e., authors of reforms would take full responsibility for their reforms. This would mean that we could automate deployments for the new package and the deployment of reforms to users would not be held up by Tax-Calculator's release schedule, resolving the primary concern raised in #1644.
Separating reform jsons from the Tax-Calculator repo would also reduce a significant reputational risk to the project (also mentioned here: https://github.com/PSLmodels/Tax-Calculator/issues/2070#issuecomment-427105887).
Would this package be a dependency of Tax-Calc?
Two of the main issues raised are:
A separate repo/package does not help with this.
Although I do like the idea of a central repository where different users can upload reforms they want to share easily.
Would this package be a dependency of Tax-Calc?
Yes, that's my thinking. (sorry, should have mentioned that above). As a dependency, the two issues you've highlighted would be resolved, and also you could get the latest updates from the new reforms package without needing a new release of taxcalc, so the primary concern from #1644 would be resolved as well.
We would still need the new Tax-Calculator function to grab the reform files, (the subject of this issue), we'd just be grabbing them from the dependency instead of from Tax-Calculator.
I think this is a great idea. It would be a big boon for people getting started with tax-calculator or who want a library of reforms to use for evaluating data sets, or programs, or for other purposes. The only thing I would add is that it would be nice for people to have some way of evaluating whether a particular JSON file is likely to be trustworthy. There is a range of ways to do that, some of which might involve work for PSL, and some of which might not. I suspect there is a continuum that might range from something like the following (most work to least work):
MOST
On Fri, Aug 14, 2020 at 1:54 PM Matt Jensen notifications@github.com wrote:
Would this package be a dependency of Tax-Calc?
Yes, that's my thinking. (sorry, should have mentioned that above). As a dependency, the two issues you've highlighted would be resolved, and also you could get the latest updates from the new reform package without needing a new release of taxcalc, so the primary concern from #1644 https://github.com/PSLmodels/Tax-Calculator/issues/1644 would be resolved as well.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/PSLmodels/Tax-Calculator/issues/2445#issuecomment-674192240, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABR4JGFROU7KRZFFWHEHRULSAV25LANCNFSM4PSWPNZA .
The only thing I would add is that it would be nice for people to have some way of evaluating whether a particular JSON file is likely to be trustworthy. There is a range of ways to do that, some of which might involve work for PSL, and some of which might not. I suspect there is a continuum that might range from something like the following (most work to least work):
MOST
- approval process ...
- must be shown to run without error ...
- must meet certain documentation requirements ...
- must give the name of the person or organization from which the file comes from, let the user decide if person or organization is trustworthy ...
- anybody can submit anything LEAST
(I added numbers to Don's list)
My inclination is prioritize reform authorship (4) as a source of trust but to also have automated tests to ensure that everything runs with Tax-Calculator (2) and automated tests to ensure a template/style is followed for documentation (3). This would mean that the release/deploy process could be automated using GH Actions and a new release could be issued with each PR merge. No one would need to review or issue manual releases.
The project would still need a maintainer to set up and maintain the test suite, GH actions for release automation, and project (but not reform) documentation.
Just to throw out another idea: What if taxcalc had an interface to Compute Studio, and you could load the reform by simulation number (from the URL)? This would require us to make Compute Studio simulations for each of the reforms in the reforms
folder, and for the ones used in recipes, but that'd be useful regardless IMO. It'd avoid requiring GitHub actions to check the validity of json files, filename conflicts, and creating a new package, and invests more in the PSL's de facto social network. @hdoupe
@MaxGhenis I like the idea of making it easier to grab reforms on C/S and running them locally, but the compute-studio-kit
repo may be a better home for that interface:
import cs_kit
import taxcalc
# https://github.com/compute-tooling/compute-studio-kit/blob/master/cs_kit/api.py#L12
cs = cs_kit.ComputeStudio("PSLmodels", "Tax-Brain")
inputs = cs.inputs(model_pk=1234)
pol = taxcalc.Policy()
pol.adjust(inputs["adjustment"])
@hdoupe does that code work, or is it how you'd suggest implementing it? That seems like a great solution to me.
@MaxGhenis It doesn't work now, but it wouldn't be hard to get it there! Just need to add the inputs
method on the ComputeStudio
class, and it should work.
Could we discuss how to implement this in Tuesday's PSL meeting?
@MaxGhenis I've started working on this, but still need to get a better more done before I'm ready to share. I'm taking the afternoon off and won't be on the PSL call. But I will have something to demo tomorrow. Sorry for the delay. Been jammed up with some other projects, but I am excited about making it easier to use C/S to store/retrieve reform files.
I put together an initial implementation using fsspec so that reforms can easily be read from the C/S API into Tax-Calculator. Right now, it's set up like this:
import taxcalc
pol = taxcalc.Policy()
pol.adjust("cs://PSLmodels:Tax-Brain@47410/inputs/adjustment/policy")
# OrderedDict([('II_rt1', [{'year': 2019, 'value': 0.25}]),
# ('II_rt2', [{'year': 2019, 'value': 0.25}]),
# ('II_rt3', [{'year': 2019, 'value': 0.25}]),
# ('II_rt4', [{'year': 2019, 'value': 0.25}]),
# ('II_rt5', [{'year': 2019, 'value': 0.25}]),
# ('II_rt6', [{'year': 2019, 'value': 0.25}]),
# ('II_brk6',
# [{'year': 2019, 'value': 2000000.0, 'MARS': 'widow'},
# {'year': 2019, 'value': 2000000.0, 'MARS': 'headhh'},
# {'year': 2019, 'value': 2000000.0, 'MARS': 'mseparate'},
# {'year': 2019, 'value': 2000000.0, 'MARS': 'mjoint'},
# {'year': 2019, 'value': 2000000.0, 'MARS': 'single'}]),
# ('II_rt7', [{'year': 2019, 'value': 0.7}])])
But, you can do more things like read results from the simulation:
import io
import pandas as pd
import paramtools
from cs_kit.filespec import CSFileSystem
data = paramtools.read_json("cs://PSLmodels:Tax-Brain@47410/outputs")
results = []
for output in data:
results.append(
pd.read_csv(io.StringIO(output["data"]))
)
data[0]["title"]
# 'Total Liabilities Change by Calendar Year (Billions).csv'
results[0]
# Unnamed: 0 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029
# 0 Individual Income Tax Liability Change $651.31 $677.74 $707.50 $735.71 $764.94 $793.97 $824.71 $609.33 $631.98 $654.94 $678.08
# 1 Payroll Tax Liability Change $0.00 $0.00 $0.00 $0.00 $0.00 $0.00 $0.00 $0.00 $0.00 $0.00 $0.00
# 2 Combined Payroll and Individual Income Tax Lia... $651.31 $677.74 $707.50 $735.71 $764.94 $793.97 $824.71 $609.33 $631.98 $654.94 $678.08
Or get the owner of the simulation:
paramtools.read_json("cs://PSLmodels:Tax-Brain@47410/owner")
# OrderedDict([('owner', 'MaxGhenis')])
The PR implementing this is here: https://github.com/compute-tooling/compute-studio-kit/pull/16
Hi all, cs-kit
has the capabilities demo-ed in my comment above in version 1.12.0 now. I'm looking forward to hearing what folks think:
pip install -U cs-kit
Awesome @hdoupe, this worked for reproducing recipe00
in this colab notebook, pulling from this CS version of reformA.json
. It produced some of the same results as what's currently in the docs (other stuff broke, unrelated to this I think, see #2493).
Should we revise the recipes to use this?
@MaxGhenis That sounds good to me! The -indexed
flags are handled a little differently by C/S, but you can use cs2tc
to convert the parameters to the right format.
@MaxGhenis That sounds good to me! The
-indexed
flags are handled a little differently by C/S, but you can usecs2tc
to convert the parameters to the right format.
Could you give an example?
@MaxGhenis it wouldn't be quite as clean, but for reforms that change the indexed status of a variable it would look like this:
Updating the read_params
method like I did in the second part of the notebook keeps the transition smooth, but would introduce an optional dependency.
Here's the code for the modified Policy
class:
import taxcalc
class Policy(taxcalc.Policy):
def read_params(self, params_or_path):
try:
import cs2tc
except ImportErorr:
cs2tc = None
params = super().read_params(params_or_path)
if (
cs2tc is not None
and isinstance(params_or_path, str)
and params_or_path.startswith("cs://")
):
return cs2tc.convert_policy_adjustment(params)
else:
return params
@MaxGhenis How are you on this issue?
Note that we're moving all the reform json files over to PSLmodels/examples.
Closing issue due to the move of reform files over to PSLmodels/examples.
Currently, there are two ways to access
json
files in taxcalc/reforms:(2) seems to be more common, since it's platform-agnostic and also works if
taxcalc
is only installed through aconda
environment.This has caused two issues though:
jupyter-book
) cause failures building the documentation. This locks us into loading reforms locally in thejupyter-book
docs, but keeps us from enabling Colab launching (#2429).master
can create issues whenmaster
doesn't align with the installedtaxcalc
version, as is causing an error in OG-USA's example (https://github.com/PSLmodels/OG-USA/issues/607).There may be workarounds for each of these, but adding a function to load reforms from the currently installed
taxcalc
package would address them more cleanly IMO. @hdoupe's comments in https://github.com/PSLmodels/OG-USA/issues/607 also suggest it could save a step or two by applying reforms directly viaparamtools
.Something like this interface could work, but maybe an enum or something in
paramtools
would be more pythonic.