PSLmodels / taxdata

The TaxData project prepares microdata for use with the Tax-Calculator microsimulation project.
http://pslmodels.github.io/taxdata/

CBO baseline update #412

Closed bodiyang closed 2 years ago

bodiyang commented 2 years ago

This PR updates Tax Data to the May 2022 CBO economic projections.

The update process follows the Tax Data CBO Baseline Updating Instructions.

However, the update scripts are out of date and need to be changed each year to match the format of the CBO projection files. Future CBO baseline updates can either follow the Updating Instructions, with a fair amount of code changes, or be made manually in CBO_baseline.csv.

bodiyang commented 2 years ago

@andersonfrailey Hi Anderson, the May 2022 CBO baseline update is complete in this PR. Could you check whether everything looks OK to merge?

andersonfrailey commented 2 years ago

Thanks for working on this, @bodiyang! Can you see how this would affect our projections and generate a report? You can do so following the instructions here.

Also, you're right about the auto-updating scripts being out of date. I tried running them the other day and I think a few things have changed on CBO's side that broke the code. I'll see if there's a way to make them work again or if it's better to go back to doing things by hand.

bodiyang commented 2 years ago

I generated the 2022 report in the last commit.

andersonfrailey commented 2 years ago

Thanks! One more thing that I forgot about yesterday: did you run the make all command to recalculate all of the growth rates and weights? We should see some changes there with the new projections. Since a new year is being added to the projections, you'll also need to update the stage 1, 2, and 3 scripts. Instructions for that are at the bottom of the CBO updating instructions doc.

bodiyang commented 2 years ago

I'm trying to run it now. Running the make all command fails with an error: it looks like the puf2011.csv file is missing. Has it been deleted, or do changes need to be made in createpuf.py?

    File "/Users/bodiyang/Desktop/taxdata/taxdata/createpuf.py", line 97, in <module>
        puf2011 = pd.read_csv(Path(DATA_PATH, "puf2011.csv"))

andersonfrailey commented 2 years ago

Do you have access to the raw PUF? That's the file it's looking for here.

If not, just run the make command for the CPS file. I can merge this and then do the PUF next.

bodiyang commented 2 years ago

Issue: running cps_stage2/stage2.py raises the following error:

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/Users/bodiyang/Desktop/taxdata/taxdata/cps_stage2/stage2.py", line 106, in <module>
        main()
      File "/Users/bodiyang/Desktop/taxdata/taxdata/cps_stage2/stage2.py", line 65, in main
        factor_match = _factors[year].equals(CUR_FACTORS[year])
      File "/opt/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 3455, in __getitem__
        indexer = self.columns.get_loc(key)
      File "/opt/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
        raise KeyError(key) from err
    KeyError: 2032

CUR_FACTORS is defined at line 37:

    CUR_FACTORS = pd.read_csv(
        "https://raw.githubusercontent.com/PSLmodels/taxdata/master/puf_stage1/Stage_I_factors.csv",
        index_col=0,
    )

The error probably occurs because the Stage_I_factors.csv at this link has not been updated and does not include the year 2032. Should we try merging the PR to see whether the linked file then gets the 2032 values, so that stage2.py passes?

andersonfrailey commented 2 years ago

@bodiyang, I think your diagnosis is right, but let's do a quick fix instead of merging and then fixing. I don't want the weights in the repo to be out of step with the projections, which would cause some issues with creating weights. Update lines 65 and 66 of cps_stage2/stage2.py to read as follows:

try:
    factor_match = _factors[year].equals(CUR_FACTORS[year])
except KeyError:
    factor_match = False
try:
    target_match = stage_2_targets[f"{year}"].equals(CUR_TARGETS[f"{year}"])
except KeyError:
    target_match = False

Similarly, lines 62 and 63 in puf_stage2/stage2.py should be changed to

try:
    factor_match = Stage_I_factors[i].equals(CUR_FACTORS[i])
except KeyError:
    factor_match = False
try:
    target_match = Stage_II_targets[f"{i}"].equals(CUR_TARGETS[f"{i}"])
except KeyError:
    target_match = False

factor_match and target_match are only used to decide whether we can skip creating weights for a given year. By setting both to False when a year doesn't appear in our current projections, we tell the program that weights do need to be created for that year.
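As a rough illustration of that guard, the repeated try/except could be factored into one small helper. This is only a sketch: `column_matches`, `new_factors`, and `cur_factors` are hypothetical names, and plain dicts stand in for the DataFrames so the snippet runs without pandas. A missing DataFrame column raises KeyError in the same way, and with real DataFrames the comparison would use `.equals` as in the lines above.

```python
from collections.abc import Mapping


def column_matches(new: Mapping, current: Mapping, key) -> bool:
    """Return True only if `key` exists in both tables and the columns are equal.

    A key missing from either table counts as "no match", which tells the
    caller that weights for that year still have to be created.
    """
    try:
        return new[key] == current[key]
    except KeyError:
        return False


# Hypothetical stand-ins: the locally updated factors include 2032,
# the factors published on master stop at 2031. Values are made up.
new_factors = {2031: [1.02, 1.03], 2032: [1.04, 1.05]}
cur_factors = {2031: [1.02, 1.03]}

print(column_matches(new_factors, cur_factors, 2031))  # existing year, same values
print(column_matches(new_factors, cur_factors, 2032))  # new year, missing on master
```

With this shape, a newly added projection year simply falls through to "weights needed" instead of stopping the script.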

This is a good catch too because it's a problem that'll pop up for every CBO update. I probably should've thought of it when I first wrote those lines so sorry about that! But if you add those lines in it shouldn't be an issue any more.

bodiyang commented 2 years ago

Thanks Anderson, solved this in another PR

bodiyang commented 2 years ago

@andersonfrailey I've fixed the bugs and everything looks good to me now. As a test, I was able to create the CPS files by running make all. Do you think we are ready to merge?

andersonfrailey commented 2 years ago

Except for the test failures this is looking pretty good. But I don't really understand why the CPS projections don't change at all. I wouldn't expect much of a difference, but for there to be no difference feels fishy. I don't know if we can get to a definitive answer as to why that's happening, but do you have any guesses?

bodiyang commented 2 years ago

Thanks Anderson. I fixed the test failures in the last commit.

Can you expand on this:

But I don't really understand why the CPS projections don't change at all.

What makes you think the CPS projections don't change? (I'm not sure whether taxdata automatically generates a CPS projection file that should reflect the changes.)

andersonfrailey commented 2 years ago

@bodiyang, there's a table in the PDF report that shows the year-by-year projections for the CPS and those are the same for both the old and new file. I also ran them through taxcalc myself just to verify and got the same results.

It seems pretty unlikely to me that the exact same weights would be generated for each year after the CBO updates, but if after checking we can't find anything that's wrong, I guess we'll just have to accept it.

bodiyang commented 2 years ago

Got it, I will check again to see if I can figure out whether anything is going wrong.

bodiyang commented 2 years ago

Follow-up note: the comparison between the base CPS values and the new CPS values is constructed in report.py, lines 184 to 204, based on cps.csv.gz and cps_weights.csv.gz.

For cps.csv.gz, nothing is updated. For cps_weights.csv.gz, the values for 2032 are added and the values for all previous years are unchanged.

This is why the CPS projections don't change at all.

So the issue narrows down to checking why cps.csv.gz and cps_weights.csv.gz remained unchanged, or whether they are expected to be unchanged.

I checked the previous PR, where these two files were changed/updated. In this update, however, the two files remained the same (years before 2031).
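One way to make this kind of check repeatable is to classify every column of the old and new weights files as added, removed, changed, or unchanged. The sketch below uses only the standard library and toy inline data. The helper name `changed_columns` and the `WT<year>` column names are assumptions for illustration, and the comparison is on the raw text of each cell, so two spellings of the same number would count as a change.

```python
import csv
import io


def changed_columns(old_csv: str, new_csv: str) -> dict:
    """Classify each column of two CSV texts as added, removed, changed or same."""
    old_rows = list(csv.DictReader(io.StringIO(old_csv)))
    new_rows = list(csv.DictReader(io.StringIO(new_csv)))
    old_cols = list(old_rows[0].keys()) if old_rows else []
    new_cols = list(new_rows[0].keys()) if new_rows else []

    status = {}
    for col in new_cols:
        if col not in old_cols:
            status[col] = "added"
        elif [r[col] for r in old_rows] != [r[col] for r in new_rows]:
            status[col] = "changed"
        else:
            status[col] = "same"
    for col in old_cols:
        if col not in new_cols:
            status[col] = "removed"
    return status


# Toy data standing in for the old and new weights files (values are made up).
old_weights = "WT2030,WT2031\n100,110\n200,220\n"
new_weights = "WT2030,WT2031,WT2032\n100,110,120\n200,220,240\n"

print(changed_columns(old_weights, new_weights))
```

For the real files, the two strings would be replaced by the decompressed contents of the old and new cps_weights.csv.gz.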

cc @andersonfrailey @MattHJensen @jdebacker

bodiyang commented 2 years ago

@andersonfrailey I made a mistake in my previous comment, so let me re-explain the issue: cps.csv.gz (https://github.com/PSLmodels/taxdata/blob/master/data/cps.csv.gz) and cps_weights.csv.gz have been updated. Lines 184 to 204 of report.py compare the old CPS with the new CPS, using the old cps.csv.gz and cps_weights.csv.gz against the new cps.csv.gz and cps_weights.csv.gz.

In other words, the base files cps.csv.gz and cps_weights.csv.gz have been changed/updated, while the resulting CPS projections remain unchanged.

I discussed this with @jdebacker and @MattHJensen in the PSL meeting, and we think the problem is probably in how taxdata generates the report.

I will investigate report.py further.

bodiyang commented 2 years ago

@andersonfrailey @jdebacker @MattHJensen

I ran the code in report.py by hand. The projections show no difference in the report because of the number of decimal places displayed. There are calculated differences, but they are too small to show up.

For example, the Current Payroll and New Payroll values for 2023 in the baseline CPS and the new CPS are 1375.1514275498105 and 1375.1514277266267; in the report, both appear as 1375.2.

Full detailed results are in values.docx.

In other words, the projections did change because of the CBO update, but the differences from this year's update are very small.
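The rounding effect can be reproduced with the two 2023 payroll values quoted above. This is a minimal sketch that assumes the report rounds to one decimal place, as the 1375.2 figure suggests.

```python
# 2023 payroll tax liability from the baseline CPS and the new CPS (values quoted above).
current_payroll = 1375.1514275498105
new_payroll = 1375.1514277266267

# At one decimal place the two values are indistinguishable.
print(round(current_payroll, 1), round(new_payroll, 1))

# The underlying difference is nonzero but tiny relative to the level.
diff = new_payroll - current_payroll
print(f"absolute difference: {diff:.3e}")
print(f"relative difference: {diff / current_payroll:.3e}")
```

Printing the unrounded difference next to the rounded values makes it easy to tell "no change" apart from "a change below the display precision".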

We can then merge this PR if you think it's all right.

bodiyang commented 2 years ago

Generated a new report with PUF

bodiyang commented 2 years ago

Details of which files are used to construct the Records class used in the report's tax liability analysis:

CPS comparison: Records(data=cps.csv.gz, weights=cps_weights.csv.gz, adjust_ratios=None, start_year=2014, gfactors=GrowFactors())
cps.csv.gz: old/base uses the file from Tax-Calculator; new uses the new file in taxdata. Old and new are the same.
cps_weights.csv.gz: old/base uses the file from Tax-Calculator; new uses the new file in taxdata. Old and new differ in all years.
gfactors: both old and new use the file from Tax-Calculator. They are the same.

PUF comparison: Records(data=puf.csv, weights=puf_weights.csv, adjust_ratios=puf_ratios)
puf.csv: both old and new use the file in taxdata. They are the same.
puf_weights.csv: old/base uses the file from Tax-Calculator; new uses the new file in taxdata. The new file only adds the year 2032; values for previous years are the same.
puf_ratios: old/base uses the file from Tax-Calculator; new uses the new file in taxdata. Old and new differ in all years.

andersonfrailey commented 2 years ago

For posterity:

We had a discussion today regarding the reports being generated by this PR. We believe that there is an issue with the reports script that causes the tables showing the difference in tax liabilities to be incorrect. We're going to generate those tables without using the reports function and post them here before merging.

bodiyang commented 2 years ago

CPS tax liability values generated without using the reports function (in billions):

    Tax Liability  Tax               Year
 0  1375.151428    Current Payroll   2023
 1  1375.151428    New Payroll       2023
 2  1522.562076    Current Income    2023
 3  1522.562078    New Income        2023
 4  2897.713504    Current Combined  2023
 5  2897.713506    New Combined      2023
 6  1436.817409    Current Payroll   2024
 7  1436.817408    New Payroll       2024
 8  1605.965920    Current Income    2024
 9  1605.965924    New Income        2024
10  3042.783330    Current Combined  2024
11  3042.783332    New Combined      2024
12  1501.066231    Current Payroll   2025
13  1501.066231    New Payroll       2025
14  1695.325091    Current Income    2025
15  1695.325091    New Income        2025
16  3196.391321    Current Combined  2025
17  3196.391322    New Combined      2025
18  1564.427289    Current Payroll   2026
19  1564.427289    New Payroll       2026
20  2011.714781    Current Income    2026
21  2011.714781    New Income        2026
22  3576.142070    Current Combined  2026
23  3576.142070    New Combined      2026
24  1624.190463    Current Payroll   2027
25  1624.190464    New Payroll       2027
26  2100.235837    Current Income    2027
27  2100.235837    New Income        2027
28  3724.426300    Current Combined  2027
29  3724.426301    New Combined      2027
30  1684.738718    Current Payroll   2028
31  1684.738718    New Payroll       2028
32  2186.922203    Current Income    2028
33  2186.922203    New Income        2028
34  3871.660922    Current Combined  2028
35  3871.660922    New Combined      2028
36  1746.246235    Current Payroll   2029
37  1746.246236    New Payroll       2029
38  2279.761544    Current Income    2029
39  2279.761535    New Income        2029
40  4026.007778    Current Combined  2029
41  4026.007771    New Combined      2029
42  1808.787103    Current Payroll   2030
43  1808.787102    New Payroll       2030
44  2374.260655    Current Income    2030
45  2374.260657    New Income        2030
46  4183.047758    Current Combined  2030
47  4183.047759    New Combined      2030
48  1875.956833    Current Payroll   2031
49  1875.956832    New Payroll       2031
50  2474.772797    Current Income    2031
51  2474.772800    New Income        2031
52  4350.729630    Current Combined  2031
53  4350.729632    New Combined      2031
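To put the size of these changes in perspective, the combined CPS figures above can be compared year by year. The sketch below copies the Current Combined and New Combined values from the table. The helper name `largest_change` is hypothetical, and the printed values are limited by the six decimal places shown in the table.

```python
# (year, current, new) combined CPS tax liability in billions, copied from the table above.
combined = [
    (2023, 2897.713504, 2897.713506),
    (2024, 3042.783330, 3042.783332),
    (2025, 3196.391321, 3196.391322),
    (2026, 3576.142070, 3576.142070),
    (2027, 3724.426300, 3724.426301),
    (2028, 3871.660922, 3871.660922),
    (2029, 4026.007778, 4026.007771),
    (2030, 4183.047758, 4183.047759),
    (2031, 4350.729630, 4350.729632),
]


def largest_change(rows):
    """Return (year, new - current) for the row with the largest absolute difference."""
    year, current, new = max(rows, key=lambda r: abs(r[2] - r[1]))
    return year, new - current


year, diff = largest_change(combined)
print(f"largest change is in {year}: {diff:.1e} billion")
print(f"relative to the level: {diff / 4026.007778:.1e}")
```

Even the largest difference is on the order of millionths of a billion, which is why nothing shows up at the report's displayed precision.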

bodiyang commented 2 years ago

Current PUF:

    Tax Liability  Tax               Year
 0  1396.225761    Current Payroll   2023
 1  4893.428384    Current Income    2023
 2  6289.654144    Current Combined  2023
 3  1458.204694    Current Payroll   2024
 4  5085.776681    Current Income    2024
 5  6543.981375    Current Combined  2024
 6  1523.474740    Current Payroll   2025
 7  5308.387928    Current Income    2025
 8  6831.862667    Current Combined  2025
 9  1588.177347    Current Payroll   2026
10  5814.751007    Current Income    2026
11  7402.928354    Current Combined  2026
12  1650.251097    Current Payroll   2027
13  5902.757669    Current Income    2027
14  7553.008766    Current Combined  2027
15  1712.969742    Current Payroll   2028
16  6115.146495    Current Income    2028
17  7828.116237    Current Combined  2028
18  1776.676028    Current Payroll   2029
19  6309.859750    Current Income    2029
20  8086.535778    Current Combined  2029
21  1841.491841    Current Payroll   2030
22  6525.260429    Current Income    2030
23  8366.752270    Current Combined  2030
24  1911.803796    Current Payroll   2031
25  6756.977524    Current Income    2031
26  8668.781320    Current Combined  2031

bodiyang commented 2 years ago

@andersonfrailey I generated the CPS and PUF tax liabilities without calling report.py, as documented in the previous comments. I think the PR is good to merge now; let me know if there are any other questions.

andersonfrailey commented 2 years ago

I still think it's weird that the CPS projections aren't changing more. However, I can't see anything wrong in this PR that would cause that to happen so I'm going to merge it so that we can get to the newer updates and I can work on #411 again.