PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org
Other
258 stars 157 forks source link

p23250 & e23250 #452

Closed Amy-Xu closed 8 years ago

Amy-Xu commented 8 years ago

I thought the two variables are equivalent in the current master, but from PR #451 it seems they are not exactly interchangeable in every single cases due to SimpleTaxIO. Merging the PR and posting this issue to take one more look.

cc @talumbau @martinholmer

martinholmer commented 8 years ago

Here are the SImpleTaxIO results that generated this issue. These results were generated on a local branch where the mtr() code had been revised to accept either 'e23250' or 'p23250' as the income_type_str. This branch also changed SimpleTaxIO to compute marginal tax rates with respect to long-term capital gains (instead of taxpayer earnings). The two aliases for long-term capital gains do not produce the same results as can be seen below.

==> when income_type_str='e23250':
validation> bash test c13 . save
M   taxcalc/validation/c13.taxdiffs
validation> awk '{n[$7]++}END{for(i in n)print i,n[i]}' c13.in.out-simtax | awk '$2>3000' | sort
0.00 36809
12.75 4405
15.00 19411
25.00 5471
25.80 4420
27.75 4053
validation>

==> when income_type_str='p23250':
validation> bash test c13 . save
M   taxcalc/validation/c13.taxdiffs
validation> awk '{n[$7]++}END{for(i in n)print i,n[i]}'
c13.in.out-simtax | awk '$2>3000' | sort
0.00 100000
validation>
Amy-Xu commented 8 years ago

It makes more sense to me now. It seems there's another function _read_input mainly focusing on validation of data type and formats in `simpletaxio,py', without the alias setup. But I'm still a bit confused on why it is p23250 that is all zeros but not e23250, because both 09 PUF and cps-puf should only have p23250 instead of e23250. Anyways in my opinion this issue comes in two parts.

martinholmer commented 8 years ago

@Amy-Xu said:

Anyways in my opinion this issue comes in two parts.

Do we need to make the testing and a regular calculator consistent in handling the variable names? In a regular calculator, because most variables have a leading e, we do not want to add any confusion for users on this kind of detailed naming stuff -- they've already had a lot to digest thinking about the 5-digit variable indexes. If any of the users is expert enough, they should be able to find info from IRS 09 PUF booklet. I'm aware that imputations and extrapolation is not quite necessary for this testing, but how about variable name consistency?

No matter whether the names should be consistent or not, I feel we probably should add a few lines of comments in either records.py where aliases are set up, or anywhere else more appropriate. So users can know what they will encounter in either a regular calculator or simpletaxio testing.

Thanks for your thoughts on this issue. There are many ways to solve this problem, but I would like to solve it in the context of making progress on Issue #425. I have some ideas about how to do that and will be able to post them here in the next day or two. My ideas share your thoughts on the desirability of improving IRS-SOI PUF variable name consistency in the Tax-Calculator.

martinholmer commented 8 years ago

This issue #452 has been resolved by the merge of pull request #508.