PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org

Variable names #24

Closed MattHJensen closed 8 years ago

MattHJensen commented 10 years ago

I'm hoping to generate a conversation about whether we should rename our variables to be descriptive in English.

We could easily provide a dictionary for variable mapping, set by default to the PUF, and in most cases we could even use the PUF variable descriptions; for instance, earnedIncForEITC instead of E59560.
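A minimal sketch of what such a mapping could look like, for illustration only. The names `PUF_TO_DESCRIPTIVE`, `rename_columns`, and `OTHER_VAR` are hypothetical; only the E59560 / earnedIncForEITC pair comes from this thread.

```python
# Hypothetical crosswalk from PUF codes to descriptive names.
# Only the E59560 entry is taken from the discussion; a real
# crosswalk would need one entry per PUF variable.
PUF_TO_DESCRIPTIVE = {
    "E59560": "earnedIncForEITC",
}


def rename_columns(record, mapping=PUF_TO_DESCRIPTIVE):
    """Return a copy of record with known PUF codes replaced by descriptive names.

    Names that are not in the mapping are passed through unchanged.
    """
    return {mapping.get(name, name): value for name, value in record.items()}


# "OTHER_VAR" is a made-up name used to show the pass-through behavior.
renamed = rename_columns({"E59560": 12000.0, "OTHER_VAR": 30000.0})
```

Keeping the mapping in one dictionary would let users who prefer the official codes ignore it entirely, while others could opt in to the descriptive names.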

Note that many economists studying an isolated tax issue don't know tax law well enough to guess what each variable is by comparing our Python code to their internal understanding of tax law; certainly most policy analysts won't. Descriptive variable names would help both of these groups understand and contribute to our code.

Moreover, if our code is readable to the uninitiated, it could be the best place for the uninitiated to learn how tax law works, a valuable contribution in and of itself.

I'd be interested in hearing from @SameerSarkar and @Copper-Head if they think descriptive variable names would have sped up their understanding of the code, and I'd like to know from @feenberg what we might be sacrificing if we were to make the switch.

feenberg commented 10 years ago

On Tue, 9 Sep 2014, MattHJensen wrote:

> I'm hoping to generate a conversation about whether we should rename our variables to be descriptive in English.
>
> We could easily provide a dictionary for variable mapping, set by default to the PUF, and in most cases we could even use the PUF variable descriptions; for instance, earnedIncForEITC instead of E59560.

That is very much not to my taste. Lines would become very long, and government users would no longer have the familiar and official variable names. There would be far more continuations, which would make conditional statements even more opaque. A sum of values would have to be written one value per line, and a simple difference would take two lines.

The way to make users more comfortable is to make sure that they have tax forms clearly labeled with the E-codes. The correspondence between the form and the code will be clear (except for capital gains). Just giving something a nice name won't (in most cases) tie down exactly what it is in the user's mind, as many C-values are weird intermediate calculations.

Having the E-Codes prepares the user for discussion with insiders and allows them to write code that will run inside the government.

> Note that many economists studying an isolated tax issue don't know tax law well enough to guess what each variable is by comparing our Python code to their internal understanding of tax law; certainly most policy analysts won't. Descriptive variable names would help both of these groups understand and contribute to our code.
>
> Moreover, if our code is readable to the uninitiated, it could be the best place for the uninitiated to learn how tax law works, a valuable contribution in and of itself.

Our code is already nearly unreadable - between the lack of an 'if' statement and the cd['...'] it is far behind SAS in readability.

dan

> I'd be interested in hearing from @SameerSarker and @Copper-Head if they think descriptive variable names would have sped up their understanding of the code, and I'd like to know from @feenberg what we might be sacrificing if we were to make the switch.


iliakur commented 9 years ago

I'm beginning to see eye to eye with Dan on this. Some of the C-E variables are kind of hard to give a meaningful name to outside the context of the forms they're in. We could address the opacity issue (which I'm also painfully aware of) with thorough in-line comments explaining, to the best of our knowledge, what gets modified and why (note the stress on the why part).

The dictionary key syntax must be an acquired taste; it doesn't hurt my eyes as much as all those global declarations. As for the lack of if statements, that's an issue with numpy or how we're using it. If it weren't for the arrays, I'd write a bunch of generator functions and pack all those if-statements into those. I suspect we might be missing something in numpy's functionality, though...
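One candidate for the missing numpy functionality is `numpy.where`, which applies an if/else choice to every element of an array at once. The sketch below is only an illustration: the variable names, the input values, and the threshold are all made up and are not real tax parameters or code from this project.

```python
import numpy as np

# Hypothetical inputs: one element per tax filing unit.
income = np.array([5000.0, 20000.0, 80000.0])
threshold = 15000.0  # made-up cutoff, not a real tax parameter

# Element-wise equivalent of the scalar logic:
#     if income > threshold:
#         excess = income - threshold
#     else:
#         excess = 0.0
excess = np.where(income > threshold, income - threshold, 0.0)
```

Note that `np.where` evaluates both branches for every element before selecting, so each branch must be safe to compute across the whole array.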

MattHJensen commented 9 years ago

Ok. This all makes sense. Providing a crosswalk to English in the docstrings and in-line comments should help a lot. Down the line we may even want a machine translation utility that converts the code from C-E variable names to descriptive English variable names, but that's far off.
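As a rough sketch of what a docstring crosswalk might look like: the function below, its `cap` parameter, and the calculation itself are hypothetical and exist only to show the layout. The one code-to-name pair used, E59560 / earnedIncForEITC, is the example from earlier in this thread.

```python
def capped_earned_income(E59560, cap):
    """Return earned income limited to `cap` (hypothetical calculation).

    Crosswalk from official codes to descriptive English names:
        E59560 : earnedIncForEITC, earned income for EITC purposes
    """
    # The official code stays in the signature; the English name
    # lives in the docstring so both audiences can follow the code.
    return min(E59560, cap)


result = capped_earned_income(12000.0, 10000.0)
```

This keeps the E-codes in the code itself, as Dan prefers, while giving newcomers an English description at the point of use.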

MattHJensen commented 8 years ago

Closing this issue. The idea of English-descriptive variable names may come up again in the future, especially as we add variables to our datafile that are not available on the IRS PUF, but the issue is not specific or actionable now.