PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org

Allow non-puf data sources #319

Closed MattHJensen closed 7 years ago

MattHJensen commented 9 years ago

Taxcalc currently requires an extrapolated dataset based off of the IRS public use file (PUF), in particular the 2008 -- soon 2009 -- PUF. This dependency on the PUF may prevent many potential users or contributors from getting involved with the project. Therefore, we should endeavor to allow the use of non-puf data sources. Currently we have three ideas for doing so.

MattHJensen commented 9 years ago

The first option above is the priority and will probably be implemented first. The second option is currently underway, at least tangentially, as part of John's work on the CPS-PUF match. @mmessick has investigated the third option and has suggested a change to records.py (#309) that we think might help with both option one and option three, but beyond that no one is working to implement the third option.

feenberg commented 9 years ago

This is very helpful. I have a couple of comments:

Users with their own data:

Accepting taxsim 22 field files would be nice, but I would hope that we could have a more general solution that allowed for more fields if users have them.

In the SAS code only filing period and marital status are required fields. Any omitted fields are taken as zero. This is very convenient for users with subsets of data, and not too hard to achieve. It mirrors the IRS treatment of blanks on tax forms. I am not sure if the OSPC code is entirely compatible with this, but I expect it is nearly so. That way users can supply the fields they have and ignore the ones they are missing.

There isn't much in the CPS that goes beyond the 22 taxsim fields, but other surveys do, so it would be good to allow for additional fields.

Most users will have only a few fields (cf. Internet taxsim with only 22 fields) and making them prepare a file with 200 fields would be a nuisance. I would suggest we allow a simple CSV spreadsheet input: a first line of field names, then one line per individual taxpayer. This is compact and can be output by any stat package. While CSV files are weakly standardized, the problem areas are all about character strings with embedded oddities, etc. Nothing that would affect us is problematic. I would implore you not to assume that everyone is working in Python or R every day. In fact the economist users are mostly working in Stata or SAS and would welcome an easy transition. Most non-economist users are working in Excel or Access (more's the pity) and they also have the ability to create CSV files easily.
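A minimal sketch of the input convention described above: a CSV with a header row of field names, one taxpayer per row, and any omitted or blank field taken as zero. This is not Tax-Calculator's actual reader. The function name `read_records` and the field lists are hypothetical and chosen only for illustration; `E00200` is the one field name taken from this discussion.

```python
import csv
import io

# Hypothetical field lists for illustration only; a real model has many more.
REQUIRED_FIELDS = ("year", "mstat")  # stand-ins for filing period, marital status
OPTIONAL_FIELDS = ("E00200", "interest", "dividends")


def read_records(csv_text):
    """Parse CSV text (header row, then one taxpayer per row) into dicts.

    Columns beyond the known lists are kept, so users may supply extra
    fields. Optional fields that are absent or blank become 0.0. Only the
    presence of the required columns is checked, not their values.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    records = []
    for row in reader:
        # Start from zeros so omitted optional fields need no special casing.
        record = {name: 0.0 for name in OPTIONAL_FIELDS}
        for name in header:
            value = (row[name] or "").strip()
            record[name] = float(value) if value else 0.0
        records.append(record)
    return records


# Two taxpayers; the second leaves wages blank, and neither supplies
# interest or dividends.
sample = "year,mstat,E00200\n2015,1,50000\n2015,2,\n"
records = read_records(sample)
```

Starting each record from a dictionary of zeros means a user with only a handful of fields can pass them in without padding the file out to the full field list.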

Should we offer a prepared CPS file? I do think that would have advantages for many users studying EIC or other features not related to rich people. The CEX is another file of great interest. These are to some extent substitutes for imputations. Personally I believe they are superior because they don't promise more than they deliver. This is non-trivial work and should happen after the web calculator is up and running well.

As for an individual interested in a single, perhaps personal, calculation I wonder if we want to tie to form line numbers (Line 11, form 1040), as Luca does, or to economic concepts, such as E00200 (Wages and salaries) for input. I don't think we want to be a slightly buggy Taxcut competitor. However, if the Luca software can print tax forms from the PUF or complete data, that would be a fantastic debugging aid. There will be a problem when PUF fields do not correspond precisely to E codes.

dan

On Sat, 18 Jul 2015, Matt Jensen wrote:

Taxcalc currently requires an extrapolated dataset based off of the IRS public use file (PUF), in particular the 2008 -- soon 2009 -- PUF. This dependency on the PUF may prevent many potential users or contributors from getting involved with the project. Therefore, we should endeavor to allow the use of non-puf data sources. Currently we have three ideas for doing so.

  • Allow users to input simplified tax records from a TAXSIM input file.
    • Advantages of this technique are that TAXSIM input files are well designed, easy to create, and many users already have them. It would be an easy and maybe even familiar way for new users to try out taxcalc. It would also allow for easier testing between calculators that accommodate TAXSIM input files.
    • This would likely not use the data extrapolation, at least to begin with, and a challenge will be accommodating a file with multiple years of data.
    • @martinholmer proposed this in #291 and is currently working on it.
  • Create a file based off of the CPS that could be used as an alternative to the PUF or PUF-CPS match file.
    • An advantage of this file is that it could form the base for an extrapolated, nationally representative, file for revenue estimation.
    • John O'Hare is helping us with this as part of the PUF-CPS match project.
  • Allow for individuals to enter their own detailed tax information and see how their taxes would change under various reform proposals.
    • An advantage of this approach is that it would make taxcalc interesting to anyone who cares about his own finances but not necessarily about the national implications of tax policy.
    • We believe the JSON tax form completion feature of the Luca project could serve as a good front end for allowing users to input their tax data. See more on that, here. The challenge is to pull the data from the JSON tax forms into taxcalc, essentially swapping out the Luca tax calculator.
    • @mmessick has suggested a revision to our records class that would make it easier to integrate with the JSON tax forms in #309.


martinholmer commented 8 years ago

@MattHJensen, Given everything that has happened since last July, would you say that issue #319 has been resolved? If so, it is probably safe to close it. If not, then probably best to close it and raise a new issue that focuses on what remains to be done.

martinholmer commented 8 years ago

Martin said on March 8, 2016:

Matt, Given everything that has happened since last July, would you say that issue #319 has been resolved? If so, it is probably safe to close it. If not, then probably best to close it and raise a new issue that focuses on what remains to be done.

Seems as if this issue #319 has been resolved. Shall I close it?

@MattHJensen

martinholmer commented 8 years ago

@MattHJensen, Have you thought about the following question?

Seems as if issue #319 has been resolved; shall I close it or do you want to close it?

MattHJensen commented 8 years ago

@martinholmer, this is the only issue that documents the second two check marks (CPS and LUCA). I have been leaving the issue open until we can check those off. Do you think they should be separate issues?

martinholmer commented 8 years ago

@MattHJensen said:

This is the only issue that documents the second two check marks (CPS and LUCA). I have been leaving the issue open until we can check those off. Do you think they should be separate issues?

Oh, yes I see your point about the three-pronged plan in your original 18-Jul-2015 entry. So, I guess you could leave it open, but it is fairly out-of-date. We can input data other than those in the puf.csv and have been finished with the first-prong effort (inputting Internet-TAXSIM formatted data) for some time.

It's up to you. All depends on whether or not you want to give an update about what has happened since you raised this issue in July 2015.

MattHJensen commented 8 years ago

So, I guess you could leave it open, but it is fairly out-of-date. We can input data other than those in the puf.csv and have been finished with the first-prong effort (inputting Internet-TAXSIM formatted data) for some time.

A great deal has been completed on this issue since it was opened on 18-Jul-2015.

The first objective was completed:

Relatedly:

John O'Hare is working on the second objective:

Create a file based off of the CPS that could be used as an alternative to the PUF or PUF-CPS match file.

and @amy-xu will distribute the first version of the file to the team when she receives it from John.

@zrisher is currently working on the third objective:

Allow for individuals to enter their own detailed tax information and see how their taxes would change under various reform proposals.

martinholmer commented 8 years ago

@MattHJensen, Thanks for the useful progress report on issue #319.

zrisher commented 8 years ago

@MattHJensen @martinholmer If you like, I'm happy to create a new issue for:

Allow for individuals to enter their own detailed tax information and see how their taxes would change under various reform proposals.

There's a lot of related info floating around in this repo's issues. It would be good to link it from one place, centralize the discussion, and get this issue one step closer to being closed.

martinholmer commented 8 years ago

@zrisher said:

@MattHJensen @martinholmer If you like, I'm happy to create a new issue for:

Allow for individuals to enter their own detailed tax information and see how their taxes would change under various reform proposals.

There's a lot of related info floating around in this repo's issues. It would be good to link it from one place, centralize the discussion, and get this issue one step closer to being closed.

This sounds like a good idea to me. What do you think, @MattHJensen?

MattHJensen commented 8 years ago

That makes sense to me too.

zrisher commented 8 years ago

Ok, Point 3 has been moved to #851:

Allow for individuals to enter their own detailed tax information and see how their taxes would change under various reform proposals.

All that remains tracked under this issue is Point 2:

Create a file based off of the CPS that could be used as an alternative to the PUF or PUF-CPS match file.

MattHJensen commented 8 years ago

@zrisher and @martinholmer, would you suggest I edit the original comment by deleting Point 3?

zrisher commented 8 years ago

@MattHJensen I would suggest you create a new issue for Point 2 and close this one as complete. :smile:

zrisher commented 7 years ago

@MattHJensen @martinholmer @talumbau This issue can now be closed as the remaining work is tracked in separate issues.

martinholmer commented 7 years ago

Closing Tax-Calculator issue #319 per @zrisher comment.