PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org

Overall inequality and poverty summary statistics #1896

Closed MaxGhenis closed 6 years ago

MaxGhenis commented 6 years ago

It would be helpful to have a function or functions to get summary statistics around inequality and poverty. These could either be part of the existing diagnostic_table function or made as new functions.

Potential metrics could include:

More challenging ones would include:

I have some code for Gini and the other inequality ones, so can work on a PR there.

For the maintainers: do these belong in diagnostic_table, or should they be separate? Also should this issue be split into multiple for each metric?

cc @evtedeschi3 who's included the Gini coefficient and other inequality metrics in some past taxcalc analyses.

codykallen commented 6 years ago

@MaxGhenis, this is an interesting idea, but I don't think Tax-Calculator is a good place to implement poverty measures. The units included come from the population of tax filers, which naturally excludes many people with little or no income, those most relevant for poverty analyses. If you're calculating a Gini coefficient, this creates the additional complication that a filing unit is neither an individual nor a household, so you would need some mechanism to either connect married couples filing separately or to split married couples filing jointly, as well as considering how to count children and non-child dependents.

That being said, if CTAM can be combined with Tax-Calculator, then the additional information from CTAM on cash and non-cash benefits could be more useful to a poverty analysis.

Also, as I've noted several times on various PRs and issues, income at the bottom of the distribution is often mismeasured.

ernietedeschi commented 6 years ago

I largely agree with Cody. I will say I think Gini coefficients at the tax unit level are fine so long as the user is clear about what they are and their implications. Inequality research varies in using individuals, families, or households as the unit of choice for statistical analysis so a Gini calculation across tax units isn’t conceptually problematic, though it may not be apples-to-apples with all of the literature.

I’ve played around with poverty analysis in my tc output before and it’s loaded with issues, many of which Cody touched on. tc doesn’t include all of the income items Census does in their absolute poverty definition. And since the SPM is partially a relative measure (based on a percentile of consumption), that adds a whole other endogenous can of worms.

One thing I did to conceptually approximate a poverty analysis was to look at the number of filers below 50% of the median of after-tax income, adjusted for tax unit size (that is, after-tax income divided by the square root of total tax unit size). This is not the Census definition of poverty but it is a common alternative measure especially in international comparative contexts such as in OECD reports. It will give a back of the envelope estimate.
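A minimal sketch of the relative measure described above, assuming tax-unit-level arrays of after-tax income, unit size (for example taxcalc's `XTOT`), and sampling weights (for example `s006`). The function names and the toy data are illustrative, not part of Tax-Calculator, and the sketch assumes every unit has a size of at least 1.

```python
import numpy as np


def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half of the total."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    cum_w = np.cumsum(weights[order])
    idx = np.searchsorted(cum_w, 0.5 * cum_w[-1])
    return values[order][idx]


def relative_low_income_share(aftertax_income, unit_size, weights, frac=0.5):
    """Weighted share of tax units whose equivalized income falls below
    `frac` times the weighted median equivalized income."""
    income = np.asarray(aftertax_income, dtype=float)
    size = np.asarray(unit_size, dtype=float)   # assumed >= 1 for every unit
    weights = np.asarray(weights, dtype=float)
    equiv = income / np.sqrt(size)              # square-root equivalence scale
    threshold = frac * weighted_median(equiv, weights)
    return weights[equiv < threshold].sum() / weights.sum()


# Toy data for five tax units (illustrative values, not taxcalc output).
income = [8_000, 30_000, 45_000, 60_000, 200_000]
size = [1, 4, 1, 2, 4]
wt = [1.0, 1.0, 1.0, 1.0, 1.0]
share = relative_low_income_share(income, size, wt)
```

With real output, the three inputs would come from something like `calc.dataframe(['aftertax_income', 'XTOT', 's006'])`. The result counts tax units, not people, so it is not comparable to a person-level poverty rate.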


feenberg commented 6 years ago

On Thu, 1 Mar 2018, evtedeschi3 wrote:

One thing I did to conceptually approximate a poverty analysis was to look at the number of filers below 50% of the median of after-tax income, adjusted for tax unit size (that is, after-tax income divided by the square root of total tax unit size). This is not the Census definition of poverty but it is a common alternative measure especially in international comparative contexts such as in OECD reports. It will give a back of the envelope estimate.

It is common, but it is also tendentious in a way that the AEI would probably not like to be associated with. It is not a measure of poverty, but a measure of inequality that no amount of proportional growth can improve.

dan
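The proportional-growth point can be written out as a short derivation. The notation here is introduced for illustration only: $y_i$ is equivalized after-tax income, $w_i$ is the unit's weight, $m$ is the weighted median of $y$, and $z$ is a fixed absolute threshold.

```latex
\begin{align*}
R &= \frac{\sum_i w_i \,\mathbf{1}\!\left[\,y_i < \tfrac{1}{2} m\,\right]}{\sum_i w_i}
  && \text{(relative measure)} \\[4pt]
y_i' = \lambda y_i,\ \lambda > 0
  &\;\Longrightarrow\; m' = \lambda m \\[4pt]
R' &= \frac{\sum_i w_i \,\mathbf{1}\!\left[\,\lambda y_i < \tfrac{1}{2} \lambda m\,\right]}{\sum_i w_i}
   = R
  && \text{(unchanged for every } \lambda\text{)} \\[4pt]
A(\lambda) &= \frac{\sum_i w_i \,\mathbf{1}\!\left[\,\lambda y_i < z\,\right]}{\sum_i w_i}
  && \text{(absolute measure, non-increasing in } \lambda\text{ when } z > 0\text{)}
\end{align*}
```

Scaling every income by the same factor leaves $R$ untouched, while a measure against a fixed threshold $z$ can only fall or stay the same.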


ernietedeschi commented 6 years ago

My comment was meant to assist Max rather than being a suggestion for a Tax-Calculator feature. But also, as I’m sure you’re aware, one could easily argue absolute measures of poverty are tendentious too. It’s a complex debate, which punctuates Cody’s original point that this is probably beyond the purview of Tax-Calculator.


MaxGhenis commented 6 years ago

if CTAM can be combined with Tax-Calculator, then the additional information from CTAM on cash and non-cash benefits could be more useful to a poverty analysis.

@codykallen Isn't that already the case? Or are there missing benefits that would be important?

Gini coefficients at the tax unit level are fine so long as the user is clear about what they are and their implications

@evtedeschi3 I agree - tc won't match other sources, but comparing Gini across reforms in my own analysis has yielded useful and directionally expected results. Incomes could also be divided by XTOT for crude individual analysis.
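A rough sketch of a weighted Gini at the tax-unit level, plus the crude per-person variant that divides income by `XTOT`. This is not the `taxcalc_helpers` implementation. It is a generic trapezoid-rule Lorenz calculation that assumes non-negative incomes with a positive total, and that `XTOT` is at least 1 for every unit. The function name and toy data are illustrative.

```python
import numpy as np


def weighted_gini(values, weights):
    """Gini coefficient from the trapezoid-rule area under the weighted
    Lorenz curve. Assumes non-negative values with a positive total."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    # Lorenz curve points, starting from the origin.
    cum_pop = np.concatenate(([0.0], np.cumsum(w))) / w.sum()
    cum_inc = np.concatenate(([0.0], np.cumsum(x * w))) / (x * w).sum()
    area = np.sum(np.diff(cum_pop) * (cum_inc[1:] + cum_inc[:-1]) / 2.0)
    return 1.0 - 2.0 * area


# Toy tax units (illustrative values, not taxcalc output).
aftertax_income = np.array([10_000.0, 40_000.0, 90_000.0])
xtot = np.array([1.0, 2.0, 3.0])        # people per tax unit, assumed >= 1
s006 = np.array([100.0, 100.0, 100.0])  # tax-unit sampling weights

# Tax-unit Gini: one observation per unit, weighted by s006.
gini_units = weighted_gini(np.maximum(aftertax_income, 0.0), s006)

# Crude per-person Gini: per-capita income, weighted by the number of people.
gini_people = weighted_gini(np.maximum(aftertax_income, 0.0) / xtot, s006 * xtot)
```

Clipping negatives at zero is one choice among several. Whatever is chosen should be applied identically to baseline and reform so that the comparison across reforms stays meaningful.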

tc doesn’t include all of the income items Census does in their absolute poverty definition.

Which are missing? Do we know how much they're likely to affect results? Are there plans to impute them? Do other tax analysis groups use incomplete data to estimate the impact of reforms on poverty?

since the SPM is partially a relative measure (based on a percentile of consumption), that adds a whole other endogenous can of worms.

Ah I'd missed that. Seems like a deal-breaker then, and also raises concerns @feenberg described (which are valid IMO - I prefer measuring poverty in absolute terms, and then inequality separately).

These issues (tax units vs household, lack of all income sources, mismeasurement of lowest earners) are good to consider, but it seems like they also apply to any estimation of the bottom decile or so, which tc currently supports. The question would be whether they're too distorted to report at all, or can be included with caveats. And whether certain users might end up reporting them with caveats outside of tc anyway...

MaxGhenis commented 6 years ago

It’s a complex debate, which punctuates Cody’s original point that this is probably beyond the purview of Tax-Calculator.

Does AEI have a preferred poverty measure?

ernietedeschi commented 6 years ago

@MaxGhenis wrote:

Which are missing? Do we know how much they're likely to affect results? Are there plans to impute them? Do other tax analysis groups use incomplete data to estimate the impact of reforms on poverty?

Bear in mind that the official Census money income definition used in calculating the poverty rate doesn't take taxes into account at all. It is generally a pre-tax, post-transfer measure of income. You can of course modify this to account for taxes, and many researchers do, but then it's not "official". See e.g. https://www.census.gov/topics/income-poverty/poverty/about.html

MaxGhenis commented 6 years ago

the official Census money income definition used in calculating the poverty rate doesn't take taxes into account

Doh, I misread another site on this. Never mind then.

So poverty sounds hard, except for the World Bank's extreme poverty measure, which is affected by mismeasurement at the very bottom. Some version of SPM could possibly be done by anchoring against a particular year's thresholds, like Wimer et al. (2013), but the geographic part would still require some sort of national averaging.

So maybe this could just consider inequality metrics to start?

ernietedeschi commented 6 years ago

Here's another thing you could mull over -- but this will take some playing around on your part as I'm thinking out loud here.

The cps.csv file now includes all the variables you need to link directly to the CPS ASEC: hh_seq, ffpos, and pulineno (as well as the survey year, which will be 1 + the tax year in the raw unprocessed cps.csv file).

The CPS ASEC has each family's official and SPM poverty status, as well as the relevant income and threshold measures for each.

In principle, then, you could merge each measure in and then make some assumptions about how your policy delta in tc affects them.

So for example SPM: you will have each family's SPM income and poverty threshold. What I'm basically thinking is you merge these variables in, then add in the change in after-tax income from your policy and recalculate poverty based on that.
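The merge-and-recompute idea could look roughly like the following pandas sketch. All column names (`spm_id`, `tax_unit_id`, `spm_resources`, `spm_threshold`, `person_weight`, `aftertax_income_change`) are hypothetical stand-ins, not actual ASEC or taxcalc variable names. The sketch also assumes that each tax unit sits inside exactly one SPM unit and that the full change in after-tax income passes through to SPM resources.

```python
import pandas as pd

# Person-level ASEC extract (toy values; column names are illustrative).
asec = pd.DataFrame({
    "spm_id":        [1, 1, 2, 3, 3, 3],
    "tax_unit_id":   [10, 10, 20, 30, 30, 31],
    "spm_resources": [18_000.0, 18_000.0, 9_000.0, 30_000.0, 30_000.0, 30_000.0],
    "spm_threshold": [20_000.0, 20_000.0, 12_000.0, 28_000.0, 28_000.0, 28_000.0],
    "person_weight": [1_000.0, 1_000.0, 1_500.0, 800.0, 800.0, 800.0],
})

# Reform-minus-baseline change in after-tax income, one row per tax unit.
tax_change = pd.DataFrame({
    "tax_unit_id": [10, 20, 30, 31],
    "aftertax_income_change": [2_500.0, 1_000.0, 0.0, -3_000.0],
})


def spm_poverty_rates(asec, tax_change):
    """Return (baseline, reform) person-weighted SPM poverty rates."""
    # One row per tax unit, with the SPM unit it belongs to.
    unit_map = asec[["tax_unit_id", "spm_id"]].drop_duplicates()
    # Sum tax-unit changes up to the SPM unit; unmatched units get no change.
    change_by_spm = (
        unit_map.merge(tax_change, on="tax_unit_id", how="left",
                       validate="one_to_one")
        .fillna({"aftertax_income_change": 0.0})
        .groupby("spm_id")["aftertax_income_change"].sum()
    )
    out = asec.copy()
    out["new_spm_resources"] = (
        out["spm_resources"] + out["spm_id"].map(change_by_spm)
    )
    w = out["person_weight"]
    base = w[out["spm_resources"] < out["spm_threshold"]].sum() / w.sum()
    reform = w[out["new_spm_resources"] < out["spm_threshold"]].sum() / w.sum()
    return base, reform


base_rate, reform_rate = spm_poverty_rates(asec, tax_change)
```

The rates are weighted by person rather than by family or tax unit, since every member of an SPM unit shares that unit's poverty status.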

Some big caveats here:

martinholmer commented 6 years ago

@MaxGhenis said on March 1, 2018:

It would be helpful to have a function or functions to get summary statistics around inequality and poverty.

In the first few days of March, there was an informed discussion about all the pitfalls that would have to be avoided and all the subjective judgements that would have to be made in doing this.

There has been no further discussion over the past four or five weeks. Given that there is no consensus about how to do this in the Tax-Calculator library, it seems as if calculating inequality and poverty statistics is best left up to Tax-Calculator users with an interest in such statistics. That approach allows different users to make their own judgements about how to compute the statistics.

MattHJensen commented 6 years ago

Sorry to come to this late, but I just saw something that's best for me to address:

Does AEI have a preferred poverty measure?

No, AEI does not have institutional positions. More importantly, if AEI did have an institutional position, it wouldn't be relevant to this project because the project is governed by its core maintainers, not by AEI.

As for the substance of the issue itself, I agree with @martinholmer's conclusion that:

Given that there is no consensus about how to do this in the Tax-Calculator library, it seems as if calculating inequality and poverty statistics is best left up to Tax-Calculator users with an interest in such statistics. That approach allows different users to make their own judgements about how to compute the statistics.

cc @MaxGhenis @evtedeschi3 @feenberg @martinholmer

MaxGhenis commented 5 years ago

Just saw this PSL meetup description, which looks relevant here. @evtedeschi3 are you following the approach you described in https://github.com/PSLmodels/Tax-Calculator/issues/1896#issuecomment-369903287?

In a recent working paper, Mr. Tedeschi analyzes the poverty effects of the earned basic income tax credit, a proposed expansion of the current earned income tax credit. His novel approach to estimating poverty rates uses the open-source Tax-Calculator model and the Annual Social and Economic Supplement to the Current Population Survey.

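If this is the Supplemental Poverty Measure, I think this is increasingly valuable for taxcalc. For example, in January, Vox reported (https://www.vox.com/future-perfect/2019/1/30/18183769/democrat-poverty-plans-2020-presidential-kamala-harris-booker-gillibrand) on research from Columbia comparing the SPM effects of 5 plans from 2020 contenders (it was their front page cover story for at least a day).

Also FYI, I've added a gini function (https://github.com/MaxGhenis/taxcalc_helpers/blob/master/taxcalc_helpers/utils.py#L5) to taxcalc_helpers, which includes weights. Here's an example notebook, and the most common usage with taxcalc is: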

import taxcalc_helpers as tch
df = calc.dataframe(['aftertax_income', 's006'])  # Where calc is a taxcalc Calculator.
tch.gini(df.aftertax_income, df.s006)   # Or to zero out negatives:
tch.gini(df.aftertax_income, df.s006, negatives='zero')

ernietedeschi commented 5 years ago

The approach is in broad strokes consistent with this, though I ended up creating a synthetic 2017 CPS-based data file to run the analysis on since I wanted the latest possible SPM estimates and they are difficult to project out. Will discuss further in my presentation.


MaxGhenis commented 5 years ago

Thanks @evtedeschi3 for the great presentation today on calculating SPM with ASEC and taxcalc, and your paper applying this approach to an EBITC reform (could you share your slides?). You showed code at https://github.com/evtedeschi3/tcpoverty which splits the ASEC into tax units, then after running taxcalc sums up the change in after-tax income to the SPM unit for calculating the SPM rate.

Adding these capabilities natively to taxcalc would be useful. The most involved piece would be translating your tcpov2a_make_taxsim27.R program into a Python function and adding documentation. The more straightforward piece would be re-aggregating to SPM units and calculating SPM features.

This tax unit script is a lot simpler than the taxdata SAS scripts so I'm guessing it misses some things, but I also spoke with a Columbia poverty researcher who was doing something similar with taxcalc/ASEC, so I think it's worthwhile to have the flexibility of inputting your own ASEC. I'll need this for my own research, so if taxcalc/taxdata maintainers would prefer I can add it to my taxcalc_helpers package instead.

I'll be trying to run @evtedeschi3's process in Python in the next few days and report back how it goes.

ernietedeschi commented 5 years ago

Very kind of you @MaxGhenis. I've added the slides to that repository: psl_presentation_v2.pdf

Taking a step back, I think a useful first question would be "What is the goal of 'integration' here?" These scripts are relying on data outside of what Tax-Calculator currently makes available, namely the 2018 CPS ASEC.

So there are many different "levels" of changes that would automate what I did to different extents.

Off the top of my head, the simplest approach would be to modify the CLI so that a single run will produce a dump with tax changes (rather than having to run a base and then a reform sim). And then at the same time, streamline/automate the process for taking a more recent CPS ASEC than what's used in the cps.csv file and converting it into a data file readable by Tax-Calculator.

That would allow a user to more quickly create a simulation off of a recent ASEC that she could then manually re-merge back into the ASEC and tabulate in the manner I did.
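Until a single-run option exists, the tax changes can be derived by differencing two separate dumps. The sketch below assumes each dump has been loaded into a DataFrame with one row per tax unit, keyed by `RECID`, and containing `aftertax_income` and `combined`. The helper name `tax_changes` and the toy values are hypothetical.

```python
import pandas as pd


def tax_changes(base, reform, cols=("aftertax_income", "combined")):
    """Reform-minus-baseline change in each column, matched on RECID."""
    merged = base.merge(reform, on="RECID", suffixes=("_base", "_reform"),
                        validate="one_to_one")
    out = merged[["RECID"]].copy()
    for col in cols:
        out[col + "_change"] = merged[col + "_reform"] - merged[col + "_base"]
    return out


# Toy dumps (illustrative values, not taxcalc output).
base_dump = pd.DataFrame({
    "RECID": [1, 2, 3],
    "aftertax_income": [20_000.0, 50_000.0, 90_000.0],
    "combined": [1_000.0, 6_000.0, 20_000.0],
})
reform_dump = pd.DataFrame({
    "RECID": [1, 2, 3],
    "aftertax_income": [20_500.0, 50_000.0, 89_000.0],
    "combined": [500.0, 6_000.0, 21_000.0],
})
changes = tax_changes(base_dump, reform_dump)
```

Matching on `RECID` rather than on row position guards against the two dumps being sorted or filtered differently, and `validate="one_to_one"` raises an error if either dump contains duplicate record ids.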

The more complicated approach would be to fully integrate poverty output into Tax-Calculator. There might be a way to do this that just involves including the SPM unit, SPM weight, SPM threshold, and SPM resource variables into the cps.csv and then automating how they're tabulated after a reform. But it requires some thought because the SPM poverty rate is measured as a percent of all people, not a percent of families or tax units. Also, if I recall correctly, the current cps.csv only draws on data from the 2013-15 ASECs; we have three newer years that researchers will likely want to be able to access for poverty estimates. And as I mentioned in the presentation, the assumptions become even more complex once we start talking about projecting multi-year SPM poverty estimates versus single year historical counterfactual poverty estimates.

MaxGhenis commented 5 years ago

streamline/automate the process for taking a more recent CPS ASEC than what's used in the cps.csv file and converting it into a data file readable by Tax-Calculator.

I think this is the key part. Modifying the CLI to produce tax changes sounds worthwhile regardless of whether one is creating poverty statistics or other tax analysis.

Something like this is what I'd like to be able to do from the Python API (could translate to CLI):

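import pandas as pd
import taxcalc as tc

# NOTE: create_asec_tax_units, compare, agg_spm, and spm_rate are proposed
# functions that do not exist in taxcalc; this shows the desired interface.
asec = pd.read_csv('asec.csv')
recs = tc.create_asec_tax_units(asec)
base = tc.Calculator(policy=tc.Policy(), records=recs)
# Build `reform` the same way with a reformed Policy, then call
# advance_to_year(), calc_all(), etc. on both calculators.
# Get change in disposable income per tax unit
comp = tc.compare(base, reform)
# Aggregate change in disposable income to the SPM unit using the original ASEC
# Also adds a column for `new_spm_resources`
comp_spm = tc.agg_spm(comp, asec)
tc.spm_rate(comp_spm)  # Calculate SPM rate for baseline and reform.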