How should distribution/difference tables handle those with negative income?

martinholmer commented 6 years ago

Originally in taxdata issue 143 @MaxGhenis said this:

@martinholmer thanks for the quintile graph function! I updated the notebook to use taxcalc 0.16.0 with benefits (which I'm extremely excited about!!) and look into the bottom quintile. Here's how the results shifted (I didn't calculate quintiles in the original analysis):

| Metric | Without benefits (tc 0.15) | With benefits (tc 0.16) |
| --- | --- | --- |
| Share of bottom decile with negative after-tax income | 1.4% | 1.1% |
| Share of bottom decile with zero after-tax income | 37.8% | 0.75% |
| After-tax income of tax units with negative after-tax income | -$23.8B | -$21.8B |
| After-tax income of bottom decile with positive after-tax income | $21.0B | $78.4B |
| After-tax income of full bottom decile | -$2.8B | $56.6B |
| Negative tax units' reduction to bottom decile's after-tax income, relative to omitting | -113% | -28% |
| After-tax income of bottom quintile with positive after-tax income | - | $334B |
| After-tax income of full bottom quintile | - | $312B |
| Change in bottom quintile's after-tax income from negative tax units, relative to omitting | - | -6.5% |

Given these negatives still affect the bottom decile by 28%, it seems like this may at least warrant a caveat in TaxBrain or something. Is there a source or justification for the decision of other tax analysis groups to include them? I couldn't find anything from TPC, TF, or CBO with a quick search.
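For readers who want to check how the decile rows in the table relate to each other, here is a minimal sketch of the arithmetic. It uses only the rounded figures from the table; the function name `decile_summary` is made up for illustration and is not part of taxcalc.

```python
# Figures (in $ billions) copied from the bottom-decile rows of the table above.
table = {
    "tc 0.15": {"negative": -23.8, "positive": 21.0},
    "tc 0.16": {"negative": -21.8, "positive": 78.4},
}


def decile_summary(negative, positive):
    """Return (full-decile total, effect of negatives relative to omitting them)."""
    total = negative + positive
    relative_effect = negative / positive
    return total, relative_effect


for label, row in table.items():
    total, effect = decile_summary(row["negative"], row["positive"])
    print(f"{label}: full decile = {total:.1f}B, relative effect = {effect:.0%}")
```

The "relative to omitting" rows are simply the negative units' aggregate divided by the aggregate of the positive units in the same decile.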

Certain use cases will also justify omitting or zeroing out negatives, for example calculating the Gini coefficient. Users could spin up their own code, but it might be worth some additional taxcalc code to standardize this at some point.

martinholmer commented 6 years ago

Originally in taxdata issue 143 @codykallen said this:

@MaxGhenis mentioned:

I couldn't find anything from TPC, TF, or CBO with a quick search.

Tax Policy Center omits those with negative income from distributional analyses but includes them in totals. From the footnotes to their distributional tables:

Tax units with negative adjusted gross income are excluded from their respective income class but are included in the totals.

JCT excludes such taxpayers as well. From the footnotes to their distributional tables:

Individuals who are dependents of other taxpayers and taxpayers with negative income are excluded from the analysis.

Tax Foundation's approach is not so clear in their publications, but a footnote from a 2009 TF working paper says,

Negative income famillies excluded from bottom quintile but included in totals.

CBO does this too. According to their report, "The Distribution of Household Income and Federal Taxes, 2013" (published August 2016):

If a household has negative income (that is, if its business or investment losses are larger than its other income), it is excluded from the lowest income group but included in totals.

martinholmer commented 6 years ago

@codykallen, Thank you for correcting my misunderstanding of how other tax analysis groups handle filing units with negative income in their distributional tables. But the quotes you provide from those groups' publications raise, in my mind, more questions than they answer.

What exactly does it mean to "drop" filing units with negative income from the distributional table? Where in the process of constructing the table are negative-income units dropped?

It makes no sense (to me) to drop them before constructing the quintiles or deciles. If you do that you are arbitrarily shifting every filing unit's location in the quintile/decile distribution. The units with negative income are a fact of life and should be placed in the lowest income groups it seems to me. Do the other groups actually drop those with negative income before constructing the quintiles/deciles? If so, what's the rationale for doing that?

Or perhaps the quotes mean that negative-income units are dropped in the calculation of the quintile/decile statistic (for example, the percentage change in after-tax expanded income statistic that started this whole discussion in @MaxGhenis' issue #1806). Is that what the other tax analysis groups do? If that is what they do, what rationale do they provide for doing that?

And the biggest question in my mind is about the practice of somehow "dropping" negative-income filing units from the quintile/decile statistics yet including them in the whole-sample statistics. If including negative-income units in quintile/decile statistics is somehow undesirable, why are those with negative incomes included in the whole-sample statistics? This strikes me as inconsistent logic.

codykallen commented 6 years ago

@martinholmer asked several questions:

What exactly does it mean to "drop" filing units with negative income from the distributional table? Where in the process of constructing the table are negative-income units dropped?

From what they've disclosed, this isn't entirely clear. It appears that they may include these people when determining the quintiles or deciles but exclude them when calculating changes in their tax liabilities or after-tax incomes.

Do the other groups actually drop those with negative income before constructing the quintiles/deciles? If so, what's the rationale for doing that?

It appears that they probably keep them when determining the cutoffs for each quintile or decile, but drop them when calculating totals or averages within each quintile or decile. However, if they do drop these individuals entirely before running the calculations, it is because these individuals' incomes are mismeasured. As an example (sorry for the politics), Donald Trump wrote off a $916 million loss in 1995, which he could carry forward for the next 18 years to offset any positive income (thus having zero or nearly zero expanded income over the next 18 years). If he was included in our sample, he would be counted as in the bottom decile. In other words, people who write off large losses to have zero or negative expanded income are not actually poor or low-income and do not belong in the bottom decile. This is also why they could be included in the aggregate totals but not the distributional analysis. They still count as tax filers, but they are not low-income and should not be included in the bottom decile.
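To make that reading concrete, here is a rough sketch of "keep them for the cutoffs, drop them from the within-bin sums, keep them in the totals". It is not how any of those groups (or Tax-Calculator) actually does it: the sample is hypothetical, sampling weights are ignored, and the helper names are invented.

```python
def assign_quantile(incomes, n_bins):
    """Rank every unit (negatives included) into n_bins equal-count bins; 0 is lowest."""
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    bins = [0] * len(incomes)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(incomes)
    return bins


def bin_totals(incomes, values, n_bins, drop_negative):
    """Sum values per bin, optionally skipping units with negative income."""
    bins = assign_quantile(incomes, n_bins)
    totals = [0.0] * n_bins
    for inc, val, b in zip(incomes, values, bins):
        if drop_negative and inc < 0:
            continue  # unit still shaped the cutoffs, but is left out of its bin
        totals[b] += val
    return totals


# Hypothetical unweighted sample: ten units, one of which has a large loss.
income = [-900, 5, 10, 20, 30, 40, 50, 60, 70, 80]
tax_change = [-100, 1, 1, 2, 2, 3, 3, 4, 4, 5]

with_neg = bin_totals(income, tax_change, 5, drop_negative=False)
without_neg = bin_totals(income, tax_change, 5, drop_negative=True)
grand_total = sum(tax_change)  # negatives still count toward the overall total
print(with_neg, without_neg, grand_total)
```

Note that under this reading the quintile rows no longer sum to the total row, which is the footnote the other groups attach to their tables.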

As you've noted, the statements I quoted are less than clear. But none of their models are open source, and few disclose any details beyond occasional footnotes.

martinholmer commented 6 years ago

@codykallen, Thanks for your thoughts on the vexing problem of handling filing units with negative income.

While I'm sympathetic to your characterization of this problem as one of mismeasured income, I don't think we will ever be able to derive from the data we are using in Tax-Calculator a credible present-value of lifetime (past and future) income statistic that we could use to assign filing units to lifetime income quintiles or deciles or dollar bins. That would be conceptually sensible, but I don't see it as a practical possibility.

And remember this mismeasurement problem is widespread. Consider the elderly couple whose only income is their social security benefits but have ten million dollars in an IRA invested in tax-free bonds. In our data, they are going to be placed in an income percentile that is way below their "true" lifetime income. Or consider someone who experienced a long spell of unemployment; that person's annual income in our data is also well below the person's "true" lifetime income. I don't see how we will ever be able to sort filing units by their "true" lifetime income.

So, then the question becomes what to do with the annual tax-related income data we do have.

One way to think about this problem is that we don't want those with negative income distorting subgroup statistics like the percentage change in after-tax expanded income. But if a tax analyst wanted to look at the distribution of the dollar change in after-tax expanded income across subgroups, there would be no problem in showing that statistic for each filing unit (because there's no dividing by a non-positive income to get the percentage change). This is why I'm reluctant to drop filing units with negative income. The negative incomes don't cause a problem when you don't use them as a divisor.
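A small sketch of the distinction, using a made-up bottom subgroup (the numbers and the `group_stats` helper are purely illustrative): the dollar change is well defined, while the percentage change is only meaningful when the baseline aggregate is positive.

```python
def group_stats(base, reform):
    """Return (dollar change, percentage change or None) for one subgroup."""
    base_total = sum(base)
    dollar = sum(reform) - base_total
    # Only divide when the baseline aggregate is positive.
    pct = 100.0 * dollar / base_total if base_total > 0 else None
    return dollar, pct


# Hypothetical subgroup: every unit gains $500 of after-tax income.
base = [-5000, 1000, 2000]
reform = [-4500, 1500, 2500]

dollar, pct = group_stats(base, reform)
naive_pct = 100.0 * dollar / sum(base)  # dividing by a negative flips the sign
print(dollar, pct, naive_pct)
```

Here the subgroup gains $1,500 in total, but the naive percentage change comes out negative because the divisor is negative.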

So, what do you think about the following approach to calculating the percentage change in after-tax expanded-income statistic by income subgroups? Instead of deciles or quintiles, we could compute this statistic for each baseline expanded-income percentile (that is, 100 equal-sized subgroups), but show in the table or graph only the percentiles that contain no filing units with negative expanded income in the baseline.

This is exactly what TaxBrain does with the dollar income bins. Tax-Calculator computes statistics for all the bins (including the lowest bin containing those with negative expanded income), but TaxBrain does not show that bottom bin. (We are in the process of fixing the labeling of these TaxBrain tables as discussed in #1889.)

Also, this is the approach taken by the average tax rate graph generated by the tc tool or by any Python script that calls the Calculator.atr_graph method. The average tax rate statistic can be computed for the percentiles with negative expanded income but that statistic is misleading for those subgroups. So, the graph simply does not plot the statistic for those few percentiles. You can see this by comparing the marginal tax rate (MTR) graph with the average tax rate (ATR) graph. Here are those two graphs from the user documentation.

First the MTR graph which plots all of the percentiles:

[image: mtr graph]

And now the ATR graph which does not plot a few of the low percentiles:

[image: atr graph]

@MaxGhenis

martinholmer commented 6 years ago

@MaxGhenis, we are interested in your thoughts on the approach described in this comment. Does this approach, which has been implemented in pending pull request #1890, seem sensible to you? If not, how should #1890 be modified to make it better?

MaxGhenis commented 6 years ago

The lifetime income question is an interesting one, and I agree that percentile plots can address some of the problem and are more useful in some circumstances, negative income aside. For example, UBI reforms have a more significant relative effect for, say, the bottom 5%.

But to be clear, @martinholmer are you suggesting removing all equal-frequency binning aside from percentiles? For better or worse, deciles and quintiles are commonly reported from other tax analysis groups and in the media, so I don't think this is a tenable position. They give a single number or small set of numbers that can be easily absorbed. Given percentiles only, one could report the average change over percentiles in the bottom decile, but then we're basically back where we started.

The approach that makes most sense to me is including all tax units when defining quantiles, and then dropping negatives from calculations involving such quantiles.

@martinholmer minor question on pch_graph: Does this drop percentiles with all negatives, or with any negatives?

We may also want to think about zeros, so as to avoid infinite percentage changes. This isn't an issue now with only 0.75% having zero given benefits, but one could imagine comparisons of chained reforms that introduce this problem.

martinholmer commented 6 years ago

@MaxGhenis asked:

are you suggesting removing all equal-frequency binning aside from percentiles?

No, I don't think I ever suggested that. In fact, the quintile and decile graphs of percentage change in income are still part of the Tax-Calculator library and so are the decile distribution and difference tables. The documentation for those graphs and tables has been revised to say that they include filing units with negative and zero income.

martinholmer commented 6 years ago

@MaxGhenis said:

The approach that makes most sense to me is including all tax units when defining quantiles, and then dropping negatives from calculations involving such quintiles.

Fine. Because Tax-Calculator is an open-source project, you have the complete freedom to do that.

martinholmer commented 6 years ago

@MaxGhenis asked:

minor question on pch_graph: Does this drop percentiles with all negatives, or with any negatives?

A percentile with "any negatives [or zeros]" is not shown in the graph.
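To illustrate the difference between the "any" and "all" rules asked about, here is a minimal sketch. It is not the pch_graph code; `bins_to_show` and the sample data are hypothetical.

```python
def bins_to_show(incomes, bins, n_bins, rule="any"):
    """Flag which bins to plot.

    rule="any": hide a bin that contains any unit with income <= 0.
    rule="all": hide a bin only when every unit in it has income <= 0.
    """
    nonpositive = [0] * n_bins
    size = [0] * n_bins
    for inc, b in zip(incomes, bins):
        size[b] += 1
        if inc <= 0:
            nonpositive[b] += 1
    if rule == "any":
        return [nonpositive[b] == 0 for b in range(n_bins)]
    if rule == "all":
        return [nonpositive[b] < size[b] for b in range(n_bins)]
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical sample already sorted into four equal-sized bins.
incomes = [-50, -10, 0, 20, 30, 40, 50, 60]
bins = [0, 0, 1, 1, 2, 2, 3, 3]

show_any = bins_to_show(incomes, bins, 4, rule="any")
show_all = bins_to_show(incomes, bins, 4, rule="all")
print(show_any, show_all)
```

With the "any" rule the second bin is hidden because of a single zero-income unit; with the "all" rule it would still be plotted.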

MaxGhenis commented 6 years ago

OK could you clarify the approach you're suggesting / asking for feedback on? If it's only the offering of the new pch_graph function, SGTM. I was responding to this paragraph, which seemed to suggest removing deciles and quintiles:

So, what do you think about the following approach to calculating the percentage change in after-tax expanded-income statistic by income subgroups? Instead of deciles or quintiles, we could compute this statistic for each baseline expanded-income percentile (that is, 100 equal-sized subgroups), but show in the table or graph only the percentiles that contain no filing units with negative expanded income in the baseline.

Re: pch_graph:

A percentile with "any negatives [or zeros]" is not shown in the graph.

The logical extension of this would be not showing the bottom decile and quintile, since they include any negatives. Would it be confusing to have different logic for reporting percentiles vs. quintiles and deciles? If a goal is consistency across reporting of quantiles, I think there are four options:

1. Remove negatives within each quantile/bin, thereby nulling out bins with zero non-negative tax units. Nulls out only bottom 1% of percentile charts.
2. Remove bins where the aggregate is negative. Probably nulls out bottom 2% of percentile charts.
3. Don't remove any tax units and just warn users (approach taken for quintile and decile charts, would produce misleading percentile charts).
4. Remove bins with any negatives. Current approach for pch_graph but would also involve removing bottom decile and quintile.
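As a rough way to compare the four options, the sketch below computes the percentage change in one bin's aggregate income under each rule. The function `bin_pct_change` and the sample bin are hypothetical and ignore sampling weights; this is only meant to show how the options diverge on the same data.

```python
def bin_pct_change(base, reform, option):
    """Percent change in a bin's aggregate income under options 1-4 (None = not shown)."""
    pairs = list(zip(base, reform))
    if option == 1:
        # Drop negative units; hide the bin if no units remain.
        pairs = [(b, r) for b, r in pairs if b >= 0]
        if not pairs:
            return None
    elif option == 2:
        # Hide the bin when its aggregate baseline income is negative.
        if sum(b for b, _ in pairs) < 0:
            return None
    elif option == 4:
        # Hide the bin when any unit in it is negative.
        if any(b < 0 for b, _ in pairs):
            return None
    elif option != 3:  # option 3 keeps every unit and every bin
        raise ValueError(f"unknown option: {option}")
    base_total = sum(b for b, _ in pairs)
    reform_total = sum(r for _, r in pairs)
    if base_total == 0:
        return None  # avoid dividing by zero
    return 100.0 * (reform_total - base_total) / base_total


# Hypothetical bin in which every unit gains 10.
base = [-300, 100, 100]
reform = [-290, 110, 110]
results = {opt: bin_pct_change(base, reform, opt) for opt in (1, 2, 3, 4)}
print(results)
```

In this example only option 1 reports the intuitive result for a bin in which every unit gains; option 3 reports a negative change, and options 2 and 4 hide the bin.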

feenberg commented 6 years ago

On Wed, 21 Feb 2018, Martin Holmer wrote:

@MaxGhenis said:
  The approach that makes most sense to me is including all tax
  units when defining quantiles, and then dropping negatives from
  calculations involving such quintiles.
Fine. Because Tax-Calculator is an open-source project, you have the complete freedom to do that.

But this does seem like the sort of improvement that should be welcomed by the maintainers, otherwise it could stimulate a fork. Mixing negative income taxpayers (with loss carryforwards) with other low income taxpayers doesn't produce an informative distribution table.

dan feenberg

martinholmer commented 6 years ago

@MaxGhenis and @feenberg, Thanks for your comments in issue #1888.

But as the discussion leading up to @codykallen's comment suggests, there is no "consistency" or clarity in what other tax analysis groups do with filing units with negative income.

And, Dan, can you explain where we should place "taxpayers with loss carryforwards" in the income distribution? And are you sure that "taxpayers with loss carryforwards" are the only filing units with negative income?

martinholmer commented 6 years ago

@feenberg said in issue #1888:

Mixing negative income taxpayers (with loss carryforwards) with other low income taxpayers doesn't produce an informative distribution table.

Dan, you're focusing on only one group of filing units with negative income. What about a single individual who is running a struggling small business and reports a negative Schedule C net income? Isn't it plausible to think this person is part of the group you call "other low income taxpayers"? Why should we ignore this individual?

But more broadly, I don't understand why we are in this situation many years into the project. If you feel so strongly about this issue, why didn't you raise this issue at the very beginning of the project when the distribution and difference tables were first introduced into the Tax-Calculator library?

MattHJensen commented 6 years ago

Very good discussion of the substance in this issue. It is welcome to be able to pay more attention to implementation details like these as the project matures and contributors and users have the leisure to zoom in.

feenberg commented 6 years ago

On Wed, 21 Feb 2018, Martin Holmer wrote:

@MaxGhenis and @feenberg, Thanks for your comments in issue

But as the discussion leading up to @codykallen's comment suggests, there is no "consistency" or clarity in what other tax analysis groups do with filing units with negative income.

And, Dan, can you explain where we should place "taxpayers with loss carryforwards" in the income distribution? And are you sure that "taxpayers with loss carryforwards" are the only filing units with negative income?

I would exclude taxpayers with loss carryforwards or exclude losses from income. One or the other is highly desirable. Do you want to put Trump's 800 million dollar loss in the table? Doesn't that distort the resources available to the bottom quintile?

I would be interested in the source of losses in our data. Do we cap capital losses at $3,000?

dan


codykallen commented 6 years ago

@feenberg asked:

I would be interested in the source of losses in our data. Do we cap capital losses at $3,000?

Pre-TCJA, the only losses capped were investment losses, if the sum of short-term capital gains (p22250) and long-term capital gains (p23250) was less than -3000. We did not cap business losses (through e26270 and e00900). The TCJA capped total business losses at $250,000 (or $500,000 for married joint filers). However, we still count those business losses in excess of the cap when determining expanded income.
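The caps described above can be sketched as follows. This is only an illustration of the limits as stated in this comment, not the Tax-Calculator implementation; the function names are invented and filing-status variations other than married-joint are ignored.

```python
CAPITAL_LOSS_FLOOR = -3000.0  # limit quoted in the comment above


def capped_capital_gain(p22250, p23250):
    """Net short-term plus long-term capital gain, floored at the loss limit."""
    return max(p22250 + p23250, CAPITAL_LOSS_FLOOR)


def tcja_allowed_business_loss(loss, married_joint):
    """Portion of a business loss (given as a positive amount) within the TCJA cap."""
    cap = 500_000.0 if married_joint else 250_000.0
    return min(loss, cap)


print(capped_capital_gain(-10_000.0, 2_000.0))
print(tcja_allowed_business_loss(900_000.0, married_joint=False))
```

As noted in the comment, the business losses in excess of the cap are still counted when determining expanded income, so the second function would affect taxable amounts but not the income measure used for sorting.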

martinholmer commented 6 years ago

@MaxGhenis said:

The logical extension of this would be not showing the bottom decile and quintile, since they include any negatives. Would it be confusing to have different logic for reporting percentiles vs. quintiles and deciles? If a goal is consistency across reporting of quantiles, I think there are four options:

1. Remove negatives within each quantile/bin, thereby nulling out bins with zero non-negative tax units. Nulls out only bottom 1% of percentile charts.

2. Remove bins where the aggregate is negative. Probably nulls out bottom 2% of percentile charts.

3. Don't remove any tax units and just warn users (approach taken for quintile and decile charts, would produce misleading percentile charts).

4. Remove bins with any negatives. Current approach for pch_graph but would also involve removing bottom decile and quintile.

Pull request #1890 now implements your option 4. The new default behavior of the decile_graph and quintile_graph methods is to not show the bottom decile/quintile result. A curious user of these methods has the option of generating a graph that does show results for the bottom decile or quintile. Here is an example of what the default decile_graph method produces now:

[image: screen shot 2018-02-22 at 11 43 18 am]

MaxGhenis commented 6 years ago

The default behavior of the decile_graph and quintile_graph methods is to not show the bottom decile/quintile result.

Will this also be the default view in TaxBrain? This seems severe to me.

To capture the several strands of this discussion, I put the latest info in this document, to which all have edit access. This includes info on what other analysis groups do, what leads to negatives, their impact on CPS data, and options for Tax-Calculator.

I also added two options in addition to my initial four:

5. Remove negatives from entire analysis, including totals. This is what JCT does.
6. Remove negatives from quantile calculation and binning, but include in totals. This is an alternate interpretation of what TPC, TF, and CBO do, though after re-reading their footnotes I think they almost certainly do option 1 (remove negatives within each quantile/bin).

Leading to this overall table:

[image: screenshot 2018-02-22 at 11 42 51]

I also realized that the table with shares of tax units above is mistakenly labeled as after-tax income, instead of expanded income. Since after-tax income is also relevant, I added that info to the table (not much different). This also corrects a number with zero income, and makes the top two rows shares of total instead of bottom decile, since we're now concerned with various quantiles.

[image: screenshot 2018-02-22 at 11 44 10]

Feel free to edit, comment, or suggest in the doc. I'll also keep it updated for reference to reflect discussion in this issue.

martinholmer commented 6 years ago

@MaxGhenis asked in issue #1888:

The default behavior of the decile_graph and quintile_graph methods is to not show the bottom decile/quintile result.

Will this also be the default view in TaxBrain? This seems severe to me.

I have no idea; as of now TaxBrain shows no graphs generated by Tax-Calculator.

If it seems "severe", why did you list it as a sensible option about how to handle this issue?

MaxGhenis commented 6 years ago

If it seems "severe", why did you list it as a sensible option about how to handle this issue?

I listed all potential options to be comprehensive, and included this because you had already enacted it for percentiles. I've made clear my preference for aligning with TPC, CBO, and TF on option 1, and thought it self-evident that discarding 20% of tax units, because 0.1% are problematic, would be basically a non-starter.

At this point the maintainers need to make a decision, a process I'm not familiar with when the solution is not obvious. FWIW I've found collaborative documents and meetings more productive than long back-and-forth GitHub/request threads when dealing with complex design challenges that require consensus. I'd be curious what this process typically looks like for Tax-Calculator, though of course this is outside my realm and I don't want to step on any toes.

MaxGhenis commented 6 years ago

I have no idea; as of now TaxBrain shows no graphs generated by Tax-Calculator.

Is there a goal of consistency between TaxBrain views and default taxcalc views? Is TaxBrain behavior in scope of this issue?

martinholmer commented 6 years ago

@MaxGhenis asked:

Is TaxBrain behavior in scope of this issue?

Probably not because the topic is highly conjectural, seeing that right now TaxBrain does not display any graphs generated by Tax-Calculator.

MaxGhenis commented 6 years ago

TaxBrain does not display any graphs generated by Tax-Calculator.

What about TaxBrain's difference tables? Shouldn't the decision for the graphs also apply to the tables?

martinholmer commented 6 years ago

@MaxGhenis said in issue #1888:

I've made clear my preference for aligning with TPC, CBO, and TF on option 1

and then Max described option 1 this way:

Option 1: Remove negatives within each quantile/bin, and removing quantiles/bins with only negatives.

Why don't you prepare a pull request that does this so that we can better assess the pros and cons of implementing this approach? With this issue everything is in the implementation details.

As I've said before, the TPC and CBO approach of dropping negative income units from the bins/quantiles but including them in the totals is logically inconsistent and is almost certainly going to lead to confusion among users for that reason.

And another concern I have about this approach is that it introduces distortions (that most users would consider bugs) in other statistics in the distribution/difference tables (other than the percentage change in after-tax expanded income). So, for example, consider a UBI reform that gives every person in a filing unit $10,000 per annum tax-free. If we implement your option 1, users are going to start complaining, with good reason, about the tables not making sense. Users will say: "When I sum the product of XTOT and $10,000 for each filing unit in each quantile, I don't always get the dollar change in after-tax expanded income that is in the Tax-Calculator table." So, what are we supposed to say to these users?
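The UBI concern can be shown with a tiny hypothetical quantile. The data and variable names below are made up, and the check assumes option 1 drops units with negative baseline expanded income from the quantile's sums.

```python
UBI = 10_000  # tax-free dollars per person per year, as in the example above

# (baseline expanded income, XTOT) for one hypothetical bottom quantile.
units = [(-50_000, 2), (4_000, 1), (9_000, 3)]

# What a user would compute by hand: XTOT times the UBI amount for every unit.
expected_change = sum(xtot * UBI for _, xtot in units)

# What an option-1 table would show: negative-income units are left out.
option1_change = sum(xtot * UBI for inc, xtot in units if inc >= 0)

print(expected_change, option1_change)
```

The two numbers differ by the UBI paid to the dropped unit, which is the discrepancy a user would report.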

martinholmer commented 6 years ago

@MaxGhenis asked:

What about TaxBrain's difference [and distribution] tables? Shouldn't the [how-to-handle-negative-incomes] decision for the graphs also apply to the tables?

Yes, that would seem reasonable.

codykallen commented 6 years ago

@martinholmer said:

the TPC and CBO approach of dropping negative income units from the bins/quantiles but including them in the totals is logically inconsistent.

No, it isn't. Those who write off large business losses and end up with negative income are not actually low-income. Although they file taxes (and thus belong in the totals), one cannot reasonably ascertain where they should realistically fall within the income distribution. Very few of them, if any, belong in the lowest quintile (or decile, or percentile), but they cannot accurately be placed into any other income bin.

martinholmer commented 6 years ago

@codykallen said:

the TPC and CBO approach of dropping negative income units from the bins/quantiles but including them in the totals is logically inconsistent.

No, it isn't. Those who write off large business losses and end up with negative income are not actually low-income. Although they file taxes (and thus belong in the totals), one cannot reasonably ascertain where they should realistically fall within the income distribution. Very few of them, if any, belong in the lowest quintile (or decile, or percentile), but they cannot accurately be placed into any other income bin.

I can see your line of argument and while everything you say makes sense, it is still true that the parts do not add up to the total. At some level that is a logical inconsistency.

codykallen commented 6 years ago

@martinholmer said:

it is still true that the parts do not add up to the total

What if we add a bin for "unallocated tax units" or "undistributed tax units"? And add an asterisk or footnote saying "These tax units have negative income due to large business losses. They are included in the totals but excluded from the distributional analysis."?

martinholmer commented 6 years ago

@codykallen said in issue #1888:

it is still true that the parts do not add up to the total

What if we add a bin for "unallocated tax units" or "undistributed tax units"? And add an asterisk or footnote saying "These tax units have negative income due to large business losses. They are included in the totals but excluded from the distributional analysis."?

This is a constructive suggestion, thanks! I've got a few questions.

You have good ideas for the row label, but what, if any, statistics do we show on that row?
What do you mean by "large business losses"? How would we operationalize that concept? In particular, what's your answer to the following question I posed to Dan earlier in this discussion?

Dan, you're focusing on only one group of filing units with negative income [those with large loss carryforwards]. What about a single individual who is running a struggling small business and reports a negative Schedule C net income? Isn't it plausible to think this person is part of the group you call "other low income taxpayers"? Why should we ignore this individual [by excluding that individual from the lowest quantile]?

MaxGhenis commented 6 years ago

I've thought of option 1 as essentially having this extra row as @codykallen describes, and then hiding it because it conveys little useful information. Do 0.1% of tax units deserve the same real estate as an entire quintile in these charts?

codykallen commented 6 years ago

@martinholmer asked:

You have good ideas for the row label, but what, if any, statistics do we show on that row?

I think we should show totals and differences, but leave blank the entries for anything percentage-related (such as average tax rates). Or put - or * instead of a blank.
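A rough sketch of what such a row could look like when the table is built. The `table_row` helper and the numbers are hypothetical; the only point is that dollar columns are filled in while the percentage column gets a placeholder.

```python
def table_row(label, base_total, reform_total, show_pct=True):
    """Build one table row; percentage cell is '-' when suppressed or undefined."""
    diff = reform_total - base_total
    if show_pct and base_total > 0:
        pct = f"{100.0 * diff / base_total:.1f}"
    else:
        pct = "-"
    return [label, f"{base_total:.1f}", f"{diff:.1f}", pct]


rows = [
    table_row("Unallocated", -20, -18, show_pct=False),
    table_row("Bottom decile", 80, 84),
]
for row in rows:
    print("  ".join(row))
```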

What do you mean by "large business losses"? How would we operationalize that concept?

"These tax units wrote off losses that exceed their other sources of income, resulting in negative income. As a result, these filers cannot be accurately placed within the income distribution. They are included in the totals but excluded from the income bins."

MaxGhenis commented 6 years ago

What do you mean by "large business losses"? How would we operationalize that concept?

@martinholmer haven't you already operationalized it in pch_graph, as either tax units with negative baseline expanded income or negative baseline after-tax income? They should be similar, but between the two, negative after-tax income seems more appropriate given it's the denominator for %change graphs there and thus causes the sign inversion problem. Another option would be tax units with either negative expanded income or negative after-tax income.

martinholmer commented 6 years ago

@codykallen said in issue #1888:

You have good ideas for the row label, but what, if any, statistics do we show on that row?

I think we should show totals and differences, but leave blank the entries for anything percentage-related (such as average tax rates). Or put - or * instead of a blank.

That seems like a reasonable approach on first thought.

And @codykallen also said:

What do you mean by "large business losses"? How would we operationalize that concept?

"These tax units wrote off losses that exceed their other sources of income, resulting in negative income. As a result, these filers cannot be accurately placed within the income distribution. They are included in the totals but excluded from the income bins."

OK, but how do we operationalize "wrote off losses that exceed their other sources of income"? For example, on what row would the filing unit I asked Dan about (which has no income other than a negative Schedule C income) be classified?

codykallen commented 6 years ago

For example, on what row would the filing unit I asked Dan about (which has no other income other than a negative Schedule C income) be classified?

I would suggest that this filing unit be included in the "unallocated" bin. Although I wonder how many filers this applies to.

martinholmer commented 6 years ago

@codykallen, Here's my (perhaps incorrect) understanding of the proposal you're making:

Everybody with negative expanded income will be classified in an unclassified
group (for which we show all statistics except for the percentage-change-in-
aftertax-income statistic) and the remaining filing units will be placed in the 
bins or quantiles.  So, if 2 percent of the filing units have negative income, 
then in the tables that use deciles the remaining 98 percent of the filing units
would be sorted into ten equal-sized groups (that is, with 9.8 percent of the
total sample in each decile).

Is that a correct understanding of your proposal?

codykallen commented 6 years ago

@martinholmer, that is a correct interpretation of my proposal.

MaxGhenis commented 6 years ago

So, if 2 percent of the filing units have negative income, then in the tables that use deciles the remaining 98 percent of the filing units would be sorted into ten equal-sized groups (that is, with 9.8 percent of the total sample in each decile).

This maps to option 6 in my list, not option 1, which is the option I believe CBO and TPC have taken. In CBO/TPC's option 1, deciles are true deciles and negatives are removed from the bottom decile. Seems fine (probably doesn't change results much) but just confirming you want to deviate?

Also curious what you think about using baseline after-tax income instead of baseline expanded income? Using expanded income still introduces a (small) potential for sign inversion when calculating change to after-tax income.

codykallen commented 6 years ago

@MaxGhenis said:

This maps to option 6 in my list, not option 1, which would make deciles true deciles and remove the negatives from the bottom decile. I don't believe it's what CBO and TPC do. Seems fine but just confirming you want to deviate?

Suppose instead that we kept the negative-income tax units when determining the cutoffs for each bin, but dropped them from the bins themselves. Then the lowest bin would be smaller than the others: in the 2 percent example, a distribution by decile would have one bin with 8% of filing units and nine bins with 10% of units each.
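
The two binning approaches discussed above can be compared with a minimal sketch. It assumes equally weighted units and made-up incomes (real tables would use sampling weights), and the helper name `decile_by_rank` is hypothetical rather than anything in Tax-Calculator.

```python
from collections import Counter

def decile_by_rank(incomes):
    """Assign each unit a decile (0 = lowest) by income rank, equal weights."""
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    n = len(incomes)
    return {unit: rank * 10 // n for rank, unit in enumerate(order)}

# Hypothetical data: 1,000 equally weighted units, 2 percent with negative income.
incomes = [-50, -10] * 10 + list(range(1, 981))

# Approach 1: cutoffs come from all units, then negatives are dropped from the bins.
all_deciles = decile_by_rank(incomes)
sizes_keep_cutoffs = Counter(
    d for unit, d in all_deciles.items() if incomes[unit] >= 0
)

# Approach 2: negatives are dropped first, then ten equal-sized groups are formed.
nonnegative = [x for x in incomes if x >= 0]
sizes_drop_first = Counter(decile_by_rank(nonnegative).values())

print([sizes_keep_cutoffs[d] for d in range(10)])
print([sizes_drop_first[d] for d in range(10)])
```

With this data the first approach leaves 80 units (8%) in the bottom bin and 100 (10%) in each of the others, while the second puts 98 units (9.8%) in every bin.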

@MaxGhenis also asked:

Also curious what you think about using baseline after-tax income instead of baseline expanded income? Using expanded income still introduces a (small) potential for sign inversion when calculating change to after-tax income.

Yes, I prefer using baseline expanded income as the income measure for sorting into bins. Although if any column involves dividing by income, the cell for the "unallocated" group should be left empty.

MaxGhenis commented 6 years ago

Yes, this is a trade-off between a slightly smaller bottom bin and quantiles that aren't true quantiles. How important is it that when we describe the upper decile it's actually the upper decile? Neither seems inherently better or worse to me, but it does seem noteworthy that three of the four groups you investigated appear to use the same methodology.

Yes, I prefer using baseline expanded income as the income measure for sorting into bins.

FYI @martinholmer @feenberg and I are discussing this over at #1893.

Although if any column involves dividing by income, the cell for the "unallocated" group should be left empty.

Right, my concern is that in using expanded income, %chg to after-tax income still divides by something that could be negative, in non-unallocated bins. Consider a tax unit with $1k expanded income and -$1k after-tax income (warrants investigation to determine prevalence), and $0 after-tax income under the reform. This tax unit would not be unallocated, since they have positive expanded income, and their after-tax income increases by $1k. But their % change to after-tax income is $1k/-$1k = -100%. This is the type of problem that initially motivated this issue.
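
The sign inversion in this example can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: the function names are hypothetical, and the guarded variant shows one possible way to leave such cells empty, not how Tax-Calculator handles it.

```python
def pct_change(baseline, reform):
    """Percentage change relative to baseline; None when baseline is zero."""
    if baseline == 0:
        return None
    return 100.0 * (reform - baseline) / baseline

def guarded_pct_change(baseline, reform):
    """Like pct_change, but also returns None for a negative baseline."""
    if baseline <= 0:
        return None
    return 100.0 * (reform - baseline) / baseline

baseline_aftertax = -1000.0   # after-tax income under current law
reform_aftertax = 0.0         # after-tax income under the reform

change = reform_aftertax - baseline_aftertax          # dollar change is positive
pct = pct_change(baseline_aftertax, reform_aftertax)  # but the percentage is negative
print(change, pct, guarded_pct_change(baseline_aftertax, reform_aftertax))
```

The dollar change is +$1,000 while the naive percentage change is -100%, which is the inversion described above.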

codykallen commented 6 years ago

@MaxGhenis said:

Consider a tax unit with $1k expanded income and -$1k after-tax income (warrants investigation to determine prevalence), and $0 after-tax income under the reform. This tax unit would not be unallocated, since they have positive expanded income, and their after-tax income increases by $1k. But their % change to after-tax income is $1k/-$1k = -100%. This is the type of problem that initially motivated this issue.

This is indeed an issue. Personally, when I do a distributional analysis, I drop any tax units with negative expanded income and with tax liability in excess of expanded income (i.e. negative after-tax income). Since this is a step further than anything others have done, I did not specifically recommend it, but you could consider it when dealing with your special case concerns.

MaxGhenis commented 6 years ago

when I do a distributional analysis, I drop any tax units with negative expanded income and with tax liability in excess of expanded income (i.e. negative after-tax income).

I imagine you mean "or" where you say "and"? Can we formalize this as dropping one of three sets of tax units, i.e. from this comment?

a. Tax units with negative baseline expanded income.
b. Tax units with negative baseline after-tax income.
c. Union of (a) and (b).

Are there other possibilities? Maybe add AGI too to be comprehensive? (c) seems safest to me, and although it differs from the standard expanded-income-only binning, this group is already weird so that doesn't seem like a big drawback to me.
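
The three candidate sets can be written out as simple filters. The records and field names below (`expanded_income`, `aftertax_income`) are illustrative assumptions, not actual Tax-Calculator output.

```python
# Hypothetical tax units with baseline income measures.
units = [
    {"id": 1, "expanded_income": -200_000, "aftertax_income": -200_000},
    {"id": 2, "expanded_income": 1_000, "aftertax_income": -1_000},
    {"id": 3, "expanded_income": 0, "aftertax_income": 0},
    {"id": 4, "expanded_income": 50_000, "aftertax_income": 42_000},
    {"id": 5, "expanded_income": -500, "aftertax_income": 300},
]

# (a) negative baseline expanded income
set_a = {u["id"] for u in units if u["expanded_income"] < 0}
# (b) negative baseline after-tax income
set_b = {u["id"] for u in units if u["aftertax_income"] < 0}
# (c) union of (a) and (b)
set_c = set_a | set_b

print(sorted(set_a), sorted(set_b), sorted(set_c))
```

Unit 2 is the case that only (b) and (c) catch: positive expanded income but negative after-tax income. Units with zero income (unit 3) are kept under all three options.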

codykallen commented 6 years ago

@MaxGhenis, you're correct; I meant to say "or" (excluding the union of (a) and (b)). And this group is definitely weird.

Personally, I wouldn't recommend including AGI, since that is complicated by above-the-line deductions and the TCJA's loss limitation. For example, suppose a single filer has $300,000 of wages and $500,000 of business losses. This gives an expanded income of -$200,000, regardless of before or after the TCJA, so the filer would go in the "unallocated" bin. This filer's AGI under pre-TCJA law is -$200,000, but his AGI under TCJA law is $50,000. Expanded income is far more consistent and less susceptible to tax policy than AGI.
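
The arithmetic behind this example can be sketched as follows. It assumes a $250,000 cap on deductible business losses for a single filer under the TCJA, which is the figure implied by the numbers above, and it ignores every other adjustment to AGI and expanded income.

```python
wages = 300_000
business_loss = 500_000

# Assumed simplification: the TCJA loss limitation caps the deductible
# business loss of a single filer at $250,000.
LOSS_LIMIT_SINGLE = 250_000

# Expanded income counts the full loss, before and after the TCJA.
expanded_income = wages - business_loss

# Pre-TCJA AGI also reflects the full loss.
agi_pre_tcja = wages - business_loss

# Under the TCJA only the capped loss reduces AGI.
agi_tcja = wages - min(business_loss, LOSS_LIMIT_SINGLE)

print(expanded_income, agi_pre_tcja, agi_tcja)
```

Expanded income stays at -$200,000 under both laws, while AGI moves from -$200,000 to $50,000, so an AGI-based rule would classify the same filer differently depending on the policy being analyzed.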

MaxGhenis commented 6 years ago

Thanks for the context comparing AGI and expanded income, @codykallen. The important parts to address are metrics that can serve as denominators, so leaving AGI out SGTM.

I added a tabulation of expanded income sign by after-tax income sign to this notebook (2018 CPS), which shows that 0.008% of tax units have positive expanded income and negative after-tax income. Including them would then expand the excluded set from 0.110% to 0.118%, a relative increase of 7%. Although small, this is sizable enough relative to the scope of the problem to be worth doing IMHO. Sounds like it would save Cody some time in future analyses too.
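
A weighted sign-by-sign tabulation of this kind, and the relative-increase arithmetic, might look like the sketch below. The records are invented for illustration and the function names are hypothetical; only the 0.110% and 0.008% shares come from the comment above.

```python
from collections import defaultdict

def sign(x):
    """Label a value as negative, zero, or positive."""
    return "negative" if x < 0 else ("zero" if x == 0 else "positive")

def weighted_sign_shares(records):
    """Percent of total weight in each (expanded sign, after-tax sign) cell."""
    totals = defaultdict(float)
    for expanded, aftertax, weight in records:
        totals[(sign(expanded), sign(aftertax))] += weight
    grand_total = sum(totals.values())
    return {cell: 100.0 * w / grand_total for cell, w in totals.items()}

# Hypothetical (expanded income, after-tax income, weight) records.
records = [
    (-5_000, -5_000, 10.0),
    (1_000, -1_000, 5.0),
    (0, 0, 85.0),
    (40_000, 35_000, 900.0),
]
shares = weighted_sign_shares(records)

# Relative increase using the shares reported in the comment (in percent).
negative_expanded = 0.110
positive_expanded_negative_aftertax = 0.008
union = negative_expanded + positive_expanded_negative_aftertax
relative_increase = 100.0 * positive_expanded_negative_aftertax / negative_expanded

print(shares)
print(round(union, 3), round(relative_increase, 1))
```

The reported shares give a union of 0.118% and a relative increase of about 7.3%, consistent with the rounded 7% figure.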

I added this info to this section of the doc.

martinholmer commented 6 years ago

Merged pull request #1902 is an attempt to make progress on the issues discussed in #1888. Any remaining issues about the handling in tables and graphs of filing units with negative or zero expanded_income should be raised in a new issue.

MaxGhenis commented 6 years ago

For those not following #1902, it implements a flag to hide nonpositive incomes, including zeros.

I disagree with this design decision. I don't see evidence that those with zero incomes don't truly belong to the bottom decile (as we believe those with negative incomes are actually rich); the share of those with zero income is of a similar order of magnitude as other surveys; and this appears to deviate from other tax analysis groups. The advantage is not having to null out the bottom one or two percentiles that may have undefined % growth, but I don't see this necessitating discarding of these tax units, which are relevant to changes affecting the bottom decile.

That said, the issue has been discussed at length here and in #1902, so I won't be discussing it further. If the maintainers want to consider future changes I'll look out for them, but otherwise I'll be using my own functions to bucket tax units, discarding those with negative incomes and keeping those with zero incomes.

martinholmer commented 6 years ago

@MaxGhenis said in #1888:

At this point the maintainers need to make a decision, a process I'm not familiar with when the solution is not obvious. FWIW I've found collaborative documents and meetings more productive than long back-and-forth GitHub/request threads when dealing with complex design challenges that require consensus. I'd be curious what this process typically looks like for Tax-Calculator.

When there is no consensus, Tax-Calculator tries to be agnostic and provide users with the ability to make the decision.

In the case of handling those with negative expanded_income and those with zero expanded_income in the distribution and difference tables (and graphs), there is clearly no consensus. So, the goal of pull requests #1917 and #1918 is to enable Tax-Calculator to support different user decisions.

@MattHJensen @feenberg @codykallen

MaxGhenis commented 6 years ago

Thanks @martinholmer for supporting more flexibility!

How will the TaxBrain decile tables display zeros and negatives?

martinholmer commented 6 years ago

@MaxGhenis said:

Thanks @martinholmer for supporting more flexibility!

How will the TaxBrain decile tables display zeros and negatives?

I honestly don't know. The TaxBrain developers have the same choices as any other user of Tax-Calculator. If they ask me, I'll suggest they show all three bottom-decile rows (negatives, zeros, positives) in the decile tables.
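
One way to picture the three bottom-decile rows is the sketch below, which aggregates hypothetical bottom-decile units by the sign of expanded income. The function name, row labels, and data are assumptions for illustration and are not taken from Tax-Calculator or TaxBrain.

```python
def split_bottom_decile(units):
    """Aggregate bottom-decile units into negative, zero, and positive rows.

    units: iterable of (expanded_income, aftertax_income, weight) tuples.
    Returns {row: (weighted_count, weighted_aftertax_income)}.
    """
    rows = {"negative": [0.0, 0.0], "zero": [0.0, 0.0], "positive": [0.0, 0.0]}
    for expanded, aftertax, weight in units:
        if expanded < 0:
            key = "negative"
        elif expanded == 0:
            key = "zero"
        else:
            key = "positive"
        rows[key][0] += weight             # weighted number of units
        rows[key][1] += weight * aftertax  # weighted after-tax income
    return {key: tuple(values) for key, values in rows.items()}

# Hypothetical units that fall in the bottom decile.
bottom_decile = [
    (-20_000.0, -21_000.0, 2.0),
    (0.0, 0.0, 3.0),
    (4_000.0, 4_500.0, 5.0),
]
rows = split_bottom_decile(bottom_decile)
for name, (count, income) in rows.items():
    print(name, count, income)
```

Showing the rows separately lets a reader apply whichever convention they prefer (dropping negatives, dropping nonpositives, or keeping everything) by summing the rows they want.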

martinholmer commented 6 years ago

@MaxGhenis asked:

How will the TaxBrain decile tables display zeros and negatives?

You might want to follow PolicyBrain pull request 846.

hdoupe commented 6 years ago

@MaxGhenis asked:

How will the TaxBrain decile tables display zeros and negatives?

Yes, please follow OpenSourcePolicyCenter/PolicyBrain#846.

PSLmodels / Tax-Calculator

How should distribution/difference tables handle those with negative income? #1888