PSLmodels / Tax-Calculator

USA Federal Individual Income and Payroll Tax Microsimulation Model
https://taxcalc.pslmodels.org

Seeking advice: Identifying tax filer records in a given year #2501

Closed donboyd5 closed 3 years ago

donboyd5 commented 3 years ago

I want to advance puf.csv to a particular tax year (generally 2017), summarize key variables, and compare them to published IRS totals for corresponding variables.

Many IRS tables provide summary data for tax filers (e.g., this one) but puf.csv has nonfilers as well as filers, so I want to remove the nonfilers so that the comparison is appropriate. I am looking for the best way to do that.

One possibility is to include only records where data_source == 1. Per the docs, that variable is:

Description: 1 if unit is created primarily from IRS-SOI PUF data; 0 if created primarily from CPS data (not used in tax-calculation logic).

This may be a good first cut. However, data_source does not change from year to year, whereas people who file will face different thresholds depending on their incomes and the tax law.
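
As a rough illustration of this first cut, here is a minimal sketch on made-up data. The toy frame stands in for an advanced puf.csv extract, and s006 is assumed to be the record weight; none of the numbers are real.

```python
import pandas as pd

# Toy stand-in for an advanced puf.csv extract (all values are made up).
puf = pd.DataFrame({
    "RECID": [1, 2, 3, 4],
    "data_source": [1, 1, 0, 0],          # 1 = IRS-SOI PUF record, 0 = CPS record
    "s006": [100.0, 250.0, 300.0, 50.0],  # assumed to be the record weight
})

# First cut: treat every PUF-sourced record as a filer, in every year.
filers_first_cut = puf[puf["data_source"].eq(1)]
weighted_filers = filers_first_cut["s006"].sum()
print(weighted_filers)
```

Because data_source is fixed, this selection gives the same records no matter which year the file has been advanced to, which is the limitation noted above.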

Can anyone recommend a better way to identify filer and non-filer records?

Many thanks.

MattHJensen commented 3 years ago

This is H&R Block's description of filing requirements.

I agree with you that data_source==1 is a first-cut but that we can likely do better.

One possibility is to treat non-itemizers with strictly zero tax liability as nonfilers, i.e., records where c04470 and iitax are both zero.

I would be interested to hear what @andersonfrailey and @feenberg think about this.
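
A minimal sketch of that rule on made-up data follows. It assumes c04470 (itemized deductions, zero for non-itemizers) and iitax have already been calculated for the year of interest.

```python
import pandas as pd

# Toy records (made-up values) with calculated Tax-Calculator outputs.
puf = pd.DataFrame({
    "c04470": [0.0, 12000.0, 0.0, 0.0],  # itemized deductions
    "iitax": [0.0, 0.0, 850.0, -400.0],  # income tax liability
})

# Nonfiler under this rule: non-itemizer AND exactly zero income tax.
nonfiler = puf["c04470"].eq(0) & puf["iitax"].eq(0)
puf["filer"] = ~nonfiler
print(puf)
```

Note that a record with negative iitax (a net refundable credit) counts as a filer under this rule, since its liability is not strictly zero.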

andersonfrailey commented 3 years ago

I would recommend using something like the filing requirements Matt linked to and creating a flag for records that meet them. As you said, data_source isn't ever updated, so using it as a proxy for filing status could lead to some strange results. The best bet, in my opinion, is to advance the file to the year you're interested in and then create a function that identifies the units meeting the filing requirements.

donboyd5 commented 3 years ago

Thank you @MattHJensen and @andersonfrailey. I will start out with the "c04470 and iitax are both zero" approach and work toward the filing rules approach as time permits.

donboyd5 commented 3 years ago

@MattHJensen, @andersonfrailey

Seeking advice on how best to define gross income for purposes of a filing requirement test. Please see below.

First attempt: I tried the easy approach first, defining nonfilers as records where c04470 and iitax are both zero. Unfortunately, that gave me 12.4% fewer filers in 2017 than the IRS reports in Table 1.1. All Returns: Selected Income and Tax Items, by Size and Accumulated Size of Adjusted Gross Income (17in11si.xls).
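
The kind of comparison described here can be sketched as a weighted count set against a published total. Everything below is illustrative: the helper name `pct_diff_from_target` is hypothetical, s006 is assumed to be the record weight, and the target is a placeholder rather than the actual IRS figure.

```python
import pandas as pd

def pct_diff_from_target(flags, weights, target):
    """Percent by which the weighted count of flagged records differs from target."""
    weighted = weights[flags].sum()
    return 100.0 * (weighted - target) / target

# Toy data (made-up values); 'filer' is whatever filer flag is being tested.
puf = pd.DataFrame({
    "filer": [True, False, True, True],
    "s006": [40.0, 30.0, 25.0, 15.0],
})
irs_target = 100.0  # placeholder, NOT the published IRS total

diff = pct_diff_from_target(puf["filer"], puf["s006"], irs_target)
print(f"{diff:.1f}% relative to target")
```

A negative result means the filer definition yields fewer weighted filers than the target.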

Second attempt: Next, I defined 2017 filers using a 2-pronged set of rules:

1) Required filers: I based this on the 2017 marital-status/age/gross-income filer requirements found in Table 1-1 of this IRS document using a simple definition of gross income.

I defined gross income as agi (c00100) plus above-the-line-adjustments (c02900).

2) Non-required records that are likely to file anyway: Records that had any of: negative agi, nonzero iitax, or any nonzero credit.

To my surprise, and unfortunately, that resulted in 10.4% fewer 2017 filers than the IRS reports.
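
One way to express the required-filer prong is a lookup keyed on filing status and the number of people aged 65 or older. This is only a sketch: the dollar amounts are the 2017 thresholds used in the code in this thread, while the names `THRESHOLDS_2017` and `required_to_file` and the toy records are hypothetical.

```python
import pandas as pd

# (MARS, number of people 65 or older) -> gross income threshold for 2017.
THRESHOLDS_2017 = {
    (1, 0): 10400, (1, 1): 11950,                 # single
    (2, 0): 20800, (2, 1): 22050, (2, 2): 23300,  # married filing jointly
    (3, 0): 4050, (3, 1): 4050,                   # married filing separately
    (4, 0): 13400, (4, 1): 14950,                 # head of household
    (5, 0): 16750, (5, 1): 18000,                 # qualifying widow(er)
}

def required_to_file(df, gross_income, thresholds=THRESHOLDS_2017):
    """Boolean Series: gross income at or above the unit's threshold."""
    # The spouse's age only matters for joint returns.
    spouse_65 = df["MARS"].eq(2) & df["age_spouse"].ge(65)
    n65 = df["age_head"].ge(65).astype(int) + spouse_65.astype(int)
    keys = zip(df["MARS"].tolist(), n65.tolist())
    limit = pd.Series([thresholds[k] for k in keys], index=df.index)
    return gross_income.ge(limit)

# Toy records (made-up values).
toy = pd.DataFrame({
    "MARS": [1, 1, 2, 2, 2, 3, 4],
    "age_head": [30, 70, 40, 66, 70, 50, 66],
    "age_spouse": [0, 0, 38, 60, 68, 0, 0],
})
gross = pd.Series([10400, 11000, 20000, 22050, 23000, 4050, 14000])
required = required_to_file(toy, gross)
print(required.tolist())
```

Keying on the count of people 65 or older keeps the three married-joint cases mutually exclusive, so a couple who are both 65 or older is tested only against the 23,300 threshold.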

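Best attempt so far: The problem appears (to me) to be in the definition of gross income. This time I defined gross income as above-the-line income, plus any losses deducted in arriving at it, plus any income excluded in arriving at it (see the code below).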

This results in 3.7% fewer filers than the IRS reports, which seems within the realm of reason.

However, I do not understand taxdata and tax-calculator well enough to know whether I used the best definition of gross income possible. Obviously I would like to match the definition the IRS uses in the doc linked above. I am not sure if I defined losses to add back properly, or whether I captured all of them, and I am not sure whether I defined untaxed income to add back properly or whether I captured all of it.

I would much appreciate advice on how to improve any of this.

I copy below an excerpt from the IRS doc and then my full relevant code. Many thanks.

Gross income. This includes all income you receive in the form of money, goods, property, and services that isn't exempt from tax. It also includes income from sources outside the United States or from the sale of your main home (even if you can exclude all or part of it). Include part of your social security benefits if: 1. You were married, filing a separate return, and you lived with your spouse at any time during 2017; or 2. Half of your social security benefits plus your other gross income and any tax-exempt interest is more than $25,000 ($32,000 if married filing jointly).
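
Condition 2 in the excerpt can be written as a simple test. The sketch below only flags whether some Social Security benefits would enter gross income; it does not compute how much, and it ignores condition 1 because the data do not say whether separate filers lived with their spouse. The function name and toy values are hypothetical; e02400 is total Social Security benefits and e00400 is tax-exempt interest, as used in the code in this thread.

```python
import pandas as pd

def ss_counts_toward_gross_income(df, other_gross_income):
    """True where half of benefits plus other gross income plus tax-exempt
    interest exceeds $25,000 ($32,000 if married filing jointly)."""
    base = 0.5 * df["e02400"] + other_gross_income + df["e00400"]
    limit = df["MARS"].eq(2).map({True: 32000, False: 25000})
    return df["e02400"].gt(0) & base.gt(limit)

# Toy records (made-up values).
toy = pd.DataFrame({
    "MARS": [1, 2, 1],
    "e02400": [20000.0, 30000.0, 0.0],  # total Social Security benefits
    "e00400": [0.0, 1000.0, 0.0],       # tax-exempt interest
})
other = pd.Series([16000.0, 15000.0, 50000.0])  # other gross income
include_ss = ss_counts_toward_gross_income(toy, other)
print(include_ss.tolist())
```

The taxable amount itself would still have to come from the Form 1040 instructions or Pub. 915 worksheet, which this sketch does not attempt.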

I probably created too expansive a definition of gross income because I included some tax-exempt income. I probably need to look at the IRS definition more closely. Of course, if I narrow the definition, I will end up with fewer filers, falling further short of what the IRS reports, which will make retargeting the 2017 puf a bit concerning.

# define filers
# https://www.irs.gov/pub/irs-prior/p17--2017.pdf
# define gross income as above the line income plus any losses deducted in
# arriving at that, plus any income excluded in arriving at that
above_line_income = puf.c00100 + puf.c02900

# add back any losses that were used to reduce above the line income
# these are negative so we will subtract them from above the line income
capital_losses = puf.c23650.lt(0) * puf.c23650 + puf.c01000.lt(0) * puf.c01000
other_losses = puf.e01200.lt(0) * puf.e01200
business_losses = puf.e00900.lt(0) * puf.e00900
rent_losses = puf.e02000.lt(0) * puf.e02000
farm_losses = puf.e02100.lt(0) * puf.e02100
above_line_losses = capital_losses + other_losses + business_losses + rent_losses + farm_losses

# addback any untaxed income that was excluded in calculating above the line income
interest_untaxed = puf.e00400
# dividends_untaxed ?? not sure what to do
pensions_untaxed = puf.e01500 - puf.e01700  # always ge zero, I checked
socsec_untaxed = puf.e02400 - puf.c02500  # OVERSTATEMENT always ge zero, I checked
above_line_untaxed = interest_untaxed + pensions_untaxed + socsec_untaxed

gross_income = above_line_income - above_line_losses + above_line_untaxed

# to be on the safe side, don't let gross_income be negative
puf['gross_income'] = gross_income * gross_income.ge(0)

# define filer masks
# households that are required to file based on marital status, age, and gross income
m_single_lt65 = puf.MARS.eq(1) & puf.age_head.lt(65) & puf.gross_income.ge(10400)
m_single_ge65 = puf.MARS.eq(1) & puf.age_head.ge(65) & puf.gross_income.ge(11950)
m_single = m_single_lt65 | m_single_ge65

# married joint
m_mfj_bothlt65 = puf.MARS.eq(2) & puf.age_head.lt(65) & puf.age_spouse.lt(65) & puf.gross_income.ge(20800)
m_mfj_onege65 = puf.MARS.eq(2) & (puf.age_head.ge(65) | puf.age_spouse.ge(65)) & puf.gross_income.ge(22050)
m_mfj_bothge65 = puf.MARS.eq(2) & puf.age_head.ge(65) & puf.age_spouse.ge(65) & puf.gross_income.ge(23300)
m_mfj = m_mfj_bothlt65 | m_mfj_onege65 | m_mfj_bothge65

# married separate
m_mfs = puf.MARS.eq(3) & puf.gross_income.ge(4050)

# head of household
m_hoh_lt65 = puf.MARS.eq(4) & puf.age_head.lt(65) & puf.gross_income.ge(13400)
m_hoh_ge65 = puf.MARS.eq(4) & puf.age_head.ge(65) & puf.gross_income.ge(14950)
m_hoh = m_hoh_lt65 | m_hoh_ge65

# qualifying widow(er)
m_qw_lt65 = puf.MARS.eq(5) & puf.age_head.lt(65) & puf.gross_income.ge(16750)
m_qw_ge65 = puf.MARS.eq(5) & puf.age_head.ge(65) & puf.gross_income.ge(18000)
m_qw = m_qw_lt65 | m_qw_ge65

m_required = m_single | m_mfj | m_mfs | m_hoh | m_qw

# returns that surely will or must file even if marital-status/age/gross_income requirement not met
m_negagi = puf.c00100.lt(0) # negative agi
m_iitax = puf.iitax.ne(0)
m_credits = puf.c07100.ne(0) | puf.refund.ne(0)

m_not_required = m_negagi | m_iitax | m_credits

m_filer = m_required | m_not_required

puf['filer'] = m_filer

donboyd5 commented 3 years ago

Although the definitions above came within 3.7% of the IRS-reported number of filers, they led to 10.4% too few returns with positive wages.

It seems reasonable (to me) to think that most people with more than de minimis wages will file returns, as they were subject to withholding and either have a refund to claim or more money to pay. Thus, I have added one more category of likely filers even if they do not meet the filing requirements: records with at least $1,000 of wages.

I also have added ordinary dividends minus qualified dividends to untaxed income in the computation of gross income for filing-requirements purposes.

With these changes, we have 1.2% too many filers (vs the IRS) and 4.6% too few filers who have nonzero wages.

That seems pretty good to me, and for now I will use it as my filer definition. However, I would like to improve the definition of gross income and would much appreciate ideas about how to improve it to be more consistent with the IRS definition.

For now I will use this new filer definition as my way to define the puf 2017 universe to compare to the IRS 2017 statistics, and to define the puf records that will be reweighted in an effort to come closer to the IRS values.

For completeness, I copy my current code for defining 2017 filers below.

def filers(puf, year=2017):
    """Return boolean array identifying tax filers.

    Parameters
    ----------
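    puf : pandas.DataFrame
        Records advanced to `year`, with Tax-Calculator input and output
        variables as columns.
    year : int
        Tax year whose filing thresholds apply (only 2017 is defined).

    Returns
    -------
    pandas.Series
        Boolean mask that is True for records treated as filers.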

    # IRS rules for filers: https://www.irs.gov/pub/irs-prior/p17--2017.pdf

    Gross income. This includes all income you receive in the form of money,
    goods, property, and services that isn't exempt from tax. It also includes
    income from sources outside the United States or from the sale of your main
    home (even if you can exclude all or part of it). Include part of your
    social security benefits if: 1. You were married, filing a separate return,
    and you lived with your spouse at any time during 2017; or 2. Half of your
    social security benefits plus your other gross income and any tax-exempt
    interest is more than $25,000 ($32,000 if married filing jointly).

    define gross income as above the line income plus any losses deducted in
    arriving at that, plus any income excluded in arriving at that
    """
    if year == 2017:
        s_inc_lt65 = 10400
        s_inc_ge65 = 11950

        mfj_inc_bothlt65 = 20800
        mfj_inc_onege65 = 22050
        mfj_inc_bothge65 = 23300

        mfs_inc = 4050

        hoh_inc_lt65 = 13400
        hoh_inc_ge65 = 14950

        qw_inc_lt65 = 16750
        qw_inc_ge65 = 18000

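        wage_threshold = 1000
    else:
        # thresholds are only defined for 2017; fail loudly for other years
        raise ValueError(f"filing thresholds are not defined for {year}")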

    # above the line income is agi plus above line adjustments getting to agi
    above_line_income = puf.c00100 + puf.c02900

    # add back any losses that were used to reduce above the line income
    # these are negative so we will subtract them from above the line income
    capital_losses = puf.c23650.lt(0) * puf.c23650 \
        + puf.c01000.lt(0) * puf.c01000
    other_losses = puf.e01200.lt(0) * puf.e01200
    business_losses = puf.e00900.lt(0) * puf.e00900
    rent_losses = puf.e02000.lt(0) * puf.e02000
    farm_losses = puf.e02100.lt(0) * puf.e02100
    above_line_losses = capital_losses + other_losses + business_losses \
        + rent_losses + farm_losses

    # add back any untaxed income that was excluded in calculating
    # above the line income
    interest_untaxed = puf.e00400
    dividends_untaxed = puf.e00600 - puf.e00650
    pensions_untaxed = puf.e01500 - puf.e01700  # always ge zero, I checked
    # socsec_untaxed is OVERSTATED - I think IRS has a limit on amount
    socsec_untaxed = puf.e02400 - puf.c02500  # always ge zero, I checked
    above_line_untaxed = interest_untaxed + dividends_untaxed \
        + pensions_untaxed + socsec_untaxed

    gross_income = above_line_income - above_line_losses + above_line_untaxed

    # to be on the safe side, don't let gross_income be negative
    gross_income = gross_income * gross_income.ge(0)

    # define filer masks
    # households that are required to file based on marital status,
    # age, and gross income

    # single
    m_single_lt65 = puf.MARS.eq(1) \
        & puf.age_head.lt(65) \
        & gross_income.ge(s_inc_lt65)

    m_single_ge65 = puf.MARS.eq(1) \
        & puf.age_head.ge(65) \
        & gross_income.ge(s_inc_ge65)

    m_single = m_single_lt65 | m_single_ge65

    # married joint
    m_mfj_bothlt65 = puf.MARS.eq(2) \
        & puf.age_head.lt(65) \
        & puf.age_spouse.lt(65) \
        & gross_income.ge(mfj_inc_bothlt65)

    m_mfj_onege65 = puf.MARS.eq(2) \
        & (puf.age_head.ge(65) | puf.age_spouse.ge(65)) \
        & ~(puf.age_head.ge(65) & puf.age_spouse.ge(65)) \
        & gross_income.ge(mfj_inc_onege65)

    m_mfj_bothge65 = puf.MARS.eq(2) \
        & puf.age_head.ge(65) \
        & puf.age_spouse.ge(65) \
        & gross_income.ge(mfj_inc_bothge65)

    m_mfj = m_mfj_bothlt65 | m_mfj_onege65 | m_mfj_bothge65

    # married separate
    m_mfs = puf.MARS.eq(3) & gross_income.ge(mfs_inc)

    # head of household
    m_hoh_lt65 = puf.MARS.eq(4) \
        & puf.age_head.lt(65) \
        & gross_income.ge(hoh_inc_lt65)

    m_hoh_ge65 = puf.MARS.eq(4) \
        & puf.age_head.ge(65) \
        & gross_income.ge(hoh_inc_ge65)

    m_hoh = m_hoh_lt65 | m_hoh_ge65

    # qualifying widow(er)
    m_qw_lt65 = puf.MARS.eq(5) \
        & puf.age_head.lt(65) \
        & gross_income.ge(qw_inc_lt65)

    m_qw_ge65 = puf.MARS.eq(5) \
        & puf.age_head.ge(65) \
        & gross_income.ge(qw_inc_ge65)

    m_qw = m_qw_lt65 | m_qw_ge65

    m_required = m_single | m_mfj | m_mfs | m_hoh | m_qw

    # returns that surely will or must file even if
    # marital-status/age/gross_income requirement is not met
    m_negagi = puf.c00100.lt(0)  # negative agi
    m_iitax = puf.iitax.ne(0)
    m_credits = puf.c07100.ne(0) | puf.refund.ne(0)
    m_wages = puf.e00200.ge(wage_threshold)

    m_likely = m_negagi | m_iitax | m_credits | m_wages

    m_filer = m_required | m_likely

    return m_filer

# put filer boolean indicator on puf
puf['filer'] = filers(puf, year=2017)

MattHJensen commented 3 years ago

@donboyd5, this is really helpful work, thank you!

A few quick reactions, and I hope to think more about this later today / tomorrow.

  1. I'm not convinced that it is helpful -- for finding the right definition of filers -- to compare the number of filers under each definition to SOI totals in 2017. That could be a helpful check if using 2011 data and 2011 filer criteria, whereupon you could see how close the filer screen comes to matching the puf.csv contents, but I don't entirely follow why it is useful using the extrapolated data. I may very well just be missing some of your thinking here.

  2. There's a slightly longer definition (than the one you copied above) available under table 1-1. I suspect you saw it, and I don't think it adds much other than some context about losses, but I am including here for reference.

Gross income means all income you received in the form of money, goods, property, and services that isn't exempt from tax, including any income from sources outside the United States or from the sale of your main home (even if you can exclude part or all of it). Don't include any social security benefits unless (a) you are married filing a separate return and you lived with your spouse at any time during 2017 or (b) one-half of your social security benefits plus your other gross income and any tax-exempt interest is more than $25,000 ($32,000 if married filing jointly). If (a) or (b) applies, see the instructions for Form 1040 or 1040A or Pub. 915 to figure the taxable part of social security benefits you must include in gross income. Gross income includes gains, but not losses, reported on Form 8949 or Schedule D. Gross income from a business means, for example, the amount on Schedule C, line 7, or Schedule F, line 9. But, in figuring gross income, don't reduce your income by any losses, including any loss on Schedule C, line 7, or Schedule F, line 9.

  3. The first sentence seems important to me: "Gross income means all income you received in the form of money, goods, property, and services that isn't exempt from tax". So I am interested in your thinking about the section in your code that "add[s] back any untaxed income that was excluded in calculating above the line income."
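
The check suggested in the first point above could look something like the sketch below: in the base year, see how much of the PUF-sourced weight a filer screen keeps and how much CPS-sourced weight it flags. It assumes a `filer` column already computed with base-year criteria (those thresholds are not given in this thread) and s006 as the record weight; the data are made up.

```python
import pandas as pd

# Toy base-year records (made-up values).
puf = pd.DataFrame({
    "data_source": [1, 1, 1, 0, 0],
    "filer": [True, True, False, False, True],
    "s006": [10.0, 20.0, 5.0, 30.0, 15.0],
})

puf_sourced = puf["data_source"].eq(1)

# Share of PUF-sourced weight that the screen identifies as filers.
kept = puf.loc[puf_sourced & puf["filer"], "s006"].sum()
share_kept = kept / puf.loc[puf_sourced, "s006"].sum()

# CPS-sourced weight that the screen nonetheless flags as filers.
cps_flagged = puf.loc[~puf_sourced & puf["filer"], "s006"].sum()

print(round(share_kept, 3), cps_flagged)
```

A screen that matched the base-year file exactly would keep all of the PUF-sourced weight and flag none of the CPS-sourced weight.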
donboyd5 commented 3 years ago

Thanks for the helpful feedback, @MattHJensen.

Re your first comment, I'm not really using the IRS numbers of filers to decide how to define filers in the PUF. Rather, I was using it to decide whether something is amiss in the file as I looked at it. When I saw that I had far too few filers with wages, I asked myself why, since clearly something was amiss. That set off a light bulb that said, "ah, even people who are not required to file (because they are below the requirement thresholds) might choose to file if, for example, they had wages, which meant they probably had withholding, which meant that if they are below the filing thresholds they probably chose to file anyway so that they can get the refund they are properly owed". The IRS suggests as much in its "Who Should File" part of the document:

[image: screenshot of the "Who Should File" part of the IRS document]

More generally, that's why I have the following line in the function above:

m_likely = m_negagi | m_iitax | m_credits | m_wages

I'm trying to capture people who are likely filers even if they are not required to file. It includes people with negative AGI (I assume they're just different and not like low income people who don't have to file), people for whom tax-calculator calculates tax, people who have credits they can claim as determined by tax-calculator, and people who had wages above a de minimis level who I assumed probably need to file a return to get a refund.

I included that set of rules because they made sense (to me) in their own right, not because they gave results that seem reasonable (# of wage earning filers being reasonable). However, if the numbers still had been way off, I probably would still be searching for a better definition - based on something else I might have missed in the rules and incentives.

Re your second comment, you caught me making a mistake out of laziness. At first, I thought the definition of gross income (1st screenshot below) meant line 22 above-the-line income (2nd screenshot below). But then, I thought they meant more because they clearly intend some gains on home sales and some schedule C income. I meant to read the detailed definitions in the back of the IRS document but didn't get to it, rummaged around the ymod variables but didn't really understand what they captured, and was hoping someone would tell me if a variable from tax-calculator is close to gross income as defined for filing purposes. In the interim, I left the too-large definition in place.

I'll redo it later today with just above the line income - do you think that seems like the best proxy for what they call "gross income", @MattHJensen?

[image: screenshot of the IRS definition of gross income]

[image: screenshot of line 22, above-the-line income]

donboyd5 commented 3 years ago

@MattHJensen, I reran everything defining gross income as above the line income (line 22) and the results look plausible to me, and of course no longer have a clearly incorrect definition of gross income. Thanks for that.

If you have any suggested improvements to the gross income definition that would be great but for the time being I'll use above the line income.

MattHJensen commented 3 years ago

> I included that set of rules because they made sense (to me) in their own right, not because they gave results that seem reasonable (# of wage earning filers being reasonable). However, if the numbers still had been way off, I probably would still be searching for a better definition - based on something else I might have missed in the rules and incentives.

This makes sense, thanks for the explanation.

> At first, I thought the definition of gross income (1st screenshot below) meant line 22 above-the-line income (2nd screenshot below). But then, I thought they meant more because they clearly intend some gains on home sales and some schedule C income. I meant to read the detailed definitions in the back of the IRS document but didn't get to it, rummaged around the ymod variables but didn't really understand what they captured, and was hoping someone would tell me if a variable from tax-calculator is close to gross income as defined for filing purposes. In the interim, I left the too-large definition in place.

> I reran everything defining gross income as above the line income (line 22) and the results look plausible to me, and of course no longer have a clearly incorrect definition of gross income.

In the code snippet from this comment, there are three blocks building up gross income:

  1. Above the line income
  2. Add back losses
  3. Add untaxed income.

I think your most recent changes dropped the 2nd and 3rd, keeping only the 1st. My (still somewhat hazy) take is that only the 3rd should be dropped, but the 1st and 2nd should be kept. In other words, "gross income" sounds to me like above the line income gross of losses.
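
Read that way, "above the line income gross of losses" keeps blocks 1 and 2 only. A minimal sketch follows; the list of loss variables is copied from the function earlier in this thread rather than independently verified, and the name `gross_of_losses` and the toy values are hypothetical.

```python
import pandas as pd

# Loss-capable variables, copied from the function posted earlier in the thread.
LOSS_VARS = ["c23650", "c01000", "e01200", "e00900", "e02000", "e02100"]

def gross_of_losses(df):
    """Above-the-line income with deducted losses added back, floored at zero."""
    above_line_income = df["c00100"] + df["c02900"]
    # clip(upper=0) keeps only the negative (loss) part of each variable
    losses = df[LOSS_VARS].clip(upper=0).sum(axis=1)
    # subtracting a non-positive number adds the losses back
    return (above_line_income - losses).clip(lower=0)

# Toy records (made-up values).
toy = pd.DataFrame({
    "c00100": [5000.0, -3000.0, 20000.0],  # AGI
    "c02900": [1000.0, 0.0, 500.0],        # above-the-line adjustments
    "c23650": [0.0, 0.0, 0.0],
    "c01000": [0.0, 0.0, 0.0],
    "e01200": [0.0, 0.0, 0.0],
    "e00900": [-8000.0, -2000.0, 0.0],     # business income or loss
    "e02000": [0.0, 0.0, 3000.0],          # rental and similar income or loss
    "e02100": [0.0, 0.0, 0.0],
})
gross_income = gross_of_losses(toy)
print(gross_income.tolist())
```

No untaxed income is added back here, which corresponds to dropping only the third block.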

donboyd5 commented 3 years ago

That is what I did, @MattHJensen. I think you are right. I have done a lot of other things since then (am about to post on that) but will circle back, fix this, and rerun, although it may be a while. My intuition is that most people who have losses were probably caught in one of the filing-requirements or likely-filing categories, so I don't think it will make a lot of difference, but I'll find out as soon as practical.

donboyd5 commented 3 years ago

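Here is an updated filers identification function, in response to the mention by @MattHJensen in #2510. The changes are that gross income no longer adds back untaxed interest, dividends, or pensions, and that Social Security is treated as wholly exempt for now (set to zero as a placeholder).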

I'll plan to update reweighting, etc. with a new filers definition (this one or an improved version) early next week. I welcome any comments to make it better before then (or after, too, but that would be less immediately useful).

def filers(puf, year=2017):
    """Return boolean array identifying tax filers.

    Parameters
    ----------
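    puf : pandas.DataFrame
        Records advanced to `year`, with Tax-Calculator input and output
        variables as columns.
    year : int
        Tax year whose filing thresholds apply (only 2017 is defined).

    Returns
    -------
    pandas.Series
        Boolean mask that is True for records treated as filers.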

    # IRS rules for filers: https://www.irs.gov/pub/irs-prior/p17--2017.pdf

    Gross income. This includes all income you receive in the form of money,
    goods, property, and services that isn't exempt from tax. It also includes
    income from sources outside the United States or from the sale of your main
    home (even if you can exclude all or part of it). Include part of your
    social security benefits if: 1. You were married, filing a separate return,
    and you lived with your spouse at any time during 2017; or 2. Half of your
    social security benefits plus your other gross income and any tax-exempt
    interest is more than $25,000 ($32,000 if married filing jointly).

    define gross income as above the line income plus any losses deducted in
    arriving at that; untaxed income is not added back, and Social Security
    is treated as wholly exempt for now
    """
    if year == 2017:
        s_inc_lt65 = 10400
        s_inc_ge65 = 11950

        mfj_inc_bothlt65 = 20800
        mfj_inc_onege65 = 22050
        mfj_inc_bothge65 = 23300

        mfs_inc = 4050

        hoh_inc_lt65 = 13400
        hoh_inc_ge65 = 14950

        qw_inc_lt65 = 16750
        qw_inc_ge65 = 18000

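        wage_threshold = 1000
    else:
        # thresholds are only defined for 2017; fail loudly for other years
        raise ValueError(f"filing thresholds are not defined for {year}")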

    # above the line income is agi plus above line adjustments getting to agi
    above_line_income = puf.c00100 + puf.c02900

    # add back any losses that were used to reduce above the line income
    # these are negative so we will subtract them from above the line income
    capital_losses = puf.c23650.lt(0) * puf.c23650 \
        + puf.c01000.lt(0) * puf.c01000
    other_losses = puf.e01200.lt(0) * puf.e01200
    business_losses = puf.e00900.lt(0) * puf.e00900
    rent_losses = puf.e02000.lt(0) * puf.e02000
    farm_losses = puf.e02100.lt(0) * puf.e02100
    above_line_losses = capital_losses + other_losses + business_losses \
        + rent_losses + farm_losses

    # add back any untaxed income that was excluded in calculating
    # above the line income and that is not considered "exempt"
    # It is clear that IRS includes some Social Security in some
    # circumstances, but for now I treat it as wholly exempt

    # here is the full portion of untaxed Social Security. I think this
    # is OVERSTATED - I think IRS has a limit on amount that can be added
    # back but am not sure how to calculate it
    # socsec_untaxed = puf.e02400 - puf.c02500  # always ge zero, I checked
    socsec_untaxed = 0
    above_line_untaxed = socsec_untaxed

    # gross_income -- is anything left out?
    gross_income = above_line_income - above_line_losses + above_line_untaxed

    # to be on the safe side, don't let gross_income be negative
    gross_income = gross_income * gross_income.ge(0)

    # define filer masks; the approach is to define two groups of households:
    #   (1) households that are required to file based on marital status,
    #       age, and gross income
    #   (2) households that are likely to file whether they are required to or
    #       not, because they are likely to need a refund (e.g., people with
    #       wage income), or are seeking a credit, or have a complex return
    #       (they have negative AGI)

    # single
    m_single_lt65 = puf.MARS.eq(1) \
        & puf.age_head.lt(65) \
        & gross_income.ge(s_inc_lt65)

    m_single_ge65 = puf.MARS.eq(1) \
        & puf.age_head.ge(65) \
        & gross_income.ge(s_inc_ge65)

    m_single = m_single_lt65 | m_single_ge65

    # married joint
    m_mfj_bothlt65 = puf.MARS.eq(2) \
        & puf.age_head.lt(65) \
        & puf.age_spouse.lt(65) \
        & gross_income.ge(mfj_inc_bothlt65)

    m_mfj_onege65 = puf.MARS.eq(2) \
        & (puf.age_head.ge(65) | puf.age_spouse.ge(65)) \
        & ~(puf.age_head.ge(65) & puf.age_spouse.ge(65)) \
        & gross_income.ge(mfj_inc_onege65)

    m_mfj_bothge65 = puf.MARS.eq(2) \
        & puf.age_head.ge(65) \
        & puf.age_spouse.ge(65) \
        & gross_income.ge(mfj_inc_bothge65)

    m_mfj = m_mfj_bothlt65 | m_mfj_onege65 | m_mfj_bothge65

    # married separate
    m_mfs = puf.MARS.eq(3) & gross_income.ge(mfs_inc)

    # head of household
    m_hoh_lt65 = puf.MARS.eq(4) \
        & puf.age_head.lt(65) \
        & gross_income.ge(hoh_inc_lt65)

    m_hoh_ge65 = puf.MARS.eq(4) \
        & puf.age_head.ge(65) \
        & gross_income.ge(hoh_inc_ge65)

    m_hoh = m_hoh_lt65 | m_hoh_ge65

    # qualifying widow(er)
    m_qw_lt65 = puf.MARS.eq(5) \
        & puf.age_head.lt(65) \
        & gross_income.ge(qw_inc_lt65)

    m_qw_ge65 = puf.MARS.eq(5) \
        & puf.age_head.ge(65) \
        & gross_income.ge(qw_inc_ge65)

    m_qw = m_qw_lt65 | m_qw_ge65

    m_required = m_single | m_mfj | m_mfs | m_hoh | m_qw

    # returns that surely will or must file even if
    # marital-status/age/gross_income requirement is not met
    m_negagi = puf.c00100.lt(0)  # negative agi
    m_iitax = puf.iitax.ne(0)
    m_credits = puf.c07100.ne(0) | puf.refund.ne(0)
    m_wages = puf.e00200.ge(wage_threshold)

    m_likely = m_negagi | m_iitax | m_credits | m_wages

    m_filer = m_required | m_likely

    return m_filer

MattHJensen commented 3 years ago

@donboyd5, this new approach looks good to me for the current purposes. I spent some time reading about social security benefits and am still fuzzy, so I'm no help filling in that placeholder at the moment.