CatalaLang / catala

Programming language for literate programming law specification
https://catala-lang.org
Apache License 2.0
1.99k stars 77 forks source link

Bug in formalizing US Section 121: Five-year periods #443

Open rohanpadhye opened 1 year ago

rohanpadhye commented 1 year ago

Hi folks,

I'm working on some experimental tooling for testing of Catala programs (more on that later), and I encountered what I think is a bug in the formalization of Section 121 of the US Tax Code. Here's a test that demonstrates the bug:

> Include: examples/us_tax_code/section_121.catala_en 

The point of this test is to show a usage period that continues AFTER the sale date
for a personal property (presumably a seller renting the property they just sold) and
using it to satisfy the requirements of Section 121.

```catala
declaration scope FiveYearBug:
  scope121a scope Section121SinglePerson

scope FiveYearBug:
  definition scope121a.date_of_sale_or_exchange equals |2021-01-01|
  definition scope121a.gain_from_sale_or_exchange_of_property equals $500,000
  definition scope121a.property_ownage equals [Period {
    # Owned for 2 years until sale date; this looks good
    -- begin: |2019-01-01|
    -- end: |2021-01-01|
  }]
  definition scope121a.property_usage_as_principal_residence equals [Period {
    # Used for 1.5 years until sale date; not enough to meet requirements
    -- begin: |2019-06-01|
    -- end: |2021-01-01|
  }; Period {
    # Rented the property for six months AFTER selling it; this should not count!
    -- begin: |2021-01-01|
    -- end: |2021-06-01|
  }]
  definition scope121a.other_section_121a_sale equals NoOtherSaleWhereSection121aApplied

  assertion not scope121a.requirements_met   # Fail! The requirements appear to be met

We should expect the requirements to not be met because there wasn't a total of 2-year usage during the 5-year period ***ending on the date of sale/exchange***:

(a) Exclusion

Gross income shall not include gain from the sale or exchange of property if, during the 5-year period ending on the date of the sale or exchange, such property has been owned and used by the taxpayer as the taxpayer’s principal residence for periods aggregating 2 years or more.


However, the assertion above fails:

$ catala Interpret -s FiveYearBug bug121.catala_en --disable_warnings [ERROR] Assertion failed

┌─⯈ bug121.catala_en:19.12-42: └──┐ 19 │ assertion not scope121a.requirements_met │ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ └┬ Section 121 └┬ (b) Limitations └┬ (5) Exclusion of gain allocated to nonqualified use └─ (D) Coordination with recognition of gain attributable to depreciation



It turns out that the 6 months of usage AFTER the date of sale/exchange is still being counted in the aggregate periods as per the current implementation: https://github.com/CatalaLang/catala/blob/ba44949beab5a7e343a6ecb8a232a8c474e8c538/examples/us_tax_code/section_121.catala_en#L161-L181

There is an assumption here that that the periods must end before the date of sale/exchange, though I don't think that should just be an assertion on the input. If the current formalization was indeed law,  one could imagine a crafty taxpayer who sells a house at a gain and then rents it back for a few months just to meet the requirements for capital gains exclusion. The Catala formalization should address this case and determine that the requirements cannot be satisfied with usage periods that are not within the five-year period that ends on the date of sale/exchange.

As a side note: there *should* probably be assertions here to check that the end of each period is after the beginning of that same period, which is by definition of *periods*.
denismerigoux commented 1 year ago

Hi @rohanpadhye! Thanks for reporting all of this. I'm very excited to know more about the experimental tooling you're working about :)

Concerning the bug you found I'm tagging @AltGr and @slawsk who are currently working on this formalization (https://github.com/CatalaLang/catala/blob/us121/examples/us_tax_code/section_121.catala_en). I agree with you that we should do a bit more of defensive programming and handle well user inputs where periods extend beyond the date of sale or exchange. However currently handling time intervals is a bit clunky in Catala because the language is restricted : for instance we cannot merge two lists of periods of time and aggregate them because that would involve sorting the lists of intervals and you can't implement sort in Catala. So the solution we're heading towards is having an externally-implemented (and trusted) library for periods of time with handy operations in them. But to do that we need https://github.com/CatalaLang/catala/issues/89, and https://github.com/CatalaLang/catala/issues/88, which are scheduled to happen in the next months.

slawsk commented 1 year ago

Agreed that this is a bug--the periods of interest absolutely should end on the date of sale or exchange, the Code is clear. @AltGr and I are working on 121(c), which uses these definitions.

rohanpadhye commented 1 year ago

Thanks, @slawsk, for confirming the bug.

@denismerigoux. Thanks, I'm also excited about the results---the corner case above was discovered automatically using an instrumented version of Catala's Python backend + the Z3 theorem prover. So far it's been a fun side project of mine so I get very little time to work on it, but I aim to share some code and docs of a preliminary version with the community later this week.

The module system seems very interesting and quite complex. Thanks for the pointer, I'll keep an eye out on those issues.

denismerigoux commented 1 year ago

That's neat! Did you know that we already have a Z3 integration in Catala? We generate proof obligations for specific properties and show the results with the Proof CLI command of the Catala compiler. It's the beginning of what we described here: https://hal.inria.fr/hal-03447072/.

rohanpadhye commented 1 year ago

Very nice! I think came across at one point, and I intend to look into the implementation further. I'm trying to do something less ambitious: instead of proving properties directly, just generating test inputs systematically—with the hope that the "properties" can be encoded as assertions. The proof backend is probably the stronger approach in the long run, though the testing approach might be able to get past some cases that haven't yet been (or cannot feasibly be) modeled formally. Let me open a separate PR for this soon and we can discuss there.