HypothesisWorks / hypothesis

Hypothesis is a powerful, flexible, and easy-to-use library for property-based testing.
https://hypothesis.works

Bizarre quirk with how Hypothesis formats the error message for falsifying examples #4183

Open tmaxwell-anthropic opened 6 days ago

tmaxwell-anthropic commented 6 days ago

Consider this test:

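```python
import hypothesis
from hypothesis import strategies as st

# Module-level call counter, shared by every example Hypothesis runs.
counter = 0

@hypothesis.given(dummy=st.text())
def test_that_prints_rich_error(dummy: str):
    global counter
    counter += 1
    # Fails on every fifth call, whatever `dummy` is.
    if counter % 5 == 0:
        raise ExceptionGroup(
            "outer exceptiongroup", [ZeroDivisionError("this is a fake error")]
        )
```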

It will fail nondeterministically, of course, since every fifth call raises regardless of `dummy`. Hypothesis reports a rich error message describing the failure:

```
...
  |   File "/opt/homebrew/Caskroom/miniforge/base/envs/py311/lib/python3.11/site-packages/hypothesis/core.py", line 1722, in wrapped_test
  |     raise the_error_hypothesis_found
  | hypothesis.errors.FlakyFailure: Hypothesis test_that_prints_rich_error(dummy='\x02\x1d\U000c4bc0') produces unreliable results: Falsified on the first call but did not on a subsequent one (1 sub-exception)
  | Falsifying example: test_that_prints_rich_error(
  |     dummy='\x02\x1d\U000c4bc0',
  | )
  | Failed to reproduce exception. Expected:
  | + Exception Group Traceback (most recent call last):
  |   |   File "/Users/tmaxwell/code/anthropic/belt/tests/test_hypothesis_utils.py", line 399, in test_that_prints_rich_error
  |   |     raise ExceptionGroup(
  |   | ExceptionGroup: outer exceptiongroup (1 sub-exception)
  |   +-+---------------- 1 ----------------
  |     | ZeroDivisionError: this is a fake error
  |     +------------------------------------
  |
  | Explanation:
  |     These lines were always and only run by failing examples:
  |         /Users/tmaxwell/code/anthropic/belt/tests/test_hypothesis_utils.py:399
  +-+---------------- 1 ----------------
    | Exception Group Traceback (most recent call last):
    ...
```
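Both tests share the same source of flakiness: the module-level `counter` makes every fifth call fail no matter what `dummy` is, so when Hypothesis replays a failing input, the replay usually passes. That is what the "Falsified on the first call but did not on a subsequent one" line above reports. Here is a stdlib-only sketch of just that counter logic, with Hypothesis left out entirely; `flaky_body` and `run_same_input` are hypothetical helpers introduced only for illustration.

```python
# Sketch of the counter logic only; Hypothesis is not involved.
# `flaky_body` is a hypothetical stand-in for either test body.
counter = 0


def flaky_body(dummy: str) -> None:
    """Raise on every fifth call, whatever `dummy` is."""
    global counter
    counter += 1
    if counter % 5 == 0:
        raise ZeroDivisionError("this is a fake error")


def run_same_input(times: int, dummy: str = "x") -> list[bool]:
    """Hypothetical helper: call flaky_body repeatedly with ONE fixed input
    and record, per call, whether it raised."""
    outcomes = []
    for _ in range(times):
        try:
            flaky_body(dummy)
            outcomes.append(False)
        except ZeroDivisionError:
            outcomes.append(True)
    return outcomes


print(run_same_input(10))
```

With the same input every time, only calls 5 and 10 raise, so the printed list is `[False, False, False, False, True, False, False, False, False, True]`.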

Now consider this one:

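```python
import hypothesis
from hypothesis import strategies as st

counter = 0

@hypothesis.given(dummy=st.text())
def test_that_does_not_print_rich_error(dummy: str):
    global counter
    counter += 1
    if counter % 5 == 0:
        try:
            raise ZeroDivisionError("this is a fake error")
        except* Exception as e:
            # except* wraps the bare ZeroDivisionError in an ExceptionGroup
            # with an empty message; the bare `raise` re-raises that group.
            raise
```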

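This also fails nondeterministically, but with a much less helpful message. There is no FlakyFailure, no falsifying example, and no explanation, just an ExceptionGroup with an empty message: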

```
...
  |   File "/opt/homebrew/Caskroom/miniforge/base/envs/py311/lib/python3.11/site-packages/hypothesis/core.py", line 1722, in wrapped_test
  |     raise the_error_hypothesis_found
  | ExceptionGroup:  (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/tmaxwell/code/anthropic/belt/tests/test_hypothesis_utils.py", line 389, in test_that_does_not_print_rich_error
    |     raise ZeroDivisionError("this is a fake error")
    | ZeroDivisionError: this is a fake error
    +------------------------------------
```

What is the difference...? I feel like I'm going insane.

This is with Hypothesis v6.112.1.

tmaxwell-anthropic commented 6 days ago

I tried @settings(phases=(Phase.generate,)) in case it was somehow related to the two tests having different values in the example database, but that didn't change anything. Also, it occurs pretty reliably even if I change other aspects of the test. (I found this by bisecting a much more complicated example.)