HypothesisWorks / hypothesis

Hypothesis is a powerful, flexible, and easy to use library for property-based testing.
https://hypothesis.works

Reporting improvements for Scrutineer #3551

Open Zac-HD opened 1 year ago

Zac-HD commented 1 year ago

As part of the explain phase, codenamed Scrutineer, we collect and analyze coverage information. This issue is about improving how we report that information; the reporting logic is defined in this Python module.

Looking at the following output from a failing Pandas test made me think of a few improvements:

Explanation:
    These lines were always and only run by failing examples:
        C:\hostedtoolcache\windows\Python\3.10.9\x64\lib\site-packages\_pytest\_io\saferepr.py:81
        C:\hostedtoolcache\windows\Python\3.10.9\x64\lib\site-packages\_pytest\assertion\util.py:235
        C:\hostedtoolcache\windows\Python\3.10.9\x64\lib\site-packages\numpy\core\_methods.py:49
        C:\hostedtoolcache\windows\Python\3.10.9\x64\lib\site-packages\pandas\io\formats\format.py:1306
        C:\hostedtoolcache\windows\Python\3.10.9\x64\lib\site-packages\pandas\io\formats\format.py:1312
        (and 18 more with settings.verbosity >= verbose)
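For readers unfamiliar with the phrase "always and only run by failing examples", the set logic behind it can be sketched as follows. This is a minimal illustration, not Scrutineer's actual code: the function name `always_and_only_failing` and the `(filename, lineno)` tuple representation are assumptions made for the example.

```python
def always_and_only_failing(passing_runs, failing_runs):
    """Return locations hit by every failing run and by no passing run.

    Each run is an iterable of (filename, lineno) tuples. Both the name of
    this function and the data layout are hypothetical.
    """
    failing_runs = [set(run) for run in failing_runs]
    if not failing_runs:
        return set()
    # "always": present in every failing run
    always_failing = set.intersection(*failing_runs)
    # "only": never present in any passing run
    ever_passing = set().union(*(set(run) for run in passing_runs))
    return always_failing - ever_passing
```

Lines in test-runner or formatting code (such as the `_pytest` entries above) satisfy this definition whenever they run only while the failure is being reported, which is why they show up in the output.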

Zac-HD commented 1 year ago

An alternative heuristic that might be more helpful: we can use sys.exc_info() to check whether an exception is currently propagating, and keep a count of the exceptions which have been raised (all but one of which are caught). Then, each time we add a new line to the mapping, we note the current exception index (or None), and finally decline to report any line that was executed only after the last exception we saw began to propagate.

Or in plain words: ignore lines that only occurred after the exception-that-made-the-test-fail was raised.

This might have some pretty annoying overhead. On the other hand, we only need to run it on the minimal failing example, not all the others, so the overhead can be kept low at the cost of writing a second trace function.
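A rough sketch of what such a second trace function could look like, under some assumptions: it uses the `"exception"` event that `sys.settrace` delivers to trace functions rather than polling `sys.exc_info()`, it counts each exception object once even though the event fires in every frame the exception passes through, and it assumes the test fails with an ordinary `Exception`. The names `trace_exception_indices` and `lines_before_final_exception` are hypothetical and not part of Hypothesis.

```python
import sys


def trace_exception_indices(fn):
    """Run fn() under a trace function (hypothetical helper).

    Returns (first_seen, final_index). first_seen maps (filename, lineno)
    to the 1-based index of the most recent exception raised before that
    line was first executed, or None if no exception had been raised yet.
    final_index is the index of the exception that escaped fn, or None
    if fn returned normally.
    """
    first_seen = {}
    state = {"count": 0, "last": None}

    def tracer(frame, event, arg):
        if event == "exception":
            exc = arg[1]
            # The same exception triggers one event per frame it unwinds
            # through, so only count objects we have not just seen.
            if exc is not state["last"]:
                state["last"] = exc
                state["count"] += 1
        elif event == "line":
            key = (frame.f_code.co_filename, frame.f_lineno)
            if key not in first_seen:
                first_seen[key] = state["count"] or None
        return tracer

    failed = False
    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        fn()
    except Exception:
        failed = True
    finally:
        sys.settrace(previous)
    return first_seen, (state["count"] if failed else None)


def lines_before_final_exception(first_seen, final_index):
    """Drop lines first executed after the final exception was raised."""
    if final_index is None:
        return set(first_seen)
    return {loc for loc, idx in first_seen.items() if idx != final_index}
```

With this shape, cleanup code in `finally` blocks, and reporting code such as the `saferepr.py` lines in the output above, would be tagged with the final exception's index and filtered out, while the line that raised the failing exception is kept because it was first seen before that exception existed.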

Zac-HD commented 1 year ago

Another idea: it's easy enough to track the order in which we first saw each unique location. We could then highlight the earliest divergence, while still reporting the others as useful context.
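As an illustration of the ordering idea, here is a small sketch that relies on Python dicts preserving insertion order. It assumes the trace function yields `(filename, lineno)` tuples in execution order and that the set of lines to report has already been computed; `record_first_seen` and `earliest_divergence` are hypothetical names.

```python
def record_first_seen(executed):
    """Return unique locations in the order they were first executed.

    executed is an iterable of (filename, lineno) tuples, possibly with
    repeats; dict.fromkeys keeps only the first occurrence of each.
    """
    return list(dict.fromkeys(executed))


def earliest_divergence(first_seen_order, reported):
    """Return the first location in execution order that is in reported.

    Returns None when none of the reported locations were executed.
    """
    for location in first_seen_order:
        if location in reported:
            return location
    return None
```

The report could then put the result of `earliest_divergence` first, or mark it, and list the remaining locations below it as context.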