HypothesisWorks / hypothesis

Hypothesis is a powerful, flexible, and easy to use library for property-based testing.
https://hypothesis.works

example generation regression between `6.47.0` -> `6.103.1` #4014

Open BBBmau opened 2 weeks ago

BBBmau commented 2 weeks ago

Hello, we're interested in using the latest version of Hypothesis, but we noticed a significant increase in example generation time between versions: the same test file takes 22.66s on 6.47.0 and 36.67s on 6.103.1.

I'm opening an issue because a similar regression came up in the past and led to code fixes that sped up performance: https://github.com/HypothesisWorks/hypothesis/issues/2308

pinning to 6.47.0:

batman@workstation-shane:~/openpilot/selfdrive/car/tests$ pytest -n8 test_car_interfaces.py
/home/batman/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pytest_benchmark/logger.py:46: PytestBenchmarkWarning: Benchmarks are automatically disabled because xdist plugin is active.Benchmarks cannot be performed reliably in a parallelized environment.
  warner(PytestBenchmarkWarning(text))
Test session starts (platform: linux, Python 3.11.4, pytest 8.2.1, pytest-sugar 1.0.0)
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
Using --randomly-seed=4244670443
rootdir: /home/batman/openpilot
configfile: pyproject.toml
plugins: timeout-2.3.1, xdist-3.6.1, cpp-2.5.0, cov-5.0.0, mock-3.14.0, forked-1.6.0, hypothesis-6.47.0, dash-2.11.1, benchmark-4.0.0, sugar-1.0.0, randomly-3.15.0, subtests-0.12.1, flaky-3.8.1, asyncio-0.23.7, anyio-4.4.0, nbmake-1.5.3, repeat-0.9.3
asyncio: mode=Mode.STRICT
8 workers [209 items]   collecting ... 

 selfdrive/car/tests/test_car_interfaces.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 56% █████▋    
                                            ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓                       100% ██████████
============================================================================ slowest 10 durations =============================================================================
1.16s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_028_FORD_F_150_LIGHTNING_MK1
1.08s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_030_FORD_MAVERICK_MK1
1.06s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_024_FORD_BRONCO_SPORT_MK1
1.03s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_025_FORD_ESCAPE_MK4
1.03s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_031_FORD_MUSTANG_MACH_E_MK1
1.02s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_029_FORD_F_150_MK14
0.99s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_026_FORD_EXPLORER_MK6
0.98s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_074_HYUNDAI_IONIQ_HEV_2022
0.96s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_027_FORD_FOCUS_MK4
0.95s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_151_SKODA_OCTAVIA_MK3

Results (22.66s):
     209 passed

on latest 6.103.1:

batman@workstation-shane:~/openpilot/selfdrive/car/tests$ pytest -n8 test_car_interfaces.py
/home/batman/.pyenv/versions/3.11.4/lib/python3.11/site-packages/pytest_benchmark/logger.py:46: PytestBenchmarkWarning: Benchmarks are automatically disabled because xdist plugin is active.Benchmarks cannot be performed reliably in a parallelized environment.
  warner(PytestBenchmarkWarning(text))
Test session starts (platform: linux, Python 3.11.4, pytest 8.2.1, pytest-sugar 1.0.0)
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
Using --randomly-seed=264861645
rootdir: /home/batman/openpilot
configfile: pyproject.toml
plugins: timeout-2.3.1, xdist-3.6.1, cpp-2.5.0, cov-5.0.0, mock-3.14.0, forked-1.6.0, dash-2.11.1, benchmark-4.0.0, hypothesis-6.103.1, sugar-1.0.0, randomly-3.15.0, subtests-0.12.1, flaky-3.8.1, asyncio-0.23.7, anyio-4.4.0, nbmake-1.5.3, repeat-0.9.3
asyncio: mode=Mode.STRICT
8 workers [209 items]   collecting ... 

 selfdrive/car/tests/test_car_interfaces.py ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 56% █████▋    
                                            ✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓                       100% ██████████
============================================================================ slowest 10 durations =============================================================================
1.70s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_029_FORD_F_150_MK14
1.65s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_027_FORD_FOCUS_MK4
1.62s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_025_FORD_ESCAPE_MK4
1.61s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_026_FORD_EXPLORER_MK6
1.56s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_028_FORD_F_150_LIGHTNING_MK1
1.56s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_108_KIA_NIRO_PHEV
1.51s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_030_FORD_MAVERICK_MK1
1.51s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_151_SKODA_OCTAVIA_MK3
1.50s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_150_SKODA_KODIAQ_MK1
1.49s call     selfdrive/car/tests/test_car_interfaces.py::TestCarInterfaces::test_car_interfaces_149_SKODA_KAROQ_MK1

Results (36.67s):
     209 passed

tybug commented 2 weeks ago

Thanks for the report! It would be great if you could bisect the versions here and narrow it down to a smaller version range (ideally, a single version). We've made quite a few core engine changes starting at around 6.89.1 that could have caused this.
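One way to do that bisection is sketched below. This is a rough sketch rather than an established workflow: `first_bad` and `suite_is_slow` are hypothetical helper names, the 30-second threshold is an assumption picked between the two timings reported above, and the candidate list has to be filled in with the real intermediate releases. It also assumes that once a release is slow, every later release is slow too.

```python
import subprocess
import sys
import time
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


def suite_is_slow(version: str, threshold_s: float = 30.0) -> bool:
    """Install hypothesis==version, run the test file, and compare wall time.

    The threshold is an assumption; pick a value between your own fast and
    slow timings. Run this from the directory containing the test file.
    """
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--quiet", f"hypothesis=={version}"],
        check=True,
    )
    start = time.perf_counter()
    subprocess.run(
        [sys.executable, "-m", "pytest", "-n8", "test_car_interfaces.py"],
        check=True,
    )
    return time.perf_counter() - start > threshold_s


# Example call (not executed here). Extend the list with the intermediate
# releases, oldest first:
#   first_bad(["6.47.0", "6.89.1", "6.103.1"], suite_is_slow)
```

Each step halves the remaining range, so even a list of several dozen releases needs only a handful of install-and-run cycles. Since timings on a shared machine are noisy, it may be worth repeating a run when the measured time lands close to the threshold.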

Looking through #2308, I see this comment about `generate_new_examples` (https://github.com/HypothesisWorks/hypothesis/issues/2308#issuecomment-584111017), which may turn out to be prescient, as we just touched that in #4007.