HypothesisWorks / hypothesis

Hypothesis is a powerful, flexible, and easy to use library for property-based testing.
https://hypothesis.works

Generating test data without using @given decorator #3790

Closed by karlicoss 10 months ago

karlicoss commented 10 months ago

What I want to achieve: I'm trying to use Hypothesis to generate large amounts of randomized test data. I'm not using it for tests; I just want to use it in a script. I found that I can call the .example() method on a strategy to generate data. I've intentionally simplified my use case, so let's say we want to generate 1000 integers:

from hypothesis.strategies import integers, lists

TOTAL = 1000
minint = 0
maxint = 2 ** 31

# min_size == max_size, so every generated list has exactly TOTAL elements
gen = lists(integers(min_value=minint, max_value=maxint), min_size=TOTAL, max_size=TOTAL)
ints = gen.example()
assert len(ints) == TOTAL  # just to check

This works, however I have two issues

So the questions are:

Apologies if this isn't the best forum to ask. I did read the docs and search through the source code, but couldn't figure this out. Thanks!

Zac-HD commented 10 months ago

@given() is the only way to draw data from strategies - the .example() method just wraps that up for you internally! Supporting meaningfully different interfaces just isn't technically feasible with our limited volunteer time 🙁

For determinism and number of examples, you'll want to use @settings(max_examples=..., derandomize=True).
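As a rough illustration of that suggestion, here is a minimal sketch of collecting generated values in a plain script by calling a @given-decorated function directly. The names collected, collect, and MAX_EXAMPLES are made up for this example, and the database=None and deadline=None settings are extra assumptions added to keep the run self-contained; they are not part of the advice above. Note that max_examples is an upper bound, so the number of values collected may be smaller than requested.

```python
from hypothesis import given, settings
from hypothesis import strategies as st

MAX_EXAMPLES = 50  # hypothetical value, chosen only for this sketch

collected = []  # values drawn by Hypothesis end up here


@settings(
    max_examples=MAX_EXAMPLES,  # upper bound on generated examples
    derandomize=True,           # same examples on every run
    database=None,              # assumption: do not read or write the example database
    deadline=None,              # assumption: no per-example time limit
)
@given(st.integers(min_value=0, max_value=2**31))
def collect(value):
    # The "test" body never fails; it only records what was drawn.
    collected.append(value)


# Calling the decorated function runs the whole generation loop.
collect()
print(len(collected), "values collected")
```

The values gathered this way still follow Hypothesis's testing-oriented distribution, so small numbers and boundary values are likely to be over-represented.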

It's slower than plain random.randint() because we're doing much more under the hood, which is useful in testing. If your data is simple, that's probably a poor tradeoff; if it's complex, the convenient API probably wins out and the performance gap will be smaller.
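For the simple case in the original question, the plain-random route might look like the sketch below. It uses only the standard library; the function name generate_ints and the seed value are made up for this example, and seeding a dedicated random.Random instance is one possible way to get reproducible output, not something prescribed in this thread.

```python
import random

TOTAL = 1000
MIN_INT = 0
MAX_INT = 2 ** 31


def generate_ints(seed, total=TOTAL):
    """Return `total` uniformly distributed integers in [MIN_INT, MAX_INT]."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    return [rng.randint(MIN_INT, MAX_INT) for _ in range(total)]


ints = generate_ints(seed=0)
assert len(ints) == TOTAL  # same check as in the original snippet
```

Unlike Hypothesis output, these values are uniform over the range, so edge cases such as 0 or 2 ** 31 will almost never appear.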

Finally, I'll note that Hypothesis' data is drawn from a really weird distribution, full of edge cases and weird correlations. That's great for finding bugs, but may or may not be what you want here. If not, I've heard good things about the mimesis library for non-testing use cases (but haven't used it myself). I hope that helps!

karlicoss commented 10 months ago

Thanks for such a quick response, this helps!

tybug commented 10 months ago

Just to answer a concrete question: .example() is slow in your case because it generates and caches 100 examples ahead of time, not just one: https://github.com/HypothesisWorks/hypothesis/blob/226268c9acccc68de89308741151116c9c899256/hypothesis-python/src/hypothesis/strategies/_internal/strategies.py#L327-L340