FailedHealthCheck with hypothesis.extra.django for django.contrib.auth.models.User model

MrGreenTea commented 6 years ago

I just use create the most simple strategy like this: USER_STRATEGY = models.models(User)

Then I get a FailedHealthCheck: hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. Health check found 50 filtered examples but only 3 good ones. This will make your tests much slower ...

Any idea what I can do to get User objects for my tests?

Zac-HD commented 6 years ago

I would guess you have some complex validation on one or more fields, and the default strategy is giving you data of the right type (eg DateField) that fails validation (eg 'only the first of the month').

You can override the strategy for each field of a model, with the example of customer age here - but I'll admit the docs don't make it obvious! (We'll therefore keep this issue open to improve them.) For example:

USER_STRATEGY = models.models(User, age=st.integers(0, 100))

Does that help?

MrGreenTea commented 6 years ago

Yeah I suspected that at first too, because we have a lot of ForeignKeys to User. But even in a fresh project the same happens. My whole test code is this:

from django.contrib.auth.models import User
from django.test import TestCase
from hypothesis import given
from hypothesis.extra.django import models as model_st

USER_STRATEGY = model_st.models(User)

class TestUser(TestCase):
    @given(USER_STRATEGY)
    def test_user(self, user):
        assert user.username

My django version is Django 2.0.2 with Hypothesis 3.45.0

DRMacIver commented 6 years ago

You're using django.test.TestCase but you need to be using hypothesis.extra.django.TestCase. What's happening is that generated users are colliding with previous generated users because the database state isn't being reset at the start of each generation (though I'm a little surprised that this is how that manifests).

We should probably be better about signalling this problem.

Zac-HD commented 6 years ago

Ooohh, yep, that would do it. Related to #1071 and the (hopefully someday soon) solution in #1072 - in both cases we need to check the type of the class that our test method is defined on.

DRMacIver commented 6 years ago

I'm going to close this as it's not actually a bug in Hypothesis, but I've open #1113 for communicating this problem more clearly.

MrGreenTea commented 6 years ago

That doesn't change it for me.

# Create your tests here.
import hypothesis.extra.django
from django.contrib.auth.models import User
from hypothesis import given
from hypothesis.extra.django import models as model_st

USER_STRATEGY = model_st.models(User)

class TestUser(hypothesis.extra.django.TestCase):
    @given(USER_STRATEGY)
    def test_user(self, user):
        assert user.username

still gives me

hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. Health check found 50 filtered examples but only 5 good ones

DRMacIver commented 6 years ago

Interesting. You're absolutely right. I could have sworn we had tests that this worked with the standard auth User, but apparently we don't! Sorry for jumping to conclusions.

MrGreenTea commented 6 years ago

Yeah, regressions happen. No worries ;) Hope you manage to find a solution and thanks for all the help!

DRMacIver commented 6 years ago

Based on some investigation, I'm pretty sure the problem is that the user model has a very restrictive idea of what characters a username can contain, and we're trying to generate arbitrary unicode text for it.

We should figure out a proper fix, but the following will let you work around it until we do:


import hypothesis.strategies as st
import string

USER_STRATEGY = model_st.models(
    User,
    username=st.text(
        alphabet=string.ascii_letters + string.digits + '@/./+/-/_'))

The-Compiler commented 6 years ago

nitpick: @DRMacIver, I don't think you meant to include the slashes in the character list there :wink:

Zac-HD commented 6 years ago

(I know for a fact there are people using Hypothesis for Django, but if many of them are using its model introspection then I'm frankly astonished that we've never seen this before)

I thought this might be new in Django 2.0, but no - supported since 1.10 😞

nitpick: @DRMacIver, I don't think you meant to include the slashes in the character list there 😉

That's straight outta the Django docs, but you're right that they shouldn't be in the string literal. Instead, why not use the regex that we're failing to validate against? username=st.from_regex(r'^[\w.@+-]+$')

Which, uh, probably is the proper fix: for all string fields that use a RegexValidator, we should use from_regex instead of text. Note that we will need to be careful about inverted matches, flags, and compiled expressions vs string patterns.

DRMacIver commented 6 years ago

nitpick: @DRMacIver, I don't think you meant to include the slashes in the character list there 😉

Uh, yeah, I just copied those out of the validator error message without really thinking about it.

DRMacIver commented 6 years ago

I thought this might be new in Django 2.0, but no - supported since 1.10 😞

I mean I do know that there are people using this stuff! I know there are people using Hypothesis + Django reasonably heavily, but I think the number of people using model introspection must be way lower than I thought if this is the first time we've seen this.

Zac-HD commented 6 years ago

Notes dump:

Django has a bunch of default validators, which have been pretty stable for a long time. We can therefore make the field strategies substantially more powerful by introspecting validators as well as the field type.
For some (eg RegexValidator, EmailValidator), we'll choose a different base strategy
For most others, we can choose better arguments to the strategy based on the validator - eg MinValueValidator, MaxLengthValidator, etc.
For custom (and some builtin) validators, filtering will remain the only way. We should keep it in all cases as a final check for correctness.
We should resolve the strategy type and default kwargs for the given field type, then resolve kwargs from the validators (min_value, max_value, min_length, etc), possibly switch strategy type, and reconcile the arguments to give the actual strategy.
We should do some sanity-checks here too, eg for empty size ranges or multiple RegexValidators. It's technically possible to compute the intersection of n regular expressions by analysis as finite automata, but that doesn't admit backreferences (ie PCRE) and is IMO out of scope for this. Simpler lookahead-based combination of the patterns is equivalent to just filtering from a Hypothesis generation perspective, so that doesn't help.
Documentation should be clearer on how to override the strategy for a particular field, but also suggest careful use of validators once they're supported.

This is a substantial new feature, but it would be really really nice to be able to generate instances of the default user model (!!), and the principle unlocks a massive improvement in specificity of inference and I assume real-world usefulness. If anyone else is using it in the real world 😕

Zac-HD commented 6 years ago

I suggest we put together a minimal fix - just improving the docs and handling RegexValidator for CharField and TextField - and merge that, leaving a new issue (linked from said docs) with more detailed notes on the larger body of work for external or funded contributions.

DRMacIver commented 6 years ago

I suggest we put together a minimal fix - just improving the docs and handling RegexValidator for CharField and TextField - and merge that, leaving a new issue (linked from said docs) with more detailed notes on the larger body of work for external or funded contributions.

👍 with the possible caveat that it would be really nice to support the default User object and I bet we'll find some other problems when we try to do that. But I agree that just supporting the minimal amount to make that work is 100% the right starting point.

HypothesisWorks / hypothesis

FailedHealthCheck with hypothesis.extra.django for django.contrib.auth.models.User model #1112