Django Error when using multiple fixtures and factory.Iterator

jrobichaud commented 7 years ago

We have several models with ForeignKey.

These models have factories using factory.Iterator on the same tables (Ex: Countries.objects.all(), Currencies.objects.all(), etc.) Ex:

CountryID = factory.Iterator(models.Countries.objects.all())
CurrencyID = factory.Iterator(models.Currencies.objects.all())

We have multiple django test cases using different fixtures with different data for these tables (Ex: A fixture with the country "Afghanistan", another fixture with "Canada" and "USA").

With SQLite it gives this kind of error:

<AppName>.models.DoesNotExist: Countries matching query does not exist.

With MySQL it gives this kind of error: django.db.utils.IntegrityError: (1452, 'Cannot add or update a child row: a foreign key constraint fails (`test_<AppName>`.`SomeModel`, CONSTRAINT `SomeModel_CurrencyID_2734797b_fk_Currencies_id` FOREIGN KEY (`CurrencyID`) REFERENCES `Currencies` (`id`))')

Tested with factory-boy 2.7.0 and 2.8.1 with django 1.9.

When many tests are ran the bug can occur. When running the tests individually they succeed.

I assume there is a problem with the iterator when unloading and loading fixtures on these table. The iterator may not be reinitialized between tests cases.

Workarounds:

Use the same data for these tables (either same fixture or same data in different fixtures)
Replace the iterator with a lamda that will randomly select a row in the table
Replace the iterator with a subfactory (but does not work well with some models)

kevin-brown commented 7 years ago

With SQLite it gives this kind of error:

I'm not sure where the Countries model is coming from here. Are you sure that this error is happening because of the iterator?

Edit: It looks like the CountryID field was added, now that makes more sense.

This looks like it's probably related to the queryset/iterator caching requests, thinking something is there, and then getting an error because there is actually nothing in the table.

With MySQL it gives this kind of error:

This sounds like the iterator is caching the results, or the queryset itself is caching results (more likely), and the iterator is picking a currency which doesn't actually exist in the database.

jrobichaud commented 7 years ago

Depending of MySQL or SQLite it will fail on different fields.

The issue does not happen when not using iterators( see "Workarounds" 2 and 3)

jeffwidman commented 7 years ago

Thanks for the bug report. Afraid I'm not very familiar with how the Django ORM resolves foreign keys, so I'll let @rbarrois handle this one.

0ge commented 2 years ago

I think I encountered the same issue. As with the original report, the test cases passed when run one-by-one. Implementing workaround 2 fixed the issues for me.

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/django/test/testcases.py", line 284, in _setup_and_call
    self._post_teardown()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/django/test/testcases.py", line 1006, in _post_teardown
    self._fixture_teardown()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/django/test/testcases.py", line 1248, in _fixture_teardown
    connections[db_name].check_constraints()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/django/db/backends/sqlite3/base.py", line 358, in check_constraints
    bad_value, referenced_table_name, referenced_column_name
django.db.utils.IntegrityError: The row in table 'app_modela' with primary key '2' has an invalid foreign key: app_modela.modelb contains a value '3' that does not have a corresponding value in app_modelb.id.

Django: 3.2.3 Factory-boy: 3.2.0 Backend: SQLite

msimav commented 2 years ago

We had a similar issue. After debugging we see that the error is caused by QuerySet's stale cache when it is used with either factory.Iterator or factory.fuzzy.FuzzyChoice.

Django QuerySets will cache data once they are evaluated
factory.Iterator won't evaluate the QuerySet until it is used for the first time (which is useful to prevent other bugs)
In the setUp method, some test data is created that will be removed once the suite is run
- In the first time, QuerySet will be cached with the data that is created for that test case
- In the following cases, the cached data is used but it no longer exists in the db
- That causes an integrity error
Resetting the QuerySet is possible but factory.Iterator wraps it with another layer which makes it impossible

Minimal Example

# models.py
from django.db import models

class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField("date published")

class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

# factories.py
import factory
from datetime import datetime

class QuestionFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Question

    question_text = factory.Faker("sentence")
    pub_date = factory.Faker(
        "date_between_dates",
        date_start=datetime.date(2021, 1, 1),
        date_end=datetime.date(2021, 12, 31),
    )

class ChoiceFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Choice

    question = factory.Iterator(Question.objects.all())
    choice_text = factory.Faker("sentence")
    votes = factory.fuzzy.FuzzyInteger(0, 100)

# tests.py
from django.test import TestCase

class TestCase1(TestCase):
    def setUp(self):
        QuestionFactory()
        ChoiceFactory()

class TestCase2(TestCase):
    def setUp(self):
        QuestionFactory.create_batch(5)
        ChoiceFactory.create_batch(20)

Running each test case separately (or in parallel) doesn't trigger an error but running them sequentially does.

Our Workaround

We created a new one that checks if the model instance is stale and resets the Queryset if so.

class QuerysetIterator(BaseDeclaration):
    """Fill this value using the values returned by a queryset.

    Attributes:
        queryset (Queryset): the queryset whose value should be used.
        getter (callable or None): a function to parse returned values
    """

    def __init__(self, queryset, cycle=True, getter=None):
        super().__init__()
        self.queryset = queryset
        self.cycle = cycle
        self.getter = getter
        self.iterator = None

    def evaluate(self, instance, step, extra):
        if self.iterator is None:
            self.reset()

        try:
            value = next(iter(self.iterator))
        except StopIteration:
            if self.cycle:
                self.reset()
                value = next(iter(self.iterator))
            else:
                raise StopIteration

        try:
            value.refresh_from_db()
        except ObjectDoesNotExist:
            self.reset()
            value = next(iter(self.iterator))

        if self.getter is None:
            return value
        return self.getter(value)

    def reset(self):
        """Reset the internal iterator."""
        self.iterator = self.queryset.all()

I'd like to create a PR if the solution makes sense.

dzejkobi commented 1 year ago

@msimav

We created a new one that checks if the model instance is stale and resets the Queryset if so.

Your explanation seems to be right and the solution worked, at least in my case. Thanks!

bldaj commented 1 week ago

@msimav Thank you for your full explanation and solution! I've faced with the same problem and your solution works fine for me. You saved my brain cells

FactoryBoy / factory_boy

Django Error when using multiple fixtures and factory.Iterator #336

Minimal Example

Our Workaround