FactoryBoy / factory_boy

A test fixtures replacement for Python
https://factoryboy.readthedocs.io/
MIT License
3.48k stars 392 forks source link

Strategy to create related objects without creating main object #652

Open RevolutionTech opened 4 years ago

RevolutionTech commented 4 years ago

The problem

I introduced Factory Boy to the team I work on and we love using it, but there is one scenario that comes up frequently for us where we find Factory Boy does not have a good solution.

When testing deserialization of an object that has one or more related objects associated with it, we often want to create all of the dependent objects and build (without creating) the object we want to deserialize. How I would typically deal with this situation is to use the dependent factory to create dependent object before using the other factory.

A typical test of a serializer might look something like this:

class AuthorFactory(factory.DjangoModelFactory):
    class Meta:
        model = Author

    first_name = factory.Faker("first_name")
    last_name = factory.Faker("last_name")

class BookFactory(factory.DjangoModelFactory):
    class Meta:
        model = Book

    author = factory.SubFactory(AuthorFactory)
    title = factory.Sequence(lambda n: f"Book {n}")

class TestBookSerializer(TestCase):

    def test_deserialize_book(self):
        author = AuthorFactory()
        book_data = BookFactory.build(author=author)

        data = {
            "author": book_data.author_id,
            "title": book_data.title,
        }

        serializer = BookSerializer(data=data)
        self.assertTrue(serializer.is_valid())
        book = serializer.save()

        self.assertEqual(book.author, author)
        self.assertEqual(book.title, book_data.title)

What I'd like is to be able to write the same test without having to create the author before building the book. It would be ideal if there was a strategy I could use that would build the book, but create the dependent author.

In this simple example, it's easy to say that there's nothing wrong with just having that additional line where the author is created first, but the problem really comes into play when a model has many related objects, and all of these have to be created for every test.

Proposed solution

In addition to the strategies build, create, and stub, it would be great if there was another strategy that would build the given factory but create the related objects from the corresponding subfactories, maybe something like create_related_and_build. Alternatively if there was a way to plug in custom strategies into factory boy, that could possibly work too.

Extra notes

I apologize if there actually already is a nice way to solve this problem, but I wasn't able to find anything from some searches on StackOverflow or through the documentation. If that is the case, please consider this issue a request to add an example to the documentation.

Also feel free to tell me that I'm completely thinking about this the wrong way. If that's the case, I'd be open to suggestions on good patterns for writing tests as I have in my example above.

remarkov commented 4 years ago

@RevolutionTech sorry I was just passing by looking for a completely different thing and somehow noticed this issue which for some reason took my attention :) We've been using factory_boy for quite a lot now in our tests and your pattern looks very familiar to me so I thought I should step in. So, it seems to me that you are doing it absolutely right way. And factory boy actually already does what you want (as far as I can understand it). I've made a quick snippet from your code, here it is:

import factory

class Author:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

class Book:
    def __init__(self, author, title):
        self.author = author
        self.title = title

class AuthorFactory(factory.DjangoModelFactory):
    class Meta:
        model = Author

    first_name = factory.Faker("first_name")
    last_name = factory.Faker("last_name")

class BookFactory(factory.DjangoModelFactory):
    class Meta:
        model = Book

    author = factory.SubFactory(AuthorFactory)
    title = factory.Sequence(lambda n: f"Book {n}")

book_data = BookFactory.build()

And if I then do print(book_data.author.first_name) I will get the author's first name even though as you can see I never called for AuthorFactory. That's what SubFactory does.

However, getting back to your example, the problem arises when you want to self.assertEqual(book.author, author) or assert book.author == author as we would do that in pytest. On one hand you could just assert book.author == book_data.author but if you want to keep under control of what should be the author's data you have to explicitly create that object first so you can check it later. Since in your example you create the author with default values you can easily skip that and compare to book_data.author later. But if you want to create the author with some special values and/or perform some actions on it before you run the serializer than you do that first. E.g. if you want to parametrize author's first name with weird things like empty value, different languages, emojis etc. and see how your serializer eats that. But even in that case, if it is just that, you can do something like

book_data = BookFactory.build(author=AuthorFactory.build(first_name=weird_first_name))

unless you need to do a lot of nasty things with your author :)

In some tests this is really not necessary, it depends more on the scope of testing and your testing strategy. However, this is off the topic for factory boy I guess.

I hope this will shed some light to your question. Happy testing!

RevolutionTech commented 4 years ago

Hi @remarkov, thank you so much for your thoughtful reply!

Unfortunately, I don't think I did an adequate job explaining the problem, so I will try to provide a more illustrative example. Imagine the Book model has many ForeignKeys and thus the corresponding factory has many SubFactories:

class BookFactory(factory.DjangoModelFactory):
    class Meta:
        model = Book

    author = factory.SubFactory(AuthorFactory)
    editor = factory.SubFactory(EditorFactory)
    illustrator = factory.SubFactory(IllustratorFactory)
    publisher = factory.SubFactory(PublisherFactory)
    title = factory.Sequence(lambda n: f"Book {n}")
    genre = factory.SubFactory(GenreFactory)

As you've mentioned, FactoryBoy makes it really easy for us to create a Book instance without having to define all of the underlining objects (except for those attributes we care about):

book = BookFactory()  # this created an author, editor, illustrator, etc.

And so testing the serialization of my book might look something like this:

class TestBookSerializer(TestCase):

    def test_serialize_book(self):
        book = BookFactory()  # single line to generate my fixtures for the test

        serialized_data = BookSerializer(book).data

        expected = {
            "author": book.author_id,
            "editor": book.editor_id,
            "illustrator": book.illustrator_id,
            "publisher": book.publisher_id,
            "title": book.title,
            "genre": book.genre_id,
        }
        self.assertEqual(serialized_data, expected)

So far so good!

Now the problem comes in when I want to write my deserialization test. I can't use the create strategy here, because the book should not already exist in the database before the deserialization occurs. I also cannot use the build or stub strategies here, because the author, editor, and other related objects must be created before the deserialization occurs. Therefore, I must use the create strategy to individually create all of the dependent objects and the build strategy to create the book:

class TestBookSerializer(TestCase):

    def test_deserialize_book(self):
        # n+1 lines to generate my fixtures for the test, where n is the number of dependent objects
        author = AuthorFactory()
        editor = EditorFactory()
        illustrator = IllustratorFactory()
        publisher = PublisherFactory()
        genre = GenreFactory()
        book_data = BookFactory.build(
            author=author,
            editor=editor,
            illustrator=illustrator,
            publisher=publisher,
            genre=genre
        )

        data = {
            "author": book_data.author_id,
            "editor": book_data.editor_id,
            "illustrator": book_data.illustrator_id,
            "publisher": book_data.publisher_id,
            "title": book_data.title,
            "genre": book_data.genre_id,
        }

        serializer = BookSerializer(data=data)
        self.assertTrue(serializer.is_valid())
        book = serializer.save()

        self.assertEqual(book.author, author)
        self.assertEqual(book.editor, editor)
        self.assertEqual(book.illustrator, illustrator)
        self.assertEqual(book.publisher, publisher)
        self.assertEqual(book.title, book_data.title)
        self.assertEqual(book.genre, genre)

This pattern unfortunately becomes a bit unwieldy, but I think could be remedied by an additional strategy in FactoryBoy, which creates all of the related objects, but builds/stubs the main object. Then the test above would look something more like:

class TestBookSerializer(TestCase):

    def test_deserialize_book(self):
        # back to single line fixture generation 8)
        book_data = BookFactory.create_related_and_build()

        data = {
            "author": book_data.author_id,
            "editor": book_data.editor_id,
            "illustrator": book_data.illustrator_id,
            "publisher": book_data.publisher_id,
            "title": book_data.title,
            "genre": book_data.genre_id,
        }

        serializer = BookSerializer(data=data)
        self.assertTrue(serializer.is_valid())
        book = serializer.save()

        self.assertEqual(book.author, book_data.author)
        self.assertEqual(book.editor, book_data.editor)
        self.assertEqual(book.illustrator, book_data.illustrator)
        self.assertEqual(book.publisher, book_data.publisher)
        self.assertEqual(book.title, book_data.title)
        self.assertEqual(book.genre, book_data.genre)
remarkov commented 4 years ago

@RevolutionTech Ah! Sorry for misunderstanding from the first place, indeed I missed the point. Now it is clear what you are trying to achieve and this totally makes sense.

For some reason we did not face this problem in our project, probably because of different architecture. I hope one day the proper solution could be implemented in Factory Boy but as a quick and really dirty workaround I would do something like

book_data = BookFactory()
book_data.delete(keep_parents=True)
data = {
    ...
}

This way you will leave all related objects in DB, delete the Book from DB but leave it's instance in memory for further use in data deserialization. Hope this works!

NB: Not sure you need the keep_parents parameter but I mentioned it just in case you will encounter cascade delete issues. I don't have Django at hands right now to check this myself but I hope you've got the idea and might find it helpful.

fedehuguet commented 4 years ago

I am facing the same issue, and I tried solving it by subclassing SubFactory and overriding the generate method, however, I stumbled upon: ValueError: save() prohibited to prevent data loss due to unsaved related object.

I haven't given it much thought, but basically here's what I tried:

class ForceCreateSubFactory(SubFactory):
    def generate(self, step, params):
        step.builder.strategy = enums.CREATE_STRATEGY
        subfactory = self.get_factory()
        force_sequence = step.sequence if self.FORCE_SEQUENCE else None
        return step.recurse(subfactory, params, force_sequence=force_sequence)

Has anybody been close to solving it? Or has any ideas on how to approach it?

RevolutionTech commented 4 years ago

Sorry I haven't come back to this problem for a while now. Of course it would be great to have a solution implemented as part of Factory Boy, but I actually just tried out @remarkov's workaround and it does sufficiently solve my use case. I will probably start applying that pattern throughout our codebases since I think it's clearer than the n+1 lines we have now.

One thing I'll point out is that I don't think the keep_parents parameter will prevent .delete() from cascading, but rather it will prevent parent instances from being deleted in the case of table inheritance. That was inapplicable to my case, so I omitted the keep_parents parameter. That said, I can't currently think of a scenario where this pattern would cause cascading deletes.

@fedehuguet Did the workaround suggested work for your use case? If not, I'd be curious to understand the scenario.

vvalentinvolkov commented 1 year ago

I get around this problem like this:

class MyFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = MyModel

    sub_factory_field = factory.SubFactory("MySubFactory")

    class Params:
        create_related = factory.Trait(sub_factory_field=factory.LazyFunction(MySubFactory.create))

So, you can pass create_relatedMyFactory.stub(create_related=True) and replace SubFactory with LazyFunction, that will create MySubFactory instance. But note: stub method creates stub of MySubFactory, and only then replaces it with LazyFunction.

benjfield commented 3 weeks ago

The cleanest way I've found is doing this:

class MyFactoryCreateRelated(MyFactory):
    @classmethod
    def _create(cls, model_class, *args, **kwargs):
        return super()._build(model_class, *args, **kwargs

As long as you don't have any custom build/create logic this should work fine I think as the subfactories are created before the instance is saved or not in the _create step.

benjfield commented 3 weeks ago

Or perhaps this would be helpful:

class BuildCreateRelatedMixin():
    @classmethod
    def build_but_create_related(cls, *args, **kwargs):
        prior_create = cls._create
        cls._create = cls._build
        object = cls.create(*args, **kwargs)
        cls._create = prior_create
        return object
tu-pm commented 1 week ago

I think this is the simplest/safest solution:

class BuildCreateRelatedMixin:
    _build_only = False

    @classmethod
    def _create(cls, model_class, **kwargs):
        build_only = kwargs.pop('_build_only', False)
        if build_only:
            return model_class(**kwargs)

        return super()._create(model_class, **kwargs)

    @classmethod
    def build_but_create_related(cls, **kwargs):
        kwargs.setdefault('_build_only', True)
        return cls.create(**kwargs)