FactoryBoy / factory_boy

A test fixtures replacement for Python
https://factoryboy.readthedocs.io/
MIT License

support for object generation from marshmallow schema #277

Open jo-tham opened 8 years ago

jo-tham commented 8 years ago

Hi,

Is there any interest in adding support for generating objects based on marshmallow schemas? What would the main steps be?

https://github.com/marshmallow-code/marshmallow

Perhaps it is practical to use the ORM backend and simply constrain the fields to those used in the marshmallow schema...

jeffwidman commented 8 years ago

I started using marshmallow recently and I'm very happy with it, so I certainly wouldn't be opposed to adding support.

However, I'm not clear what the use case is here... can you elaborate?

jo-tham commented 8 years ago

Thanks, Jeff

to elaborate:

I have a flask-restful API which uses marshmallow schemas to serialize and preprocess ORM objects: http://marshmallow-jsonapi.readthedocs.org/en/latest/quickstart.html#flask-integration (I'm not worried about the JSON API part; it's easy enough to wrap valid data in the JSON API format).

I am writing tests for the api using requests and a running instance of the application.

I'm currently writing data fixtures by hand for use in POST requests. It would be better to generate these fixtures.

Side benefit: since marshmallow is agnostic about the objects it serializes, it might be useful as a general schema for generating objects, too. People could keep schemas for their fixtures separate from the types of objects they are dealing with (e.g. mongoengine vs. SQLAlchemy).

Does that make sense? What do you think?

I also thought it might fit as a module in marshmallow, or as a library unto itself integrating factory_boy and marshmallow.

rbarrois commented 8 years ago

Hi,

marshmallow looks quite interesting indeed :)

Building adapters for factory_boy is (hopefully) rather easy; regarding marshmallow, how would you integrate them? I find it helpful to start writing down usage examples: this helps to clarify the problem we're solving ;)

So, how would you want to call a factory related to a marshmallow schema? :-)

jo-tham commented 8 years ago

thanks for the input @rbarrois!

I will make some time this evening to create a couple of usage examples. I will also take a look at the factory_boy modules/API and add some comments about the integration.

jo-tham commented 8 years ago

Here's a hypothetical usage

import factory
from marshmallow import Schema, fields

# serializes any object to a dictionary based on class attributes
class UserSchema(Schema):
    username = fields.Str()
    joined_at = fields.DateTime()
    password = fields.Str(load_only=True)

# hypothetical: factory_boy has no factory.marshmallow module or MarshmallowSchemaFactory
class UserSchemaDataFactory(factory.marshmallow.MarshmallowSchemaFactory):
    class Meta:
        model = UserSchema

    username = factory.Faker('user_name')
    joined_at = factory.Faker('date_time')
    password = factory.Faker('word')

user = UserSchemaDataFactory.stub()

print(user)
# {
#   'username': 'morpheus2',
#   'joined_at': '2016-03-01 21:16:06.748186',
#   'password': 'auspicious'
# }

type(user)
# <class 'dict'>

But it looks like the desired outcome is already possible by building the factory into a dict:

factory.build(dict, FACTORY_CLASS=UserSchemaDataFactory)

What I really hoped for was to introspect the Meta model, bypassing the need to declare attributes on the factory class, i.e.

class UserSchemaDataFactory(factory.marshmallow.MarshmallowSchemaFactory):  # hypothetical class
    class Meta:
        model = UserSchema

user = UserSchemaDataFactory.stub()

I thought this existed for the ORM handlers in factory_boy, but that's not the case.

It does seem to exist in mixer, which was the other package I was considering for generating fixtures.

jeffwidman commented 8 years ago

Sorry, we just had a baby so I've been short on time/sleep.

I hit a similar issue and ultimately ended up just using the dict recipe. I actually found that more effective, because I needed to manually specify which fields to include in my test fixtures to make sure I'd properly set Marshmallow's load_only/dump_only attributes on different fields.

Ultimately, it sounds like the feature request here is ORM class introspection to generate field types automatically; nothing Marshmallow-specific.

Not sure how @rbarrois feels about this, but I'm hesitant to add this because I'm not convinced it would add enough value to be worth the additional maintenance.

When I look at my projects, for the same SQLAlchemy field types I've used multiple factory_boy generators depending on foreign keys, custom field constraints, etc. I want my fake data to mirror the real data as much as possible, so I often tweak the faker generators.

There are other problems when trying to magically guess the correct fake data. For example, faker has a limited set of words in its lorem ipsum dataset, and I ran into problems using it for fake data that needed to be unique; I had to tack on a random character.

So I'm just not convinced there's enough of a 1:1 mapping between field types and factory_boy/faker to be worth writing/maintaining the introspection code. This holds true at least for SQLAlchemy, perhaps it's different for Django.

Again, I'm running on relatively little sleep, so if I'm overlooking anything obvious here, feel free to point it out.

PS:

I am writing tests for the api using requests and a running instance of the application.

You can skip requests and just use the built-in Flask test client, which is a thin wrapper around Werkzeug's test client. It's how I test my API, and it works perfectly. It also makes it easy to push a test request context, etc.

rbarrois commented 8 years ago

Well, model introspection is a work in progress dating back about a year; see here: https://github.com/rbarrois/factory_boy/commit/4046c55710d5d7073018dcc76aa3e8e5a7f803eb

It still needs some work (more robust code, more tests, lots more docs).

If someone is interested in helping there, let's go!

jo-tham commented 8 years ago

Sorry for the delay, gentlemen; I started a new job this week and was wrapping up client projects before that.

I still need to review mixer and see if it's a better fit for marshmallow schema. If mixer doesn't look more convenient/robust, I'd love to pick up on 4046c55 (and if marshmallow doesn't fit in factory boy I can make a separate lib for generating from marshmallow). It'll be a few weeks until I settle in to the new job and have time to evaluate these things. Glad to assist in review if anyone gets to it first.

rbarrois commented 8 years ago

@jo-tham Awesome! Let us know if you need any help on this, or if you find shortcomings in factory_boy that cause you to discard it ;)

levic commented 7 years ago

@rbarrois I have been working on getting your branch 4046c55 working with the current master, however have a design question:

For me, the main appeal of an automatic generator is being able to create a record with the minimal amount of data necessary for it to save. Fields that have defaults or can be blank/null generally don't need a factory_boy definition.

For example, if you have a CharField(max_length=20, blank=True, null=False), Django will already initialise this field to '' by default. Similarly, IntegerField(default=0, null=False) doesn't need a definition.
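The rule can be sketched model-agnostically. `FieldInfo` is a hypothetical stand-in for the attributes a Django introspector would read (`null`, `blank`, whether a default exists, `empty_strings_allowed`); this illustrates the rule only and is not the code on the branch:

```python
from dataclasses import dataclass


@dataclass
class FieldInfo:
    """Hypothetical stand-in for the model-field attributes an introspector reads."""

    name: str
    null: bool = False
    blank: bool = False
    has_default: bool = False
    is_auto: bool = False  # e.g. an AutoField primary key
    # Django sets this to False on non-string fields such as IntegerField.
    empty_strings_allowed: bool = False


def needs_declaration(field: FieldInfo) -> bool:
    """Return True if the introspector should generate a value for this field."""
    if field.is_auto or field.has_default:
        return False  # the database or the field's default fills it in
    if field.null:
        return False  # NULL is acceptable
    if field.blank and field.empty_strings_allowed:
        return False  # a blank string field falls back to ''
    return True


model_fields = [
    FieldInfo("id", is_auto=True),
    FieldInfo("title", empty_strings_allowed=True),
    FieldInfo("subtitle", blank=True, empty_strings_allowed=True),
    FieldInfo("views", has_default=True),
    FieldInfo("rating", blank=True),  # blank but not a string field
    FieldInfo("archived_at", null=True),
]
to_generate = [f.name for f in model_fields if needs_declaration(f)]
```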

Would you object to me changing the behaviour (for Django; it would actually be up to each introspector to decide what is necessary) so that data is only generated where it's actually needed to save the record?

levic commented 7 years ago

To clarify: this is when determining what fields to auto generate; if you specifically include a field in the list of fields to generate then it would still use fuzzy/faker to generate a random value

rbarrois commented 7 years ago

@levic wow, awesome!!

Your suggestion looks good to me; and we might still add an option to say fields = ['*'] to force generating a relevant faker/fuzzy for each field.

levic commented 7 years ago

I have essentially completed it (https://github.com/levic/factory_boy/commits/wip/auto_factory). I'm not planning to open a pull request until I've used it for a week in a production project, in case the test cases missed something (unless you want to replace the wip/auto_factory branch in the main repo).

**General Changes**

**Factory Interface Changes**
- `default_auto_fields` - if True then the default set of fields will be included
- `include_auto_fields` - tells the introspector to additionally autogenerate definitions for these fields. If `default_auto_fields` is False then fields listed in `include_auto_fields` will still be included
- `exclude_auto_fields` - tells the introspector to never autogenerate definitions for these fields
- The rationale behind this is to make it easier to inherit settings in abstract factories (e.g. a base class in your application with a blacklist of fields that are always included/excluded regardless of the model).
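One reading of how the three options combine, written as a standalone function. `select_auto_fields` is hypothetical, not the branch's code, and assumes `exclude_auto_fields` always takes precedence, as "never autogenerate" suggests:

```python
def select_auto_fields(all_fields, default_fields, *, default_auto_fields=True,
                       include_auto_fields=(), exclude_auto_fields=()):
    """Return the field names the introspector should generate declarations for."""
    selected = set(default_fields) if default_auto_fields else set()
    # include_auto_fields is honoured even when default_auto_fields is False
    selected |= set(include_auto_fields)
    # exclude_auto_fields always wins
    selected -= set(exclude_auto_fields)
    # preserve the model's field order
    return [name for name in all_fields if name in selected]


model_fields = ["id", "title", "body", "created_at", "slug"]
defaults = ["title", "created_at"]

only_defaults = select_auto_fields(model_fields, defaults)
tweaked = select_auto_fields(model_fields, defaults,
                             include_auto_fields=["slug"],
                             exclude_auto_fields=["created_at"])
explicit = select_auto_fields(model_fields, defaults, default_auto_fields=False,
                              include_auto_fields=["body"])
```

Because each option is a plain collection, an abstract base factory could hold an application-wide `exclude_auto_fields` that subclasses extend.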

**Introspector Interface Changes**
- In my particular use case I want to look at field names first, and fall back to field types only if no name matches. This was possible before but a bit clumsy. I've changed the interface so that `build_declaration` is easier to override for this (more of the logic you shouldn't need to override now lives in `build_declarations`; [here](https://github.com/levic/factory_boy/commit/a93ee3e))
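A name-first, type-second lookup can be sketched like this. The lookup tables and this `build_declaration` signature are hypothetical, not the branch's actual method; it returns Faker provider names as strings to keep the sketch dependency-free:

```python
# Hypothetical lookup tables: field name -> provider, field type -> provider.
BY_NAME = {
    "email": "email",
    "first_name": "first_name",
    "last_name": "last_name",
    "phone": "phone_number",
    "city": "city",
}
BY_TYPE = {
    "CharField": "pystr",
    "EmailField": "email",
    "IntegerField": "pyint",
    "DateField": "date_object",
    "BooleanField": "pybool",
}


def build_declaration(field_name, field_type):
    """Pick a generator by field name first, falling back to the field type."""
    if field_name in BY_NAME:
        return BY_NAME[field_name]
    return BY_TYPE.get(field_type)  # None means: no automatic declaration
```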

**Including all Fields**
- I looked at making `'*'` an option to `include_auto_fields`, but there are still fields you don't want to include (e.g. reverse foreign keys, AutoField).
    - Instead I made a variant introspector; see the code [here](https://github.com/levic/factory_boy/commit/9d20fa15018ca727f9290b6b588a1ac9378354e9#diff-7c928ff073a9075cc628be362f42a7f7R1014) for an example. It is not as succinct as just including `'*'` but it substantially simplifies the internal implementation.

**Outstanding Issues**
- I have looked at the logic in model_mommy and added test cases for issues they came across, but haven't tested [generic relations](https://docs.djangoproject.com/en/1.11/ref/contrib/contenttypes/#generic-relations) (there is code for them, but it may or may not actually work)
- No documentation update yet. I'll try to do this in the next week or two, before the pull request.

SirR4T commented 5 years ago

Hey @jo-tham, I just ran into this

I am writing tests for the api using requests and a running instance of the application.

I'm currently writing data fixtures by hand for use in POST requests. It would be better to generate these fixtures.

exact use case, and was wondering if there is any update regarding the ability to generate fake data from a marshmallow schema. Is this currently possible without having to resort to a SQLAlchemy or Django model?

EDIT: the reason I do not want to use an ORM model is because I'm trying to generate data for ElasticSearch documents, and not the DB layer. Would be interested to know if there are other tools / options available at the moment to do this.

etiology commented 5 years ago

Hey All,

I use factory_boy with marshmallow schemas and it works great. In testing I sometimes want the schema's JSON representation and other times the object version of the data, so I've created mixins to easily make both types.

Example Schema:

# coding=utf-8
from marshmallow import Schema
from marshmallow import fields

class DemographicsSchema(Schema):
    """ User Info """
    first_name = fields.String()
    last_name = fields.String()
    dob = fields.Date()

    # Address
    street1 = fields.String()
    street2 = fields.String()
    city = fields.String()
    state = fields.String()
    zip_code = fields.String()

Example Factory


import factory

# DemographicsSchema is the schema above; JSONFactoryMixin and
# ObjFactoryMixin are defined in the mixins module below.

# ------------------------------------------
#   Demographics
# ------------------------------------------
class _DemographicsFactory(factory.Factory):
    class Meta:
        model = DemographicsSchema

    first_name = factory.Faker('first_name')
    last_name = factory.Faker('last_name')
    # fields.Date expects a date, so use a date provider rather than a datetime one
    dob = factory.Faker('date_this_century', before_today=True)

    # Address
    street1 = factory.Faker('street_address')
    street2 = factory.Faker('secondary_address')
    city = factory.Faker('city')
    state = factory.Faker('state_abbr')
    zip_code = factory.Faker('zipcode')

class DemographicsStrFactory(_DemographicsFactory, JSONFactoryMixin):
    """ Creates JSON Serialized model of the factory data """

class DemographicsObjFactory(_DemographicsFactory, ObjFactoryMixin):
    """ Creates Deserialized model of the factory data """

My Mixins

# coding=utf-8
import factory

class JSONFactoryMixin(factory.Factory):
    """ Overrides Factory._create() to produce JSON-serialized models """

    @classmethod
    def _create(cls, model_class, *args, **kwargs):
        """Override the default ``_create`` with our custom call."""
        schema = model_class()
        # marshmallow 3: dump()/dumps() return the data directly and only
        # load() validates, so check the dumped data explicitly
        # (marshmallow 2 returned a result object with .data and .errors)
        assert not schema.validate(schema.dump(kwargs))

        return schema.dumps(kwargs)

class ObjFactoryMixin(factory.Factory):
    """ Overrides Factory._create() to produce dict (non-JSON) models """

    @classmethod
    def _create(cls, model_class, *args, **kwargs):
        """Override the default ``_create`` with our custom call."""
        schema = model_class()
        results = schema.dump(kwargs)
        # validate() returns a dict of errors; empty means the data loads cleanly
        assert not schema.validate(results)

        return results

Example of Use

# imagine we need the serialized version of this model
demographics_json = DemographicsStrFactory()

# if we need the de-serialized version
demographics_dict = DemographicsObjFactory()

simonvanderveldt commented 5 years ago

Has any consensus been reached about the AutoFactory approach? @rbarrois @levic This would be really nice functionality, making it less work and less verbose to write factories for simple schemas.

rbarrois commented 5 years ago

@simonvanderveldt I'm still very much in favour of this idea, and had designed a couple of drafts in the past. The main obstacles are:

  1. Finding the time to implement a working proposal for the core engine
  2. Designing a clean API that allows plugging in custom models/fields/rules for each project
  3. Writing a comprehensive documentation and related tests.

Also, we could make this idea easier to find by adding a dedicated issue in the tracker :wink:

arthurHamon2 commented 3 years ago

👋 I just saw this issue and have picked this work back up.

I've created a draft PR (https://github.com/FactoryBoy/factory_boy/pull/822). @rbarrois, can I open a separate issue to discuss the points you mentioned above, and close this one?