Deprecate Fuzzy Attributes that are already supported in Faker

jeffwidman commented 8 years ago

I'd like to start a conversation around deprecating and eventually removing the Fuzzy Attributes that are already supported by Faker: http://factoryboy.readthedocs.org/en/latest/fuzzy.html

Reasoning: I'd prefer to limit the scope of FactoryBoy to handling turning input data into models, including complicated scenarios like chaining. And then delegate the fuzzing of the data as much as possible to external libraries like Faker.

Since we now wrap Faker, there's no reason for us to provide FuzzyAttributes that are already provided by Faker--it's just duplicated work. For any duplicate FuzzyAttributes that are more powerful than their Faker equivalents, where possible, I'd rather see that code moved over to Faker.

@rbarrois your thoughts?

rbarrois commented 8 years ago

Excellent idea! This made sense as long as faker wasn't supported or optional; but it's now providing a much better fuzzer for many functions.

We'll have to be careful on the upgrade path though — a backwards compatibility policy will soon be needed!

jeffwidman commented 8 years ago

Cool. Assuming this project is semvar, I think the todo's here are:

[x] add deprecation note in docs immediately.
[ ] go through the fuzzy implementations and make sure the faker equivalent is at least as powerful; if not, submit that code as a PR to faker
[ ] actually remove the code as part of a 3.x.x release since it's a backwards incompatible change.

Could certainly handle it differently, but that's the route I'd go down for fast and easy. I'll try to start working on this, as I suspect getting things into a published Faker release might take a little while.

abendebury commented 8 years ago

It seems that faker lacks a provider to pick a random choice out a of a list, like fuzzy.FuzzyChoice does.

MichaelAquilina commented 8 years ago

I have also noticed that faker doesnt have an equivalent for FuzzyChoice. What's the recommended route. Should you use FuzzyChoice in that case or resort to creating your own in order to avoid using any fuzzy attributes?

arugifa commented 8 years ago

Since fuzzy attributes (will) rely on an external dependency, what do you think about making their support optional? By putting something like extras_require = {'Fuzzy': ['fake-factory']} in setup.py.

rbarrois commented 8 years ago

@alexandre-figura factory_boy already has a non-optional dependency on faker ;)

rhunwicks commented 8 years ago

@PlasmaSheep said:

It seems that faker lacks a provider to pick a random choice out a of a list, like fuzzy.FuzzyChoice does

That's true. In the meantime, I have this:

TYPES = [e[0] for e in my_model.TYPE_CHOICES]

class MyFactory(factory.django.DjangoModelFactory):

    type = factory.Faker('random_element', elements=TYPES)

It doesn't work if you put the list comprehension directly into the factory.Faker call

chrislawlor commented 8 years ago

Regarding the missing FuzzyChoice equivalent in Faker, you can avoid having to create new lists of valid choices by adding a new provider to the bundled Faker:

# somefile.py
class ChoiceProvider(BaseProvider):
    def random_choice(self, choices=None):
        """
        Given a list of choices in Django format, return a random value
        """
        choice = self.random_element(choices)
        return choice[0]

You then have to add the new provider:

from somefile import ChoiceProvider
factory.Faker.add_provider(ChoiceProvider)

Once the new provider has been added, use it like this:

class MyFactory(factory.django.DjangoModelFactory):
    choice_field = factory.Faker('random_choice', choices=mymodel.CHOICE_FIELD_CHOICES)

Having to include the add_provider boilerplate in all your factory files is a bit sub-optimal. Personally, I feel it is a simple enough class that it could be added directly to factory boy instead of submitted as a PR to faker. I'm just not sure that it provides any value to faker outside the context of a django project using factory boy, so they'd be hesitant to accept it. I'd be happy to submit a PR here if anyone would like.

abendebury commented 8 years ago

I'm just not sure that it provides any value to faker outside the context of a django project using factory boy,

It's applicable for a lot of fields - literally any field that requires a choice out of some concrete set of choices, so I don't see why it's limited to django.

rbarrois commented 8 years ago

We could hide this with a minimal faker wrapper, e.g factory.FakerChoice(values)? Behind the scenes, this could register an arbitrary provider with faker and use it.

The other option is to improve Faker to make it super-easy to declare a provider and use it on the fly.

jeffwidman commented 8 years ago

Wherever possible, I'd rather we improve Faker... Historically, I've found @fcurella to be very receptive to pull requests, etc.

Unless we absolutely have no alternative, I'm not excited about creating wrappers modifying behavior from an external library. Every time we do that, we introduce global complexity for anyone trying to use Faker with FactoryBoy because they now have another thing they need to understand at least well enough to evaluate whether or not it applies to their use case.

fcurella commented 8 years ago

Count me in for any improvement on Faker :)

Keep in mind that originally Faker was a port of a PHP library.

As a result, some provider methods may look a little weird, non-pythonic, and possibly slower for no reason. I've been meaning to go through them and fix them, just haven't had a good chance to do it, yet.

kwasielew commented 5 years ago

Hey all, do you have any rough idea when this deprecation will happen? Thanks

rbarrois commented 5 years ago

@kwasielew No strict roadmap has been planned yet; one of the points to keep in mind is that Faker provides little tools to restrict the possible values of fields. This means that it would be complicated to replace, for instance, the FuzzyInteger or FuzzyDateTime fields without losing some ability to configure the range of values they should use.

fcurella commented 5 years ago

@kwasielew @rbarrois Feel free to submit issues and PRs that could help you on the Faker repo.

If you do, please include a link to this issue, so I can prioritize accordingly.

x-yuri commented 5 years ago

Regarding FuzzyChoice discussed above, there's random_element and friends. Not sure how new is the addition.

x-yuri commented 5 years ago

This is loosely related to the topic, but you might want to consider my use case here. Or I can create a separate issue, just tell me.

I've got models with unique fields. But I want them to receive random values. With factory.Iterator, values are not random. With factory.fuzzy.FuzzyChoice (as well as factory.Faker('random_element', elements=...)), values are random but not unique. Now, what do I mean by "unique"? I want them to not repeat themselves over the course of one test. From my experience with FuzzyChoice and factory.Faker('random_element', ...), they often do. Am I missing something here?

So, I came up with the following solution:

import unittest

import factory.random

class RandomUniqueElement(factory.declarations.BaseDeclaration):
    instances = []
    def __init__(self, elements, **kwargs):
        super().__init__(**kwargs)
        self.orig_elements = list(elements)  # we might receive dict_keys
                                             # or something
        RandomUniqueElement.instances.append(self)

    def evaluate(self, instance, step, extra):
        return self.elements.pop()

    def reset(self):
        print('-- shuffle')
        self.elements = self.orig_elements.copy()
        factory.random.randgen.shuffle(self.elements)

class MyModel:
    def __init__(self, *args, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

class MyModelFactory(factory.Factory):
    class Meta:
        model = MyModel
    a1 = RandomUniqueElement(['1', '2', '3'])

class TC1Test(unittest.TestCase):
    def setUp(self):
        print('-- setUp')
        for i in RandomUniqueElement.instances:
            i.reset()
    def test_t1(self):
        print('-- ' + MyModelFactory().a1)
    def test_t2(self):
        print('-- ' + MyModelFactory().a1)
        print('-- ' + MyModelFactory().a1)
        print('-- ' + MyModelFactory().a1)

unittest.main()

Can Factory Boy be improved to support such a use case?

melvyn-apryl commented 3 years ago

So I don't see a way to use Faker in lazy attribute, where it's values would depend on other model fields.

When you create a new faker instance inside the lazy attribute, as you would with FuzzyDate, it does not return different values for each instance.

The concrete model:

class Employment:
    user = ...
    employment_start = models.DateField(
        db_index=True, verbose_name=_("Start of employment")
    )
    probation_end = models.DateField(
        db_index=True, verbose_name=_("End of probation period")
    )
    employment_end = models.DateField(
        null=True,
        db_index=True,
        verbose_name=_("End of employment"),
        blank=True,
    )

So the "EmployedAndOutOfProbation" factory for this, would need to implement the following rules:

employment_start must be be in between 6 months and one year of today (easily done by just making it a Faker("date_between_dates"))
probation_end must be 30 days to 6 months after employment_start (requires lazy_attribute for dependency on self.employment_start)
employment_end must be None or a date between today + one day and today + one year (lazy attribute for choice between candidate date and None)

I don't see a way to do this with Faker, whereas with FuzzyDate it's this:

    @factory.lazy_attribute
    def probation_end(self):
        start = self.employment_start
        return fuzzy.FuzzyDate(
            start_date=start + relativedelta(days=30),
            end_date=start + relativedelta(months=6)
        ).fuzz()

Did anyone manage to do this using faker?

melvyn-apryl commented 3 years ago

~~Never mind, just found this in the docs when working on a different project. I was too focused on using lazy_attribute :(~~

Nope, you can't do date arithmetic with a self attribute. I don't see a way to wrap this either. I'll keep looking...

shabble commented 3 years ago

Never mind, just found this in the docs when working on a different project. I was too focused on using lazy_attribute :(

Nope, you can't do date arithmetic with a self attribute. I don't see a way to wrap this either. I'll keep looking...

I have something working along the lines of:

    log_dt = factory.Faker(
        "date_time_between",
        start_date=factory.SelfAttribute("..cap.created_dt"),
        end_date=factory.LazyAttribute(
            lambda o: o.factory_parent.cap.created_dt + timezone.timedelta(minutes=3)
        ),
        tzinfo=timezone.utc,
    )

If I remember rightly, the confusing part is having to go via factory_parent and .. to reach the related entry.

astockwell commented 2 years ago

An equivalent for fuzzy.FuzzyChoice() to pick a single random choice out of a list (this works for me at least): specify the length ~

my_property=factory.Faker('random_choices', elements=['a', 'b', 'c', 'd'], length=1)

stefan6419846 commented 2 years ago

As I just stumbled upon the systematic issue regarding FuzzyDateTime and the force_XXX options: What is the current state of the general deprecations? I have to admit that its rather easy to miss the note on top of the documentation page for factory.fuzzy if explicitly looking for one of the specific fuzzy generators, while deprecations have been considered for over six years now.

Is there any analysis of which features are actually missing from faker's providers, leaving aside options like the aforementioned force_month which are not reliable and therefore not really worth porting? Such an overview might serve as some sort of migration guide as well.

francoisfreitag commented 2 years ago

This problem is in need of volunteers to address it. The fields can likely be ported one by one, each time following this rough procedure:

[ ] Verify Faker has an equivalent that field
[ ] Issue a DeprecationWarning for that field
[ ] Add a warning to the doc mark the fuzzy field as deprecated
[ ] Add a migration entry to the release note with an example of the equivalent Faker call
[ ] On the next major version, drop the field

Here’s an example that handles FuzzyInteger: https://github.com/FactoryBoy/factory_boy/pull/971.

adrian-nilsson-fcc commented 2 years ago

We may want to reconsider the general recommendation in the documentation of Fuzzy attributes to use Faker equivalents. For example, it's easy to think that pyfloat would work just as well as FuzzyFloat. However, here's a pitfall that I encountered:

from faker import Faker

fake = Faker()

min_value = 20.445
max_value = 82.152
for _ in range(1000):
    f = fake.pyfloat(right_digits=3, min_value=min_value, max_value=max_value)
    assert f >= min_value, f"{f} < {min_value}"
    assert f <= max_value, f"{f} > {max_value}"

you'll likely see an assertion error, for example:

Traceback (most recent call last):
  File "~/min_max_pyfloat.py", line 11, in <module>
    assert f >= min_value, f"{f} < {min_value}"
AssertionError: 20.108 < 20.445

If we replace pyfloat with FuzzyFloat, i.e., f = FuzzyFloat(low=min_value, high=max_value).fuzz(), the above example works as expected. @rbarrois already mentions in an earlier comment that Faker's support for restricting values is limited. This is such a case, and Faker needs to be updated before the corresponding Fuzzy attribute can be deprecated in Factory Boy.

Current Faker behaviour

It's not really documented in Faker, but in pyfloat's implementation you can see that only the integral part (left_digits) respects the min_value and max_value parameters. The decimal part is just a random integer.

It is discussed to support float values for min_value and max_value in issue 1068, but there has been no activity after it went stale.

rbarrois commented 2 years ago

I tend to be -1 on this overall topic: faker is great for generating "human" data — names, phone numbers, emails, etc.

For lower level Python structures, a few issues have popped up, showing that faking "native" values isn't Faker's focus.

The cost of maintaining factory.fuzzy is extremely low, and it is used by lots of library users (from before the Faker era). If it works, and the alternative isn't significantly better, I see no reason to deprecate the feature and impose a migration cost on users.

We could keep a warning at the top of the doc stating that the fuzzy module is in maintenance-only mode, encouraging users to rely on Faker's providers instead.

However, I do not agree with deleting working code simply because an alternative exists.

jeffwidman commented 2 years ago

😀

Unlike @rbarrois I'm all about deleting working code if a good alternative exists because it reduces the inevitable maintenance cost over time.

Re: breaking existing users code... my anecdotal experience across maintaining many libraries is that any projects still relying on this code from more than a few years ago will likely never update. So they'll never pay the migration price anyway. And for the very small handful that do, it's trivial to change the imports... plus odds are very high this is solely used in their test code, not prod code so any breakage is unlikely to have a big impact.

That said, in the 6 (!!) years since I originally opened this issue, it's definitely seemed like there's been a small but steady stream of cases where Faker doesn't really provide an appropriate solution, and unlikely to do so in the future either. So I think that @rbarrois is spot on that completely deprecating factory.fuzzy is now probably the wrong move.

I took a quick audit of factor.fuzzy to see what is duplicated in faker and at first I thought faker.providers.datetime seemed like the only direct correlation... in which case forcing users to add an import for a single function seems rude.

However, I stumbled across https://faker.readthedocs.io/en/master/providers/faker.providers.python.html which looks like a direct match for most of the simple FuzzyX types... FuzzyText, FuzzyInteger, FuzzyDecimal, FuzzyFloat, etc.

The ones that seem to be unique to FactoryBoy are:

FuzzyAttribute
FuzzyChoice
Custom fuzzy fields derived from BaseFuzzyAttribute

I do not feel strongly about this one, but if I had to make a choice, I'd remove the overall deprecation warning, and only deprecate the FuzzyTypes that have direct matches in Faker. I'd probably also add a warning.warn("...", DeprecationWarning) for a few releases, and then eventually remove it.

But again, I do not feel strongly on this, so @rbarrois and others feel more strongly, I happily defer to them.

francoisfreitag commented 2 years ago

I do see value in “There should be one-- and preferably only one --obvious way to do it.”

I read the intention of this issue as “let’s mutualize the efforts with Faker, to improve Faker where FactoryBoy is better, and leave the generation of fuzzy data to Faker”. The goal is to reduce the size (and complexity) of FactoryBoy, by letting a more specialized library handle the fuzzy generation.

That’s where my motivation to remove most of the the fuzzy module stems, and why I still believe it is a good thing.

FuzzyAttribute main (only?) interest seems that it’s using the random seed from FactoryBoy. The developer is responsible for writing the callable that does the fuzzing. Not a huge deal to keep or drop.
FuzzyChoice looks like it could probably be replaced by https://faker.readthedocs.io/en/master/providers/baseprovider.html#faker.providers.BaseProvider.random_element.

The fuzzy module also has its shortcomings, e.g. https://github.com/FactoryBoy/factory_boy/issues/969. Sure, they can be fixed. Would the efforts be better targeted to improve Faker? That’s my belief.

Vinh-CHUC commented 11 months ago

Would people consider removing the deprecation warning in the docs, and leave the status quo (use Fuzzy) as the default? I believe it is quite confusing for anyone that reads that documentation page and who doesn't take the time (understandably!) to read this whole discussion thread:

Faker isn't really meant to be used for low-level fake data generation, you have to do a lot of digging (because Faker's docs are, rightly so, not focused on this use case!) to figure out how to do it while the Fuzzy examples are very clearly documented in the same place.
taking a step back: being stuck in this limbo is worse than either solution in my opinion. Again "worse" as it creates confusion for the users regarding what they should do

More context

The last time (4 years ago :P) that I introduced factory_boy into a new codebase I went down the Faker route and I really regretted it... As people said above Faker doesn't really cater well to the "low-level" use case that is fulfilled by Fuzzy. I had used fuzzy before that and it was much simpler to use for the low-level use case. I'm now back to using Python again at work and am quite disappointed to see that this whole issue is still outstanding :P

A 3rd way?

A 3rd way would be to have Fuzzy act as a very thin layer on top of faker where there is potential for reusability?:

this would solve the issue of Faker might/might not do this
- although in my humble opinion I'm skeptical of the value of reusing here, the two libs target different use case. And the Fuzzy logic is very stable. But at least in doing so we can shield end users from this debate
as a bonus we could still link to faker for the use cases where people do need higher-level fake data (with nice examples of how to bridge the two, ie calling faker from factoryboy)

Again please consider the adverse impact on users here who probably go through a lot of hurdles when it comes to generating fake data since (6 years ago!!) that messaged popped on the docs. I find it quite bad that the official docs are in this very weird "uncommitted" state for more than half a decade!!!

x-yuri commented 11 months ago

@Vinh-CHUC I must admit, I've read way longer threads... Anyways, can you possibly give a couple of examples? Examples where the fuzzy attributes shine, and Faker doesn't.

Vinh-CHUC commented 11 months ago

@x-yuri sure anyone is able to read but it's not very user friendly... I don't think usage of the library should be reserved to people who have the patience to deep dive in 6 year old threads

again my problem is not so much about Fuzzy vs Faker debate in itself but rather this weird limbo situation, and the fact that the quality of the docs (and hence user experience) suffers as a result

stefan6419846 commented 11 months ago

and the fact that the quality of the docs (and hence user experience) suffers as a result

Faker is a FOSS project as well and you are always invited to contribute enhancements, for example by making it easier for users/starters to understand the docs and thus improving UX by submitting corresponding PRs.

Vinh-CHUC commented 11 months ago

@stefan6419846 i might give it a go, my thinking would be to rather contribute to Factory Boy's docs to clearly indicate what are the Faker equivalent rather than the other way around

I'm thinking that FactoryBoy and Faker still "target" different use cases (low-level vs high-level), it also avoids jumping between two documentation websites. E.g. if I don't need to use Faker "high-level" stuff I'd rather not have to jump to another website

@x-yuri for FuzzyAttribute (ie use an arbitrary callable) if I'm not wrong you have to create a Faker custom provider it's a bit cumbersome in my opinion, looks like the method name has to be unique for example. The FactoryBoy version is more ephemeral and doesn't require to register something

francoisfreitag commented 11 months ago

for FuzzyAttribute (ie use an arbitrary callable) if I'm not wrong you have to create a Faker custom provider it's a bit cumbersome in my opinion, looks like the method name has to be unique for example. The FactoryBoy version is more ephemeral and doesn't require to register something

https://github.com/FactoryBoy/factory_boy/issues/271#issuecomment-1294030379 already plans on keeping FuzzyAttribute. FuzzyChoice might also be kept, because it’s lazy and works well with Django QuerySets.

For the remaining fuzzy attributes (FuzzyInteger, FuzzyDecimal, FuzzyFloat, FuzzyDate, FuzzyDateTime and FuzzyNaiveDateTime), I believe porting them to Faker should be quite reasonable. I started with FuzzyInteger a while back, and the PR was approved by 2 other contributors. I was waiting for further input, specifically from @rbarrois who seemed against the whole idea. But if none is provided, might as well proceed with the original plan and actually deprecate, then remove these attributes, to get out of limbo.

IMHO, removing the deprecation isn’t sufficient to clarify the matter, because users will get two very similar (but sometimes confusingly different) ways of generating the same data, with little guidance on which to use.

tu-pm commented 1 month ago

There's an important aspect to this problem that I'm not seeing anyone mentioning here, which is that the factoryboy's Faker module does not give good suggestion on how to use it. It's signature looks like this:

class Faker(declarations.BaseDeclaration):
    def __init__(self, provider, **kwargs):
        ...

which tells nothing about the available providers and what their accepted arguments are.

This is a big problem for casual users (like me) who just want to generate random text, numbers and choices quickly. Having to open Faker docs to find the suitable providers and learn how to use each of them every time I want to write a more advance test is not worth it.

Meanwhile, if I were using the good ole FuzzyChoice or FuzzyText, I'd just type the name and the IDE can tell me which arguments I should pass in right away without any hassle:

class FuzzyChoice(BaseFuzzyAttribute):
    """Handles fuzzy choice of an attribute.

    Args:
        choices (iterable): An iterable yielding options; will only be unrolled
            on the first call.
        getter (callable or None): a function to parse returned values
    """

    def __init__(self, choices, getter=None):
        ...

As a user of this awesome library, I'd 100% vote for keeping the fuzzy classes for simple usecases and use faker for other advanced stuff.

FactoryBoy / factory_boy