Open jeffwidman opened 8 years ago
Excellent idea! This made sense as long as faker wasn't supported or optional; but it's now providing a much better fuzzer for many functions.
We'll have to be careful on the upgrade path though — a backwards compatibility policy will soon be needed!
Cool. Assuming this project is semvar, I think the todo's here are:
Could certainly handle it differently, but that's the route I'd go down for fast and easy. I'll try to start working on this, as I suspect getting things into a published Faker release might take a little while.
It seems that faker lacks a provider to pick a random choice out a of a list, like fuzzy.FuzzyChoice
does.
I have also noticed that faker doesnt have an equivalent for FuzzyChoice
. What's the recommended route. Should you use FuzzyChoice
in that case or resort to creating your own in order to avoid using any fuzzy attributes?
Since fuzzy attributes (will) rely on an external dependency, what do you think about making their support optional? By putting something like extras_require = {'Fuzzy': ['fake-factory']}
in setup.py
.
@alexandre-figura factory_boy already has a non-optional dependency on faker ;)
@PlasmaSheep said:
It seems that faker lacks a provider to pick a random choice out a of a list, like fuzzy.FuzzyChoice does
That's true. In the meantime, I have this:
TYPES = [e[0] for e in my_model.TYPE_CHOICES]
class MyFactory(factory.django.DjangoModelFactory):
type = factory.Faker('random_element', elements=TYPES)
It doesn't work if you put the list comprehension directly into the factory.Faker
call
Regarding the missing FuzzyChoice
equivalent in Faker, you can avoid having to create new lists of valid choices by adding a new provider to the bundled Faker
:
# somefile.py
class ChoiceProvider(BaseProvider):
def random_choice(self, choices=None):
"""
Given a list of choices in Django format, return a random value
"""
choice = self.random_element(choices)
return choice[0]
You then have to add the new provider:
from somefile import ChoiceProvider
factory.Faker.add_provider(ChoiceProvider)
Once the new provider has been added, use it like this:
class MyFactory(factory.django.DjangoModelFactory):
choice_field = factory.Faker('random_choice', choices=mymodel.CHOICE_FIELD_CHOICES)
Having to include the add_provider
boilerplate in all your factory files is a bit sub-optimal.
Personally, I feel it is a simple enough class that it could be added directly to factory boy instead of submitted as a PR to faker. I'm just not sure that it provides any value to faker outside the context of a django project using factory boy, so they'd be hesitant to accept it. I'd be happy to submit a PR here if anyone would like.
I'm just not sure that it provides any value to faker outside the context of a django project using factory boy,
It's applicable for a lot of fields - literally any field that requires a choice out of some concrete set of choices, so I don't see why it's limited to django.
We could hide this with a minimal faker wrapper, e.g factory.FakerChoice(values)
?
Behind the scenes, this could register an arbitrary provider with faker and use it.
The other option is to improve Faker to make it super-easy to declare a provider and use it on the fly.
Wherever possible, I'd rather we improve Faker
... Historically, I've found @fcurella to be very receptive to pull requests, etc.
Unless we absolutely have no alternative, I'm not excited about creating wrappers modifying behavior from an external library. Every time we do that, we introduce global complexity for anyone trying to use Faker
with FactoryBoy
because they now have another thing they need to understand at least well enough to evaluate whether or not it applies to their use case.
Count me in for any improvement on Faker :)
Keep in mind that originally Faker was a port of a PHP library.
As a result, some provider methods may look a little weird, non-pythonic, and possibly slower for no reason. I've been meaning to go through them and fix them, just haven't had a good chance to do it, yet.
Hey all, do you have any rough idea when this deprecation will happen? Thanks
@kwasielew No strict roadmap has been planned yet; one of the points to keep in mind is that Faker provides little tools to restrict the possible values of fields.
This means that it would be complicated to replace, for instance, the FuzzyInteger
or FuzzyDateTime
fields without losing some ability to configure the range of values they should use.
@kwasielew @rbarrois Feel free to submit issues and PRs that could help you on the Faker repo.
If you do, please include a link to this issue, so I can prioritize accordingly.
Regarding FuzzyChoice
discussed above, there's random_element
and friends. Not sure how new is the addition.
This is loosely related to the topic, but you might want to consider my use case here. Or I can create a separate issue, just tell me.
I've got models with unique fields. But I want them to receive random values. With factory.Iterator
, values are not random. With factory.fuzzy.FuzzyChoice
(as well as factory.Faker('random_element', elements=...)
), values are random but not unique. Now, what do I mean by "unique"? I want them to not repeat themselves over the course of one test. From my experience with FuzzyChoice
and factory.Faker('random_element', ...)
, they often do. Am I missing something here?
So, I came up with the following solution:
import unittest
import factory.random
class RandomUniqueElement(factory.declarations.BaseDeclaration):
instances = []
def __init__(self, elements, **kwargs):
super().__init__(**kwargs)
self.orig_elements = list(elements) # we might receive dict_keys
# or something
RandomUniqueElement.instances.append(self)
def evaluate(self, instance, step, extra):
return self.elements.pop()
def reset(self):
print('-- shuffle')
self.elements = self.orig_elements.copy()
factory.random.randgen.shuffle(self.elements)
class MyModel:
def __init__(self, *args, **kwargs):
for k, v in kwargs.items():
setattr(self, k, v)
class MyModelFactory(factory.Factory):
class Meta:
model = MyModel
a1 = RandomUniqueElement(['1', '2', '3'])
class TC1Test(unittest.TestCase):
def setUp(self):
print('-- setUp')
for i in RandomUniqueElement.instances:
i.reset()
def test_t1(self):
print('-- ' + MyModelFactory().a1)
def test_t2(self):
print('-- ' + MyModelFactory().a1)
print('-- ' + MyModelFactory().a1)
print('-- ' + MyModelFactory().a1)
unittest.main()
Can Factory Boy be improved to support such a use case?
So I don't see a way to use Faker in lazy attribute, where it's values would depend on other model fields.
When you create a new faker instance inside the lazy attribute, as you would with FuzzyDate, it does not return different values for each instance.
The concrete model:
class Employment:
user = ...
employment_start = models.DateField(
db_index=True, verbose_name=_("Start of employment")
)
probation_end = models.DateField(
db_index=True, verbose_name=_("End of probation period")
)
employment_end = models.DateField(
null=True,
db_index=True,
verbose_name=_("End of employment"),
blank=True,
)
So the "EmployedAndOutOfProbation" factory for this, would need to implement the following rules:
I don't see a way to do this with Faker, whereas with FuzzyDate it's this:
@factory.lazy_attribute
def probation_end(self):
start = self.employment_start
return fuzzy.FuzzyDate(
start_date=start + relativedelta(days=30),
end_date=start + relativedelta(months=6)
).fuzz()
Did anyone manage to do this using faker?
Never mind, just found this in the docs when working on a different project. I was too focused on using lazy_attribute :(
Nope, you can't do date arithmetic with a self attribute. I don't see a way to wrap this either. I'll keep looking...
Never mind, just found this in the docs when working on a different project. I was too focused on using lazy_attribute :(
Nope, you can't do date arithmetic with a self attribute. I don't see a way to wrap this either. I'll keep looking...
I have something working along the lines of:
log_dt = factory.Faker(
"date_time_between",
start_date=factory.SelfAttribute("..cap.created_dt"),
end_date=factory.LazyAttribute(
lambda o: o.factory_parent.cap.created_dt + timezone.timedelta(minutes=3)
),
tzinfo=timezone.utc,
)
If I remember rightly, the confusing part is having to go via factory_parent
and ..
to reach the related entry.
An equivalent for fuzzy.FuzzyChoice()
to pick a single random choice out of a list (this works for me at least): specify the length ~
my_property=factory.Faker('random_choices', elements=['a', 'b', 'c', 'd'], length=1)
As I just stumbled upon the systematic issue regarding FuzzyDateTime and the force_XXX
options: What is the current state of the general deprecations? I have to admit that its rather easy to miss the note on top of the documentation page for factory.fuzzy
if explicitly looking for one of the specific fuzzy generators, while deprecations have been considered for over six years now.
Is there any analysis of which features are actually missing from faker's providers, leaving aside options like the aforementioned force_month
which are not reliable and therefore not really worth porting? Such an overview might serve as some sort of migration guide as well.
This problem is in need of volunteers to address it. The fields can likely be ported one by one, each time following this rough procedure:
DeprecationWarning
for that fieldHere’s an example that handles FuzzyInteger
: https://github.com/FactoryBoy/factory_boy/pull/971.
We may want to reconsider the general recommendation in the documentation of Fuzzy attributes to use Faker equivalents.
For example, it's easy to think that pyfloat
would work just as well as FuzzyFloat
.
However, here's a pitfall that I encountered:
from faker import Faker
fake = Faker()
min_value = 20.445
max_value = 82.152
for _ in range(1000):
f = fake.pyfloat(right_digits=3, min_value=min_value, max_value=max_value)
assert f >= min_value, f"{f} < {min_value}"
assert f <= max_value, f"{f} > {max_value}"
you'll likely see an assertion error, for example:
Traceback (most recent call last):
File "~/min_max_pyfloat.py", line 11, in <module>
assert f >= min_value, f"{f} < {min_value}"
AssertionError: 20.108 < 20.445
If we replace pyfloat
with FuzzyFloat
, i.e., f = FuzzyFloat(low=min_value, high=max_value).fuzz()
, the above example works as expected.
@rbarrois already mentions in an earlier comment that Faker's support for restricting values is limited.
This is such a case, and Faker needs to be updated before the corresponding Fuzzy attribute can be deprecated in Factory Boy.
It's not really documented in Faker, but in pyfloat
's implementation you can see that only the integral part (left_digits
) respects the min_value
and max_value
parameters.
The decimal part is just a random integer.
It is discussed to support float values for min_value
and max_value
in issue 1068, but there has been no activity after it went stale.
I tend to be -1 on this overall topic: faker is great for generating "human" data — names, phone numbers, emails, etc.
For lower level Python structures, a few issues have popped up, showing that faking "native" values isn't Faker's focus.
The cost of maintaining factory.fuzzy
is extremely low, and it is used by lots of library users (from before the Faker era).
If it works, and the alternative isn't significantly better, I see no reason to deprecate the feature and impose a migration cost on users.
We could keep a warning at the top of the doc stating that the fuzzy
module is in maintenance-only mode, encouraging users to rely on Faker's providers instead.
However, I do not agree with deleting working code simply because an alternative exists.
😀
Unlike @rbarrois I'm all about deleting working code if a good alternative exists because it reduces the inevitable maintenance cost over time.
Re: breaking existing users code... my anecdotal experience across maintaining many libraries is that any projects still relying on this code from more than a few years ago will likely never update. So they'll never pay the migration price anyway. And for the very small handful that do, it's trivial to change the imports... plus odds are very high this is solely used in their test code, not prod code so any breakage is unlikely to have a big impact.
That said, in the 6 (!!) years since I originally opened this issue, it's definitely seemed like there's been a small but steady stream of cases where Faker
doesn't really provide an appropriate solution, and unlikely to do so in the future either. So I think that @rbarrois is spot on that completely deprecating factory.fuzzy
is now probably the wrong move.
I took a quick audit of factor.fuzzy
to see what is duplicated in faker
and at first I thought faker.providers.datetime
seemed like the only direct correlation... in which case forcing users to add an import for a single function seems rude.
However, I stumbled across https://faker.readthedocs.io/en/master/providers/faker.providers.python.html which looks like a direct match for most of the simple FuzzyX
types... FuzzyText
, FuzzyInteger
, FuzzyDecimal
, FuzzyFloat
, etc.
The ones that seem to be unique to FactoryBoy
are:
FuzzyAttribute
FuzzyChoice
BaseFuzzyAttribute
I do not feel strongly about this one, but if I had to make a choice, I'd remove the overall deprecation warning, and only deprecate the FuzzyTypes
that have direct matches in Faker. I'd probably also add a warning.warn("...", DeprecationWarning)
for a few releases, and then eventually remove it.
But again, I do not feel strongly on this, so @rbarrois and others feel more strongly, I happily defer to them.
I do see value in “There should be one-- and preferably only one --obvious way to do it.”
I read the intention of this issue as “let’s mutualize the efforts with Faker, to improve Faker where FactoryBoy is better, and leave the generation of fuzzy data to Faker”. The goal is to reduce the size (and complexity) of FactoryBoy, by letting a more specialized library handle the fuzzy generation.
That’s where my motivation to remove most of the the fuzzy module stems, and why I still believe it is a good thing.
FuzzyAttribute
main (only?) interest seems that it’s using the random seed from FactoryBoy. The developer is responsible for writing the callable that does the fuzzing. Not a huge deal to keep or drop.FuzzyChoice
looks like it could probably be replaced by https://faker.readthedocs.io/en/master/providers/baseprovider.html#faker.providers.BaseProvider.random_element.The fuzzy module also has its shortcomings, e.g. https://github.com/FactoryBoy/factory_boy/issues/969. Sure, they can be fixed. Would the efforts be better targeted to improve Faker? That’s my belief.
Would people consider removing the deprecation warning in the docs, and leave the status quo (use Fuzzy
) as the default? I believe it is quite confusing for anyone that reads that documentation page and who doesn't take the time (understandably!) to read this whole discussion thread:
Faker
isn't really meant to be used for low-level fake data generation, you have to do a lot of digging (because Faker
's docs are, rightly so, not focused on this use case!) to figure out how to do it while the Fuzzy
examples are very clearly documented in the same place.The last time (4 years ago :P) that I introduced factory_boy
into a new codebase I went down the Faker
route and I really regretted it... As people said above Faker
doesn't really cater well to the "low-level" use case that is fulfilled by Fuzzy
. I had used fuzzy
before that and it was much simpler to use for the low-level use case.
I'm now back to using Python again at work and am quite disappointed to see that this whole issue is still outstanding :P
A 3rd way would be to have Fuzzy
act as a very thin layer on top of faker
where there is potential for reusability?:
Faker
might/might not do this
Fuzzy
logic is very stable. But at least in doing so we can shield end users from this debatefaker
for the use cases where people do need higher-level fake data (with nice examples of how to bridge the two, ie calling faker
from factoryboy)Again please consider the adverse impact on users here who probably go through a lot of hurdles when it comes to generating fake data since (6 years ago!!) that messaged popped on the docs. I find it quite bad that the official docs are in this very weird "uncommitted" state for more than half a decade!!!
@Vinh-CHUC I must admit, I've read way longer threads... Anyways, can you possibly give a couple of examples? Examples where the fuzzy attributes shine, and Faker
doesn't.
@x-yuri sure anyone is able to read but it's not very user friendly... I don't think usage of the library should be reserved to people who have the patience to deep dive in 6 year old threads
again my problem is not so much about Fuzzy
vs Faker
debate in itself but rather this weird limbo situation, and the fact that the quality of the docs (and hence user experience) suffers as a result
and the fact that the quality of the docs (and hence user experience) suffers as a result
Faker is a FOSS project as well and you are always invited to contribute enhancements, for example by making it easier for users/starters to understand the docs and thus improving UX by submitting corresponding PRs.
@stefan6419846 i might give it a go, my thinking would be to rather contribute to Factory Boy's docs to clearly indicate what are the Faker equivalent rather than the other way around
I'm thinking that FactoryBoy and Faker still "target" different use cases (low-level vs high-level), it also avoids jumping between two documentation websites. E.g. if I don't need to use Faker "high-level" stuff I'd rather not have to jump to another website
@x-yuri for FuzzyAttribute
(ie use an arbitrary callable) if I'm not wrong you have to create a Faker custom provider it's a bit cumbersome in my opinion, looks like the method name has to be unique for example. The FactoryBoy version is more ephemeral and doesn't require to register something
for FuzzyAttribute (ie use an arbitrary callable) if I'm not wrong you have to create a Faker custom provider it's a bit cumbersome in my opinion, looks like the method name has to be unique for example. The FactoryBoy version is more ephemeral and doesn't require to register something
https://github.com/FactoryBoy/factory_boy/issues/271#issuecomment-1294030379 already plans on keeping FuzzyAttribute
. FuzzyChoice
might also be kept, because it’s lazy and works well with Django QuerySets.
For the remaining fuzzy attributes (FuzzyInteger
, FuzzyDecimal
, FuzzyFloat
, FuzzyDate
, FuzzyDateTime
and FuzzyNaiveDateTime
), I believe porting them to Faker should be quite reasonable. I started with FuzzyInteger a while back, and the PR was approved by 2 other contributors. I was waiting for further input, specifically from @rbarrois who seemed against the whole idea. But if none is provided, might as well proceed with the original plan and actually deprecate, then remove these attributes, to get out of limbo.
IMHO, removing the deprecation isn’t sufficient to clarify the matter, because users will get two very similar (but sometimes confusingly different) ways of generating the same data, with little guidance on which to use.
There's an important aspect to this problem that I'm not seeing anyone mentioning here, which is that the factoryboy's Faker
module does not give good suggestion on how to use it. It's signature looks like this:
class Faker(declarations.BaseDeclaration):
def __init__(self, provider, **kwargs):
...
which tells nothing about the available providers and what their accepted arguments are.
This is a big problem for casual users (like me) who just want to generate random text, numbers and choices quickly. Having to open Faker docs to find the suitable providers and learn how to use each of them every time I want to write a more advance test is not worth it.
Meanwhile, if I were using the good ole FuzzyChoice
or FuzzyText
, I'd just type the name and the IDE can tell me which arguments I should pass in right away without any hassle:
class FuzzyChoice(BaseFuzzyAttribute):
"""Handles fuzzy choice of an attribute.
Args:
choices (iterable): An iterable yielding options; will only be unrolled
on the first call.
getter (callable or None): a function to parse returned values
"""
def __init__(self, choices, getter=None):
...
As a user of this awesome library, I'd 100% vote for keeping the fuzzy classes for simple usecases and use faker for other advanced stuff.
I'd like to start a conversation around deprecating and eventually removing the Fuzzy Attributes that are already supported by Faker: http://factoryboy.readthedocs.org/en/latest/fuzzy.html
Reasoning: I'd prefer to limit the scope of FactoryBoy to handling turning input data into models, including complicated scenarios like chaining. And then delegate the fuzzing of the data as much as possible to external libraries like Faker.
Since we now wrap Faker, there's no reason for us to provide FuzzyAttributes that are already provided by Faker--it's just duplicated work. For any duplicate FuzzyAttributes that are more powerful than their Faker equivalents, where possible, I'd rather see that code moved over to Faker.
@rbarrois your thoughts?