FactoryBoy / factory_boy

A test fixtures replacement for Python
https://factoryboy.readthedocs.io/
MIT License

Consider removing the faker dependency #632

Open charlax opened 5 years ago

charlax commented 5 years ago

The problem

faker uses 7.6M of disk space as of this writing. For users who don't use its features, that is a pretty heavy cost.

It includes generators for license plates, SSNs, ISBNs, etc.

Proposed solution

Consider removing the faker dependency, allowing users to plug it in if they need it.

This solution would also allow users to control how the faker factory is used and specify its locale, for instance.

federicobond commented 5 years ago

This definitely looks doable, as the code that uses faker is quite contained. Would you be able to take a look at it and submit a patch?

francoisfreitag commented 5 years ago

Issue #271 suggests depending more heavily on faker so that the fuzzy module can eventually be removed from the project. Removing faker goes in the opposite direction.

Maybe faker could provide extra dependencies for less common generators? Roughly, some faker modules would be declared as extra dependencies.

francoisfreitag commented 5 years ago

Looking at the size of faker's modules, localization appears to take a lot of disk space. Perhaps locales could be specified as extra dependencies: general-purpose fields would remain a requirement (I have FuzzyInteger, FuzzyFloat, FuzzyDecimal, FuzzyDate, etc. in mind), while locales would be opt-in. One would then pip install Faker[fr_FR] to get Faker to generate French data.

du --total --human-readable --max-depth=3 --separate-dirs faker
60K faker/utils
12K faker/providers/bank/it_IT
12K faker/providers/bank/pl_PL
12K faker/providers/bank/fr_FR
12K faker/providers/bank/nl_NL
12K faker/providers/bank/de_DE
12K faker/providers/bank/en_GB
12K faker/providers/bank/de_AT
12K faker/providers/bank/no_NO
12K faker/providers/bank
8.0K    faker/providers/credit_card/en_US
20K faker/providers/credit_card
12K faker/providers/automotive/sv_SE
12K faker/providers/automotive/en_US
12K faker/providers/automotive/ar_JO
12K faker/providers/automotive/pl_PL
12K faker/providers/automotive/pt_BR
12K faker/providers/automotive/en_CA
12K faker/providers/automotive/hu_HU
12K faker/providers/automotive/ar_SA
16K faker/providers/automotive/de_DE
16K faker/providers/automotive/ru_RU
12K faker/providers/automotive/en_GB
12K faker/providers/automotive/en_NZ
12K faker/providers/automotive/id_ID
12K faker/providers/automotive/ar_PS
12K faker/providers/automotive
12K faker/providers/internet/uk_UA
12K faker/providers/internet/sv_SE
12K faker/providers/internet/it_IT
12K faker/providers/internet/en_US
12K faker/providers/internet/fr_CH
12K faker/providers/internet/pt_PT
12K faker/providers/internet/zh_CN
12K faker/providers/internet/el_GR
12K faker/providers/internet/pl_PL
12K faker/providers/internet/fr_FR
12K faker/providers/internet/cs_CZ
12K faker/providers/internet/sl_SI
12K faker/providers/internet/pt_BR
12K faker/providers/internet/fi_FI
12K faker/providers/internet/zh_TW
12K faker/providers/internet/ar_AA
12K faker/providers/internet/hu_HU
12K faker/providers/internet/ja_JP
12K faker/providers/internet/bg_BG
12K faker/providers/internet/en_AU
12K faker/providers/internet/fa_IR
12K faker/providers/internet/bs_BA
12K faker/providers/internet/ko_KR
12K faker/providers/internet/de_DE
12K faker/providers/internet/ru_RU
12K faker/providers/internet/sk_SK
12K faker/providers/internet/en_NZ
12K faker/providers/internet/id_ID
12K faker/providers/internet/hr_HR
12K faker/providers/internet/de_AT
12K faker/providers/internet/no_NO
40K faker/providers/internet
20K faker/providers/job/uk_UA
12K faker/providers/job/en_US
88K faker/providers/job/fr_CH
64K faker/providers/job/zh_CN
20K faker/providers/job/pl_PL
56K faker/providers/job/fr_FR
48K faker/providers/job/pt_BR
20K faker/providers/job/fi_FI
36K faker/providers/job/zh_TW
16K faker/providers/job/ar_AA
44K faker/providers/job/hu_HU
12K faker/providers/job/fa_IR
348K    faker/providers/job/bs_BA
40K faker/providers/job/ko_KR
44K faker/providers/job/ru_RU
36K faker/providers/job/hy_AM
12K faker/providers/job/th_TH
28K faker/providers/job/hr_HR
64K faker/providers/job
8.0K    faker/providers/file/en_US
24K faker/providers/file
8.0K    faker/providers/currency/en_US
28K faker/providers/currency
28K faker/providers/address/uk_UA
24K faker/providers/address/sv_SE
28K faker/providers/address/it_IT
32K faker/providers/address/en_US
28K faker/providers/address/fr_CH
16K faker/providers/address/es
40K faker/providers/address/pt_PT
28K faker/providers/address/zh_CN
380K    faker/providers/address/el_GR
24K faker/providers/address/de
40K faker/providers/address/pl_PL
36K faker/providers/address/fr_FR
68K faker/providers/address/cs_CZ
104K    faker/providers/address/sl_SI
56K faker/providers/address/pt_BR
44K faker/providers/address/fi_FI
24K faker/providers/address/zh_TW
24K faker/providers/address/en_CA
32K faker/providers/address/hu_HU
48K faker/providers/address/ja_JP
140K    faker/providers/address/ka_GE
24K faker/providers/address/en
20K faker/providers/address/es_MX
140K    faker/providers/address/nl_BE
64K faker/providers/address/ne_NP
124K    faker/providers/address/nl_NL
24K faker/providers/address/en_AU
28K faker/providers/address/fa_IR
16K faker/providers/address/es_ES
40K faker/providers/address/ko_KR
28K faker/providers/address/de_DE
88K faker/providers/address/ru_RU
260K    faker/providers/address/sk_SK
56K faker/providers/address/hy_AM
24K faker/providers/address/en_GB
52K faker/providers/address/he_IL
24K faker/providers/address/en_NZ
32K faker/providers/address/id_ID
36K faker/providers/address/hr_HR
24K faker/providers/address/de_AT
12K faker/providers/address/no_NO
28K faker/providers/address/hi_IN
16K faker/providers/address
8.0K    faker/providers/user_agent/en_US
20K faker/providers/user_agent
12K faker/providers/phone_number/uk_UA
12K faker/providers/phone_number/sv_SE
12K faker/providers/phone_number/it_IT
12K faker/providers/phone_number/en_US
12K faker/providers/phone_number/fr_CH
12K faker/providers/phone_number/tr_TR
12K faker/providers/phone_number/pt_PT
12K faker/providers/phone_number/zh_CN
12K faker/providers/phone_number/ar_JO
12K faker/providers/phone_number/el_GR
12K faker/providers/phone_number/pl_PL
12K faker/providers/phone_number/fr_FR
12K faker/providers/phone_number/cs_CZ
12K faker/providers/phone_number/sl_SI
12K faker/providers/phone_number/pt_BR
12K faker/providers/phone_number/fi_FI
12K faker/providers/phone_number/zh_TW
12K faker/providers/phone_number/dk_DK
12K faker/providers/phone_number/en_CA
12K faker/providers/phone_number/tw_GH
12K faker/providers/phone_number/hu_HU
12K faker/providers/phone_number/ja_JP
12K faker/providers/phone_number/bg_BG
12K faker/providers/phone_number/es_MX
12K faker/providers/phone_number/nl_BE
12K faker/providers/phone_number/ne_NP
12K faker/providers/phone_number/nl_NL
12K faker/providers/phone_number/en_AU
12K faker/providers/phone_number/fa_IR
12K faker/providers/phone_number/bs_BA
12K faker/providers/phone_number/es_ES
12K faker/providers/phone_number/ko_KR
12K faker/providers/phone_number/de_DE
12K faker/providers/phone_number/ru_RU
12K faker/providers/phone_number/lv_LV
12K faker/providers/phone_number/sk_SK
12K faker/providers/phone_number/hy_AM
12K faker/providers/phone_number/th_TH
24K faker/providers/phone_number/en_GB
12K faker/providers/phone_number/he_IL
12K faker/providers/phone_number/en_NZ
12K faker/providers/phone_number/lt_LT
12K faker/providers/phone_number/id_ID
12K faker/providers/phone_number/hr_HR
12K faker/providers/phone_number/no_NO
12K faker/providers/phone_number/hi_IN
16K faker/providers/phone_number/ar_PS
12K faker/providers/phone_number
8.0K    faker/providers/python/en_US
20K faker/providers/python
12K faker/providers/ssn/uk_UA
12K faker/providers/ssn/sv_SE
12K faker/providers/ssn/it_IT
20K faker/providers/ssn/en_US
12K faker/providers/ssn/fr_CH
12K faker/providers/ssn/pt_PT
96K faker/providers/ssn/zh_CN
12K faker/providers/ssn/el_GR
12K faker/providers/ssn/pl_PL
12K faker/providers/ssn/mt_MT
12K faker/providers/ssn/fr_FR
12K faker/providers/ssn/cs_CZ
12K faker/providers/ssn/sl_SI
12K faker/providers/ssn/pt_BR
12K faker/providers/ssn/fi_FI
12K faker/providers/ssn/zh_TW
12K faker/providers/ssn/dk_DK
12K faker/providers/ssn/es_CA
12K faker/providers/ssn/et_EE
12K faker/providers/ssn/lb_LU
12K faker/providers/ssn/en_CA
16K faker/providers/ssn/hu_HU
12K faker/providers/ssn/bg_BG
12K faker/providers/ssn/en_IE
12K faker/providers/ssn/nl_BE
12K faker/providers/ssn/nl_NL
16K faker/providers/ssn/es_ES
12K faker/providers/ssn/ko_KR
12K faker/providers/ssn/de_DE
12K faker/providers/ssn/ro_RO
12K faker/providers/ssn/ru_RU
12K faker/providers/ssn/lv_LV
12K faker/providers/ssn/sk_SK
12K faker/providers/ssn/el_CY
12K faker/providers/ssn/en_GB
12K faker/providers/ssn/he_IL
12K faker/providers/ssn/lt_LT
12K faker/providers/ssn/hr_HR
12K faker/providers/ssn/de_AT
12K faker/providers/ssn/de_CH
12K faker/providers/ssn/no_NO
12K faker/providers/ssn
12K faker/providers/date_time/en_US
12K faker/providers/date_time/ar_EG
12K faker/providers/date_time/pl_PL
12K faker/providers/date_time/fr_FR
12K faker/providers/date_time/sl_SI
100K    faker/providers/date_time/ar_AA
12K faker/providers/date_time/hu_HU
12K faker/providers/date_time/ko_KR
12K faker/providers/date_time/ru_RU
12K faker/providers/date_time/hy_AM
12K faker/providers/date_time/id_ID
12K faker/providers/date_time/hr_HR
140K    faker/providers/date_time
48K faker/providers/lorem/en_US
24K faker/providers/lorem/zh_CN
28K faker/providers/lorem/el_GR
80K faker/providers/lorem/pl_PL
68K faker/providers/lorem/fr_FR
24K faker/providers/lorem/zh_TW
40K faker/providers/lorem/ar_AA
20K faker/providers/lorem/ja_JP
36K faker/providers/lorem/ru_RU
16K faker/providers/lorem/la
20K faker/providers/lorem/hy_AM
16K faker/providers/lorem/he_IL
20K faker/providers/lorem
12K faker/providers/company/sv_SE
28K faker/providers/company/it_IT
12K faker/providers/company/en_US
12K faker/providers/company/fr_CH
12K faker/providers/company/pt_PT
12K faker/providers/company/zh_CN
16K faker/providers/company/pl_PL
16K faker/providers/company/fr_FR
12K faker/providers/company/cs_CZ
12K faker/providers/company/sl_SI
16K faker/providers/company/pt_BR
12K faker/providers/company/fi_FI
16K faker/providers/company/zh_TW
12K faker/providers/company/hu_HU
12K faker/providers/company/ja_JP
12K faker/providers/company/bg_BG
32K faker/providers/company/es_MX
36K faker/providers/company/nl_NL
104K    faker/providers/company/fa_IR
32K faker/providers/company/ko_KR
12K faker/providers/company/de_DE
12K faker/providers/company/ru_RU
12K faker/providers/company/sk_SK
32K faker/providers/company/hy_AM
12K faker/providers/company/id_ID
12K faker/providers/company/hr_HR
12K faker/providers/company/no_NO
36K faker/providers/company
12K faker/providers/isbn/en_US
28K faker/providers/isbn
12K faker/providers/geo/en_US
12K faker/providers/geo/el_GR
12K faker/providers/geo/de_AT
188K    faker/providers/geo
8.0K    faker/providers/profile/en_US
12K faker/providers/profile
8.0K    faker/providers/barcode/en_US
12K faker/providers/barcode
52K faker/providers/person/uk_UA
56K faker/providers/person/sv_SE
24K faker/providers/person/it_IT
152K    faker/providers/person/en_US
20K faker/providers/person/fr_CH
68K faker/providers/person/tr_TR
20K faker/providers/person/pt_PT
44K faker/providers/person/zh_CN
156K    faker/providers/person/el_GR
164K    faker/providers/person/pl_PL
36K faker/providers/person/fr_FR
24K faker/providers/person/cs_CZ
20K faker/providers/person/sl_SI
24K faker/providers/person/pt_BR
76K faker/providers/person/fi_FI
44K faker/providers/person/zh_TW
28K faker/providers/person/dk_DK
12K faker/providers/person/es_CA
60K faker/providers/person/ar_AA
36K faker/providers/person/et_EE
32K faker/providers/person/tw_GH
44K faker/providers/person/hu_HU
24K faker/providers/person/ja_JP
68K faker/providers/person/ka_GE
100K    faker/providers/person/bg_BG
92K faker/providers/person/en
44K faker/providers/person/es_MX
12K faker/providers/person/ar_SA
92K faker/providers/person/ne_NP
72K faker/providers/person/nl_NL
24K faker/providers/person/fa_IR
56K faker/providers/person/es_ES
20K faker/providers/person/ko_KR
96K faker/providers/person/de_DE
36K faker/providers/person/ro_RO
80K faker/providers/person/ru_RU
24K faker/providers/person/lv_LV
76K faker/providers/person/hy_AM
72K faker/providers/person/th_TH
20K faker/providers/person/en_TH
56K faker/providers/person/en_GB
120K    faker/providers/person/he_IL
96K faker/providers/person/en_NZ
16K faker/providers/person/lt_LT
44K faker/providers/person/id_ID
36K faker/providers/person/hr_HR
16K faker/providers/person/de_AT
84K faker/providers/person/de_CH
24K faker/providers/person/no_NO
20K faker/providers/person/hi_IN
12K faker/providers/person/ar_PS
16K faker/providers/person
36K faker/providers/color/uk_UA
12K faker/providers/color/en_US
24K faker/providers/color/fr_FR
32K faker/providers/color/pt_BR
12K faker/providers/color/hu_HU
16K faker/providers/color/ru_RU
28K faker/providers/color/hy_AM
24K faker/providers/color/hr_HR
24K faker/providers/color/ar_PS
24K faker/providers/color
8.0K    faker/providers/misc/en_US
16K faker/providers/misc
40K faker/providers
88K faker
11M total
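
To check the claim that locale directories account for much of the size, the listing above can be aggregated per locale. This is a rough sketch, not something posted in the thread: the helper names `parse_size` and `size_by_locale` are made up for illustration, and it assumes the `faker/providers/<provider>/<locale>` layout seen in the output.

```python
from collections import defaultdict

# Multipliers for the suffixes printed by `du --human-readable`.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a du-style size such as '12K' or '8.0K' to bytes."""
    if text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(float(text))


def size_by_locale(du_lines):
    """Sum the sizes of faker/providers/<provider>/<locale> entries per locale."""
    totals = defaultdict(int)
    for line in du_lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        size, path = parts
        components = path.strip().split("/")
        # Only locale directories sit four levels deep; skip everything else.
        if len(components) == 4 and components[:2] == ["faker", "providers"]:
            totals[components[3]] += parse_size(size)
    return dict(totals)


# A few lines in the same shape as the listing above.
sample = [
    "12K faker/providers/bank/fr_FR",
    "56K faker/providers/job/fr_FR",
    "8.0K    faker/providers/file/en_US",
    "24K faker/providers/file",
    "11M total",
]
totals = size_by_locale(sample)
print(totals)
```

Feeding the full listing to `size_by_locale` and sorting the result by value would show which locales dominate.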

charlax commented 5 years ago

This is another option, but it's more work and more complicated than just making faker an optional dependency that the user has to set up. I don't plan to use faker at all; I prefer simple fuzzy data generation, so I would still not use it...

francoisfreitag commented 5 years ago

Fuzzy data generation will be removed at some point in the future. That’s part of the reason why Faker became a required dependency, towards the goal of removing the fuzzy module entirely and replacing it with the more powerful Faker. That is stated at the top of the fuzzy module documentation. Making Faker optional is taking a step back from that direction, because it encourages users to rely on the factory.fuzzy module and not install Faker.

I’m not thrilled by this change and am currently -0 on making it. I would like input from @jeffwidman or @rbarrois before proceeding in that direction.

francoisfreitag commented 5 years ago

Accidental closure.

charlax commented 5 years ago

Sorry, what I meant by fuzzy is: writing those fuzzy generators myself.

I understand your point of view. I think there is a lot of value in decoupling factories from "data generators" (fuzzy module, faker package) completely so that the user can choose how they want the data generated. Let's say in the future a better package than faker emerges, then the user should be free to use it. Composing those two unrelated concerns using an interface is just a more powerful design IMO. It would also allow users to customize the faker instance (setting its locale, etc.).

But mainly, 7.6M is just too big a package if some of your users don't use it.
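
The decoupling argued for above, where a factory depends only on an interface for producing values, might look roughly like the following. It is a toy sketch under stated assumptions: `SimpleFactory` and the `Generator` alias are hypothetical names, not factory_boy's actual API.

```python
import random
from typing import Callable, Dict

# Hypothetical interface: a value generator is any zero-argument callable.
Generator = Callable[[], object]


class SimpleFactory:
    """Toy stand-in for a factory; not factory_boy's real implementation."""

    def __init__(self, **generators: Generator):
        self._generators: Dict[str, Generator] = generators

    def build(self, **overrides) -> dict:
        # Call each generator once, then let explicit overrides win.
        values = {name: gen() for name, gen in self._generators.items()}
        values.update(overrides)
        return values


rng = random.Random(0)  # seeded so runs are repeatable

user_factory = SimpleFactory(
    name=lambda: "user-%d" % rng.randint(1, 999),  # hand-rolled generator
    active=lambda: True,
)
user = user_factory.build(active=False)
print(user)
```

Because the factory only needs callables, a hand-written function, a bound method of a Faker instance configured with a given locale, or a generator from some other library could all fill the same slot.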

francoisfreitag commented 5 years ago

Good point. Making Faker an optional dependency allows users to skip installing it when they don’t need it. I imagined that users would instead rely on fuzzy, but they may simply not need any fuzziness at all.

That works for me :+1:.

prescod commented 5 years ago

I agree that Faker is overkill for my project's needs. 5 different files of English names? Plus every other language? I suspect that only a small percentage of users need more than a predictable 20% of the functionality.

Isn't "Multi-language Lorem" almost a contradiction in terms?

rbarrois commented 5 years ago

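I'll try to sum up the current situation:

  • Faker takes (currently) 11MB of disk space
  • For comparison, an empty virtualenv is already 14MB in size (due to the copies of pip and setuptools there).

What is the actual issue here? factory_boy is intended as a development tool. It's obviously better to reduce the size when possible, but are those 11MB a problem on development platforms?

If we want to reduce the space used by factory_boy, I see 2 options:

  • Find a way to reduce faker's disk space (for instance by changing the way its provider data is stored, and maybe compressing it; or by making part of the contents optional);
  • Provide a way not to install faker alongside factory_boy.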

Currently, factory_boy has a hard dependency on faker; making that optional would require every user of factory_boy+faker to change from factory_boy to factory_boy[faker] in their dependencies... which I'd rather avoid. Another option would be to publish two releases (factory_boy & factory_boy_minimal)?

All that is quite a lot of work for the project team and possibly for end users.

charlax commented 5 years ago

I would argue that removing faker is a strictly better design, applying the single responsibility principle. factory_boy already provides a lot of value without using faker at all. This would add a lot of flexibility in choosing how fake values are generated.

But sure, this would definitely be a breaking change. I think this is achievable using setuptools extras, and would only require users to change the package they install without changing any code.
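
For reference, the setuptools extras mechanism mentioned here could be declared roughly as follows. This is an illustrative fragment only: the metadata values are assumptions and are not taken from factory_boy's real setup.py.

```python
# Hypothetical setup.py fragment; the metadata below is illustrative only.
SETUP_KWARGS = {
    "name": "factory_boy",
    # Faker would no longer be listed as a hard requirement.
    "install_requires": [],
    # Declares the extra behind `pip install factory_boy[faker]`.
    "extras_require": {"faker": ["Faker"]},
}

# In a real setup.py these would be passed along as:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

With such a declaration, `pip install factory_boy` would skip Faker, while `pip install factory_boy[faker]` would pull it in.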

On Thu 1 Aug 2019 at 11:57, Raphaël Barrois notifications@github.com wrote:

I'll try to sum up the current situation:

  • Faker takes (currently) 11MB of disk space
  • For comparison, an empty virtualenv is already 14MB in size (due to the copies of pip and setuptools there).

What is the actual issue here? factory_boy is intended as a development tool. It's obviously better to reduce the size when possible, but are those 11MB a problem in development platforms?

If we want to reduce the space used by factory_boy, I see 2 options:

  • Find a way to reduce faker's disk space (for instance by changing the way its provider data is stored, and maybe compressing it; or by making part of the contents optional);
  • Provide a way not to install faker alongside factory_boy.

Currently, factory_boy has a hard dependency on faker; making that optional would require every user of factory_boy+faker to change from factory_boy to factory_boy[faker] in their dependencies... which I'd rather avoid. Another option would be to publish two releases (factory_boy & factory_boy_minimal)?

All that is quite a lot of work for the project team and possibly for end users.


jeffwidman commented 5 years ago

I also would strongly prefer to avoid anything like factory_boy & factory_boy_minimal or using extras etc... that's a headache for everyone unless we really need it.

So for me, I see it as either completely drop faker or stick with it...

The file size is a non-issue IMO... as @rbarrois noted this is a dev library, so having an extra ~11MB is trivial.

I do understand the rationale for single responsibility, and in fact I initially leaned that way myself several years ago. But having used it, I think the developer ergonomics are much better with the dependency included... because otherwise we force those users who want deep integration between Faker and factory_boy to rewire all these things together. I myself used to do this before we included Faker, and it was annoying.

In fact, including faker is arguably a move toward single responsibility, because it makes it easier to drop all the fuzzy stuff that was painful to maintain/extend.

Right now, we are essentially in a "batteries included" + "usage of batteries is optional" world so that those who want to hand-roll their own custom stuff can do that, and those who just want to use some syntactic sugar and not rewire stuff have that option as well. We are not forcing anyone to use these batteries, the only forced thing is the download, which as noted above is trivial for a development-focused library.

So I'm afraid I don't see the point of doing this.

federicobond commented 5 years ago

Note for the sake of discussion that not including Faker and having to rewire all these things together are two different things.

I did a prototype of this change where the fuzzy generators used Faker if it was available or just threw an exception with a message suggesting to install the dependency if not. Unfortunately, I lost it when I accidentally removed my local factory_boy clone.
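
The general shape of that approach (use the dependency when present, otherwise raise with an install hint) can be sketched as below. This is not a reconstruction of the lost prototype: `require_optional` and `fake_name` are hypothetical helpers, and the `factory_boy[faker]` extra named in the message is an assumption.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # The extra name in this hint is hypothetical.
        raise ImportError(
            "%r is required for this feature; install it with "
            "`pip install factory_boy[%s]`" % (module_name, extra)
        ) from exc


def fake_name():
    """Resolve Faker lazily, so importing this module never requires it."""
    faker = require_optional("faker", "faker")
    return faker.Faker().name()
```

Deferring the import to call time means users who never call `fake_name` never hit the error, so not installing Faker costs nothing until a Faker-backed feature is actually used.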