faker-js / faker

Generate massive amounts of fake data in the browser and node.js
https://fakerjs.dev
Other
12.74k stars 914 forks source link

Proposal: Design for standalone module functions #2667

Open ST-DDT opened 8 months ago

ST-DDT commented 8 months ago

Clear and concise description of the problem

Currently the faker methods are organized in modules and these are joined to the whole Faker class. Most methods in the modules reference a function in a different module, by invoking it via the embedded Faker reference. This makes it near impossible for building tools to drop unused parts of code package.

Changing the implementation to use standalone methods, that only call other standalone methods, mostly solves this issue. Old code/syntax (e.g. faker.person.firstName) must remain valid.

Suggested solution

1. Extract the functions from the module classes into their own files (the same applies to internal helper methods)

- modules
  - airline
    - airline.ts
    - airport.ts
    - ...
  - helpers
    - arrayElement.ts
    - arrayElements.ts
    - multiple.ts
    - ...
  - ..

2. Change the signature of the methods, so that it replaces it the reference to the main Faker instance e.g. by passing it as argument.

function firstName(fakerCore: FakerCore, ...remainingArgs) {
  return arrayElement(fakerCore, fakerCore.definitions.person.firstName);
}

fakerCore β‰ˆ> Faker without any modules (only definitions, randomizer, config)

3. Call the standalone functions from the modules.

export function newPersonModule(fakerCore: FakerCore): PersonModule {
  return ({
    firstName: (..args) => firstName(fakerCore, ...args);
    ...
  });
}

4. Optionally, "process" the function to make it easier to use with the same/a fakerCore.

export function fakerize<T extends (fakerCore: FakerCore, ...args: any[]) => any>( fn: T ): FakerizedMethod<T> {
  const fakerized = fn as FakerizedMethod<T>;
  fakerized.withCore = (fakerCore: FakerCore) => (...args: FakerizedParameters<T>) => fn(fakerCore, ...args);
  // Potentially more methods
  return fakerized;
}

// We need to check whether this negatively impacts readability of the type if not inlined
type FakerizedMethod<T extends () => any> = T & Readonly<{
    withCore: ( fakerCore: FakerCore ) => (...args: FakerizedParameters<T>) => ReturnType<T>;
  }>;

type FakerizedParameters<T extends (...args: any[]) => any> =
  Parameters<T> extends [FakerCore, ...infer U] ? U : never;

Usage:

// in firstName.ts
export const firstName = fakerize(firstName);

// by users or maybe pre-built per locale
export const enFirstName = firstName.withCore(fakerCoreEN);

export function newPersonModule(fakerCore: FakerCore): PersonModule {
  return ({
    firstName: firstName.withCore(fakerCore);
    ...
  });
}

// by users
enFirstName() // Henry
Usage nano bound standalone module function ````ts // in locales/en.ts const nanoCoreFirstName: FakerCore = { randomizer, config, locale: { person: { first_name }}}; // Contains only the required locale data for firstName export const enFirstName = firstName.withCore(nanoCoreFirstName); // by users enFirstName() // Henry ```` The nano bound standalone module function contain only the data that are needed for that specific function and nothing else. So they are tiny after treeshaking 1-10KB instead of the full 600KB for en.

Pros:

Cons:

grafik vs grafik

Alternative

I also considered making the fakerCore parameter the last parameter and optional by defaulting to e.g. fakerCoreEN.

However that has a few disadvantages:

// This method can be used by multiple languages
function mockPerson(fakerCore?: FakerCore, static?: Partial<Person> = {}): Person {
  return ({
    id: static.id ?? uuid(fakerCore),
    firstName: static.firstName ?? firstName(),
    lastName: static.lastName ?? lastName(),
  });
}
// Did you spot the missing parameters?
// Would you if I didn't tell you?
// If it was more complicated?
// Or if this method is only a nested function within the `mockLeasingAgreement()` function?
firstName(undefined (SexType), fakerCoreFR)
export function fakerize<T extends (...args: any[], fakerCore: FakerCore) => any>(
    fn: T,
    fakerCoreParameterIndex: number // <-- probably not compile time guarded => hard to use by our users
    ): FakerizedMethod<T> {

  const fakerized = fn as FakerizedMethod<T>;
  fakerized.withCore = (fakerCore: FakerCore) => (...args: FakerizedParameters<T>) => {
    args[fakerCoreParameterIndex] = fakerCore;
    fn(...args);
  }
  return fakerized;
}
Functions with pass-through parameters might result in easier to read code, but that belongs into a different proposal ````ts function multiple(source: (...args: A[], fakerCore) => T, options, fakerCore: FakerCore, ...args: A[]) { // Impossible } multiple(firstName, { count: 5}, fakerCoreEN, 'male'); ````

Additional context

This is an attempt at detailing the proposal made in #1250

It was made with a change like #2664 in mind to retain optimal performance.

This proposal will not remove or deprecate the well known faker.person.firstName syntax.


Adopting this proposal in combination with #2664 has side effects for methods that accept other methods as parameters, as they have to pass on the fakerCore instance. ````ts function multiple(fakerCore: FakerCore, source: (fakerCore: FakerCore) => T, options) { cosnt derived = fakerCore.derive(); return Array.from({length: options.count}, () => source(derived)); } ```` If the user does not use the derived instance, then they use more seeds from their main instance and might even use different fakerCores for the multiple function and the source function. ````ts // BAD multiple(fakerCoreEN, () => firstName(fakerCoreFR), { count: 3 }); // GOOD multiple(fakerCoreEN, (core) => firstName(core), { count: 3 }); // BETTER? multiple(fakerCoreEN, firstName, { count: 3 }); ```` This also affects other functions such as `uniqueArray` and `unique`. Aka that extends the scope of faker to some (helper) functions that call faker functions indirectly.
Note: I cut quite a few related/derived proposals from this issue entirely, because it is long enough as is: - Nano-Localized-Functions for min bundle sizes - Most of the Fakerize Meta-Framework - Code-Transformer - Most of the extension framework - Implementation details - generateComplex(fakerCore, { key: keyof T: (fakerCore) => T[key] }): T - ... I just though about these while writing the ticket, thus it does not imply that I actually plan on adding any of these. (I likely forgot a few as well, because I only wrote them down at the endπŸ˜‰.)

Usage Example

const usersEN = mockUsers(fakerCoreEN);
const usersFR = mockUsers(fakerCoreFR);

function mockUsers(fakerCore: FakerCore) {
  return multiple(fakerCore, mockUser, { count: 3 });
}

function mockUser(fakerCore: FakerCore): User{
  return ({
    id: uuid(fakerCore),
    firstName: firstName(fakerCore),
    lastName: lastName(fakerCore),
  });
}
matthewmayer commented 8 months ago

Sorry I don't totally understand if this is a internal change or would affect consumers.

If I had existing code like

const { faker } = require("@faker-js/faker")
const email = faker.internet.email()

Would that need to change?

ST-DDT commented 8 months ago

Would that need to change?

Old code does not need to change

Could/as proposed:

const { fakerCoreEN, email } = require("@faker-js/faker")
const email = email(fakerCoreEN)
// This is a few KB smaller after bundling than the old code (~620KB? vs ~600KB?)

At full "capacity" (probably not part of the first implementation):

const { email } = require("@faker-js/faker/locale/en")
const email = email()
// This could be tiny (~5KB?)
matthewmayer commented 8 months ago

Provided it doesn't need to change that's fine. Anything which would require consumers to change the syntax of every call to faker they are currently making in existing code would be a big con.

But if this is only for new code, or if you want to take advantage of tree shaking, that's OK.

matthewmayer commented 8 months ago

i think its also important to make sure that tree-shaking actually works after making this change!

This is going to require a LOT of changes to the code; we don't want to spend a long time doing that but then find out that there is some unrelated reason that causes tree-shaking to not be possible with this architecture. Maybe we could start with a proof of concept that this actually allows for tree-shaking to work before committing to changing this, maybe just for a single module or method?

Shinigami92 commented 8 months ago

At first: thanks very much for taking the effort and proposing this draft via a GitHub issue πŸ™

I already know a little bit more behind the scenes via private conversations and faker meetings, but I also want to write this down here to be more clear and referenceable in the future:
I'm 1_000% with @matthewmayer and want to be able to stick to the current usage of faker and so this proposal / implementation is just an addition / opt-in if wanted to benefit from. Also that means that there will no breaking changes introduced by this implementation and all current tests (except of the deprecation removals for sure) will still work afterwards. However, I might be fine if we need something as a workaround then to support tree-shaking like import { email } from '@faker-js/faker/tree-shaked'.


Furthermore, like @matthewmayer I would really love to see a working implementation that results in <<< 600 KiB like you proposed here, before actually rewriting the whole src code base.

At full "capacity" (probably not part of the first implementation):

const { email } = require("@faker-js/faker/locale/en")
const email = email()
// This could be tiny (~5KB?)

And one last wish: I would prefer if we could do all the deprecation removals first, so we have more clean test files and also less to convert.

Tracking PR is mostly here (it got slightly more a tracking issue right now πŸ˜…)

ST-DDT commented 8 months ago
ST-DDT commented 8 months ago

Team Decision

ST-DDT commented 8 months ago

Blocked by #2628

matthewmayer commented 7 months ago

const { fakerCoreEN, email } = require("@faker-js/faker") const email = email(fakerCoreEN) // This is a few KB smaller after bundling than the old code (~620KB? vs ~600KB?)

Regarding the new syntax for this what happens to the module names? (Ie "internet" for email?)

ST-DDT commented 7 months ago

I dont have a final idea for this, but I thought about omitting it in the export names unless required for disambiguation.

Assigning the aliases/export names might be a manual task.

Shinigami92 commented 7 months ago

It would be absolutely cool if it is somehow possible to call these functions without passing a specific faker instance, and it falls back to a default faker instance. But not sure yet if that's possible.


I would be in favor for sampleString() because this reads a bit more naturally πŸ€” But I have just put 2 seconds of brain power into that πŸ˜…

ST-DDT commented 7 months ago

It would be absolutely cool if it is somehow possible to call these functions without passing a specific faker instance, and it falls back to a default faker instance. But not sure yet if that's possible.

I thought about exposing all methods bound to a specific locale (as minimal dataset) e.g. from the locale/en folder.

Note: I cut quite a few related/derived proposals from this issue entirely, because it is long enough as is:

  • Nano-Localized-Functions for min bundle sizes

I consider that off-topic for the discussion, as the topic is already big enough as is.

Shinigami92 commented 7 months ago

It would be absolutely cool if it is somehow possible to call these functions without passing a specific faker instance, and it falls back to a default faker instance.

But not sure yet if that's possible.

I thought about exposing all methods bound to a specific locale (as minimal dataset) e.g. from the locale/en folder.

Oh, I absolutely forgot 🀦, many of the functions do not need a localized faker instance at all and will just work more or less out of the box. So my request/idea was more related to just these non-locale functions. The localized functions can / should depend on a requested faker instance anyway.

Anyway, if something is confusing, I can also just shut up and do vacation if you want πŸ˜‚

ST-DDT commented 7 months ago

many of the functions do not need a localized faker instance

But they still need the randomizer from it. We can discuss where to put them, after we have the capability to actually do so.

ST-DDT commented 6 months ago

So here my plan to implement this feature.

  1. Introduce FakerCore the container that stores all mutable state/data.
    1. PR that moves randomizer, locale data and "config" to its own thing
    2. Issue/PR with considerations regarding behavioral! changes (should the pre-built instances share the config/randomizer with each other), can be done in parallel with 2
  2. Actually change the methods to be standalone/take the FakerCore as param (instead of using it as field).

Is that OK for you?

matthewmayer commented 6 months ago

As this is a big change and will likely need several preparatory pull requests before it gets to the point where you can actually use individual tree-shakable functions, perhaps it would make sense to merge these into a dedicated branch rather than merge directly into next? That way if there are unanticipated problems along the way, there is the freedom to postpone all the changes and release them together in a later release than 9.0.