faker-js / faker

Generate massive amounts of fake data in the browser and node.js
https://fakerjs.dev
Other
11.86k stars 877 forks source link

infra(locales): use mostly async io for generate-locales #2826

Closed ST-DDT closed 3 weeks ago

ST-DDT commented 4 weeks ago

Fixes #2804

Execution Stats limited all
next 12s 34s
async-io 10s 25s
async-io v2 9s 11s

time pnpm run generate:locales

Please note that the main perfomance boost in this PR is caused by the Promise.all cases. There might be better ways to async everything.

We could also safe up to 2s by removing the Running data normalization logs.

netlify[bot] commented 4 weeks ago

Deploy Preview for fakerjs ready!

Name Link
Latest commit c43ff8d80306cfa6258ab183804022ec32cb1b77
Latest deploy log https://app.netlify.com/sites/fakerjs/deploys/661fe71909e09900082536d1
Deploy Preview https://deploy-preview-2826.fakerjs.dev
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

ST-DDT commented 4 weeks ago

Please also post your performance statistics for comparison.

codecov[bot] commented 4 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.96%. Comparing base (462c80e) to head (c43ff8d).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## next #2826 +/- ## ======================================= Coverage 99.95% 99.96% ======================================= Files 2964 2964 Lines 212544 212544 Branches 948 951 +3 ======================================= + Hits 212442 212461 +19 + Misses 102 83 -19 ``` [see 2 files with indirect coverage changes](https://app.codecov.io/gh/faker-js/faker/pull/2826/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=faker-js)
Shinigami92 commented 4 weeks ago
next 6.44s user 0.51s system 149% cpu 4.639 total
this PR 6.41s user 0.64s system 190% cpu 3.696 total

so in total it safes a second, cause it utilizes the CPU better 👍

ST-DDT commented 4 weeks ago

And if you run it for all files?

Shinigami92 commented 4 weeks ago

And if you run it for all files?

next 17.03s user 2.44s system 139% cpu 13.914 total
this PR 12.73s user 1.70s system 181% cpu 7.961 total
matthewmayer commented 4 weeks ago

~can we skip the de-duping given that we already have a test checking for duplicates at test/locale-data.spec.ts~

xDivisionByZerox commented 4 weeks ago

can we skip the de-duping given that we already have a test checking for duplicates at test/locale-data.spec.ts

Not so sure about this right now. Maybe remove this after all modules are active? Currently, it makes creating PRs really easy, as the only thing you have to do is:

  1. Remove the module to normalize
  2. Run locale generation

I get that you could "easily" achieve this with a separate script, but this way, every person could create a PR right now.

matthewmayer commented 4 weeks ago

can we skip the de-duping given that we already have a test checking for duplicates at test/locale-data.spec.ts

Not so sure about this right now. Maybe remove this after all modules are active? Currently, it makes creating PRs really easy, as the only thing you have to do is:

  1. Remove the module to normalize
  2. Run locale generation

I get that you could "easily" achieve this with a separate script, but this way, every person could create a PR right now.

but i mean, we already know there are no existing duplicates, because if there were duplicates, the test would fail?

matthewmayer commented 4 weeks ago

can we skip the de-duping given that we already have a test checking for duplicates at test/locale-data.spec.ts

Not so sure about this right now. Maybe remove this after all modules are active? Currently, it makes creating PRs really easy, as the only thing you have to do is:

  1. Remove the module to normalize
  2. Run locale generation

I get that you could "easily" achieve this with a separate script, but this way, every person could create a PR right now.

but i mean, we already know there are no existing duplicates, because if there were duplicates, the test would fail?

removing the "new Set" for deduping doesn't seem to speed things up significantly.

matthewmayer commented 4 weeks ago

Seems like you can save a significant amount of time (~10 seconds) by returning early if the data is unchanged. This saves pretty-printing and writing out the data if there is no effect:

const localeData = normalizeDataRecursive(fileImport.default);
const original = JSON.stringify(fileImport.default);
const modified = JSON.stringify(localeData);
if (original === modified) {
    return;
}
xDivisionByZerox commented 4 weeks ago

but i mean, we already know there are no existing duplicates, because if there were duplicates, the test would fail?

You are totally right. I was thinking about something different. May bad...

ST-DDT commented 4 weeks ago

Seems like you can save a significant amount of time (~10 seconds) by returning early if the data is unchanged.

Awesome idea. Here the new results for my machine:

Execution Stats limited all
next 12s 34s
async-io 10s 25s
async-io v2 9s 11s

Please confirm the time on your devices as well.

Shinigami92 commented 4 weeks ago

Please confirm the time on your devices as well.

version user user improvement system cpu total total improvement
limited base 6.44s - 0.51s 149% 4.639 -
limited v1 6.41s 0.03s 0.64s 190% 3.696 0.943
limited v2 6.09s 0.35s 0.63s 174% 3.838 0.801
all base 17.03s - 2.44s 139% 13.914 -
all v1 12.73s 4.30s 1.70s 181% 7.961 5.953
all v2 10.94s 6.09s 1.52s 187% 6.656 7.258

tested on Mac Pro M1