cometkim / unicode-segmenter

A lightweight and fast, pure JavaScript library for Unicode segmentation
MIT License

perf(grapheme): a micro-optimization makes getting faster #24

Closed cometkim closed 2 months ago

cometkim commented 2 months ago

Skips one redundant comparison per loop and escapes earlier.

This removes a @ts-ignore-ed path too!
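
The diff itself is not reproduced in this thread, so the following is only a hypothetical sketch of the kind of change described, not the code from this PR. It assumes a binary search over a sorted table of `[lo, hi, category]` ranges; the names `searchBefore`, `searchAfter`, and `ranges` are made up for illustration. Reordering the branches means a non-matching iteration never needs a third comparison.

```javascript
// Hypothetical illustration only; NOT the code changed in this PR.
// `ranges` is an assumed table of [lo, hi, category] triples, sorted by lo.

// Before: test for a match first (two comparisons), then compare again
// to pick a direction. A miss on the high side costs three comparisons.
function searchBefore(cp, ranges) {
  let low = 0;
  let high = ranges.length - 1;
  while (low <= high) {
    const mid = (low + high) >>> 1;
    const [lo, hi, cat] = ranges[mid];
    if (lo <= cp && cp <= hi) {
      return cat;
    } else if (hi < cp) {
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return null;
}

// After: rule out each side in turn. Every iteration needs at most two
// comparisons, and a match falls through without an extra check.
function searchAfter(cp, ranges) {
  let low = 0;
  let high = ranges.length - 1;
  while (low <= high) {
    const mid = (low + high) >>> 1;
    const [lo, hi, cat] = ranges[mid];
    if (cp < lo) {
      high = mid - 1;
    } else if (cp > hi) {
      low = mid + 1;
    } else {
      return cat;
    }
  }
  return null;
}
```

Both functions return the same result for every input; only the number of comparisons per iteration differs.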

changeset-bot[bot] commented 2 months ago

🦋 Changeset detected

Latest commit: 319266fc4eac8b87127e1d1f534567b5285b1708

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name              | Type  |
| ----------------- | ----- |
| unicode-segmenter | Minor |


codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (3ea5a2d) to head (319266f).

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main       #24   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           14        15    +1
  Lines         1284      1330   +46
=========================================
+ Hits          1284      1330   +46
```


cometkim commented 2 months ago

https://github.com/cometkim/unicode-segmenter/pull/24/commits/3b0870a96245c9ab962c9ce62d1c1a097e727136 gives a ~7% perf improvement (404 ns -> 377 ns, minimum time per iteration)

before (3 times)

```
benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   424 ns/iter  (404 ns … 988 ns)        424 ns    596 ns     988 ns
Intl.Segmenter    2'363 ns/iter  (1'580 ns … 2'937 ns)  2'615 ns  2'910 ns   2'937 ns
graphemer         2'668 ns/iter  (2'586 ns … 2'919 ns)  2'693 ns  2'879 ns   2'919 ns
grapheme-splitter 4'757 ns/iter  (4'208 ns … 460 µs)    4'459 ns  6'209 ns  54'500 ns

summary
  unicode-segmenter
   5.58x faster than Intl.Segmenter
   6.3x faster than graphemer
   11.23x faster than grapheme-splitter

benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   432 ns/iter  (404 ns … 1'088 ns)      432 ns    704 ns   1'088 ns
Intl.Segmenter    2'398 ns/iter  (1'559 ns … 3'119 ns)  2'616 ns  3'040 ns   3'119 ns
graphemer         2'620 ns/iter  (2'579 ns … 3'038 ns)  2'610 ns  3'037 ns   3'038 ns
grapheme-splitter 4'686 ns/iter  (4'208 ns … 347 µs)    4'375 ns  6'084 ns  54'083 ns

summary
  unicode-segmenter
   5.55x faster than Intl.Segmenter
   6.06x faster than graphemer
   10.84x faster than grapheme-splitter

benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   416 ns/iter  (405 ns … 958 ns)        424 ns    455 ns     958 ns
Intl.Segmenter    2'371 ns/iter  (1'537 ns … 3'022 ns)  2'621 ns  2'941 ns   3'022 ns
graphemer         2'602 ns/iter  (2'585 ns … 2'717 ns)  2'608 ns  2'661 ns   2'717 ns
grapheme-splitter 4'672 ns/iter  (4'167 ns … 397 µs)    4'375 ns  5'958 ns  54'167 ns

summary
  unicode-segmenter
   5.7x faster than Intl.Segmenter
   6.25x faster than graphemer
   11.23x faster than grapheme-splitter
```
after (3 times)

```
benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   388 ns/iter  (377 ns … 921 ns)        395 ns    435 ns     921 ns
Intl.Segmenter    2'363 ns/iter  (1'562 ns … 3'091 ns)  2'638 ns  3'022 ns   3'091 ns
graphemer         2'610 ns/iter  (2'582 ns … 2'846 ns)  2'613 ns  2'743 ns   2'846 ns
grapheme-splitter 4'633 ns/iter  (4'208 ns … 280 µs)    4'334 ns  5'791 ns  52'208 ns

summary
  unicode-segmenter
   6.08x faster than Intl.Segmenter
   6.72x faster than graphemer
   11.93x faster than grapheme-splitter

benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   396 ns/iter  (377 ns … 950 ns)        398 ns    504 ns     950 ns
Intl.Segmenter    2'422 ns/iter  (1'561 ns … 3'246 ns)  2'692 ns  3'216 ns   3'246 ns
graphemer         2'620 ns/iter  (2'586 ns … 2'819 ns)  2'620 ns  2'754 ns   2'819 ns
grapheme-splitter 4'693 ns/iter  (4'208 ns … 566 µs)    4'458 ns  5'750 ns  55'333 ns

summary
  unicode-segmenter
   6.12x faster than Intl.Segmenter
   6.62x faster than graphemer
   11.85x faster than grapheme-splitter

benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   407 ns/iter  (379 ns … 912 ns)        415 ns    538 ns     912 ns
Intl.Segmenter    2'591 ns/iter  (1'612 ns … 3'286 ns)  2'935 ns  3'272 ns   3'286 ns
graphemer         2'714 ns/iter  (2'594 ns … 3'199 ns)  2'755 ns  3'137 ns   3'199 ns
grapheme-splitter 4'720 ns/iter  (4'208 ns … 266 µs)    4'334 ns  6'959 ns  57'000 ns

summary
  unicode-segmenter
   6.36x faster than Intl.Segmenter
   6.66x faster than graphemer
   11.58x faster than grapheme-splitter
```

cometkim commented 2 months ago

https://github.com/cometkim/unicode-segmenter/pull/24/commits/55cbc0cb1a158e13f3706659412474bd56d7c63a also gives a ~4% improvement (404 ns -> 387 ns)

benchmark 3 times

```
benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   407 ns/iter  (388 ns … 1'238 ns)      409 ns    508 ns   1'238 ns
Intl.Segmenter    2'388 ns/iter  (1'567 ns … 3'072 ns)  2'652 ns  2'958 ns   3'072 ns
graphemer         2'619 ns/iter  (2'596 ns … 2'893 ns)  2'625 ns  2'706 ns   2'893 ns
grapheme-splitter 4'673 ns/iter  (4'167 ns … 263 µs)    4'375 ns  6'042 ns  52'166 ns

summary
  unicode-segmenter
   5.87x faster than Intl.Segmenter
   6.44x faster than graphemer
   11.49x faster than grapheme-splitter

benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   407 ns/iter  (387 ns … 1'230 ns)      408 ns    521 ns   1'230 ns
Intl.Segmenter    2'357 ns/iter  (1'505 ns … 2'987 ns)  2'595 ns  2'947 ns   2'987 ns
graphemer         2'619 ns/iter  (2'597 ns … 2'707 ns)  2'626 ns  2'695 ns   2'707 ns
grapheme-splitter 4'717 ns/iter  (4'167 ns … 309 µs)    4'417 ns  6'250 ns  53'959 ns

summary
  unicode-segmenter
   5.79x faster than Intl.Segmenter
   6.44x faster than graphemer
   11.6x faster than grapheme-splitter

benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   414 ns/iter  (389 ns … 1'360 ns)      416 ns    529 ns   1'360 ns
Intl.Segmenter    2'624 ns/iter  (1'615 ns … 3'377 ns)  2'973 ns  3'374 ns   3'377 ns
graphemer         2'714 ns/iter  (2'598 ns … 3'110 ns)  2'747 ns  3'105 ns   3'110 ns
grapheme-splitter 4'740 ns/iter  (4'208 ns … 859 µs)    4'458 ns  5'458 ns  56'500 ns

summary
  unicode-segmenter
   6.34x faster than Intl.Segmenter
   6.56x faster than graphemer
   11.46x faster than grapheme-splitter
```

cometkim commented 2 months ago

In total, it will be ~13% faster (404 ns -> 356 ns) in the next release! That makes it about 7 times faster than graphemer and about 6.5 times faster than Intl.Segmenter.

benchmark 3 times

```
benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   377 ns/iter  (358 ns … 1'176 ns)      379 ns    487 ns   1'176 ns
Intl.Segmenter    2'496 ns/iter  (1'580 ns … 3'496 ns)  2'791 ns  3'421 ns   3'496 ns
graphemer         2'653 ns/iter  (2'583 ns … 2'949 ns)  2'696 ns  2'926 ns   2'949 ns
grapheme-splitter 4'815 ns/iter  (4'208 ns … 330 µs)    4'500 ns  5'750 ns  62'292 ns

summary
  unicode-segmenter
   6.63x faster than Intl.Segmenter
   7.05x faster than graphemer
   12.79x faster than grapheme-splitter

benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   371 ns/iter  (354 ns … 1'097 ns)      376 ns    428 ns   1'097 ns
Intl.Segmenter    2'386 ns/iter  (1'516 ns … 3'185 ns)  2'631 ns  3'097 ns   3'185 ns
graphemer         2'610 ns/iter  (2'585 ns … 2'819 ns)  2'615 ns  2'730 ns   2'819 ns
grapheme-splitter 4'722 ns/iter  (4'208 ns … 247 µs)    4'458 ns  5'625 ns  52'750 ns

summary
  unicode-segmenter
   6.43x faster than Intl.Segmenter
   7.04x faster than graphemer
   12.73x faster than grapheme-splitter

benchmark            time (avg)  (min … max)                 p75       p99       p999
--------------------------------------------------------- -----------------------------
unicode-segmenter   375 ns/iter  (356 ns … 1'106 ns)      377 ns    472 ns   1'106 ns
Intl.Segmenter    2'393 ns/iter  (1'567 ns … 2'849 ns)  2'640 ns  2'842 ns   2'849 ns
graphemer         2'617 ns/iter  (2'586 ns … 2'800 ns)  2'618 ns  2'764 ns   2'800 ns
grapheme-splitter 4'671 ns/iter  (4'208 ns … 268 µs)    4'375 ns  5'208 ns  56'166 ns

summary
  unicode-segmenter
   6.39x faster than Intl.Segmenter
   6.99x faster than graphemer
   12.47x faster than grapheme-splitter
```

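
For anyone re-deriving the percentages from the minimum times quoted above, here is a small sketch. The thread does not say which formula was used, so both common readings are computed; the helper name `relativeGain` is made up for this example.

```javascript
// Hypothetical helper, not part of unicode-segmenter.
// Compares two per-iteration timings (in ns) in two common ways.
function relativeGain(beforeNs, afterNs) {
  return {
    // Fraction of the original time that was removed.
    timeSaved: (beforeNs - afterNs) / beforeNs,
    // Extra iterations completed in the same amount of time.
    speedup: beforeNs / afterNs - 1,
  };
}

// Minimum times quoted in this thread: 404 ns before, 356 ns after.
const total = relativeGain(404, 356);
console.log(`${(total.timeSaved * 100).toFixed(1)}% less time per iteration`); // 11.9%
console.log(`${(total.speedup * 100).toFixed(1)}% more iterations per unit time`); // 13.5%
```

Read as throughput, 404 ns -> 356 ns is about 13.5%; read as time saved per iteration, it is about 11.9%.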
cometkim commented 2 months ago

https://github.com/cometkim/unicode-segmenter/pull/24/commits/7b5a9bc6f5681529737ccd6968ff411117034ebc

here's another 3ns 😁

cometkim commented 2 months ago

https://github.com/cometkim/unicode-segmenter/pull/24/commits/7590438979dddd0ed5a073ead6b9f74d816d27e3 made significant

cometkim commented 2 months ago

updated benchmark, a bit verbose but way more realistic