eslint / eslint

Find and fix problems in your JavaScript code.
https://eslint.org
MIT License
24.39k stars, 4.4k forks

perf: new grapheme library #18359

Closed. cometkim closed this 1 month ago

cometkim commented 1 month ago

Replace the graphemer library with the unicode-segmenter library, which is lighter and faster. See benchmark.

Prerequisites checklist

What is the purpose of this pull request? (put an "X" next to an item)

[ ] Documentation update
[ ] Bug fix (template)
[ ] New rule (template)
[ ] Changes an existing rule (template)
[ ] Add autofix to a rule
[ ] Add a CLI option
[X] Add something to the core
[ ] Other, please explain:

What changes did you make? (Give an overview)

The Unicode grapheme splitter library that the ESLint core currently relies on was last updated two years ago and does not include the Unicode 15.1.0 specification.

I created an alternative, the unicode-segmenter library, for a lighter bundle and better performance.

unicode-segmenter is based on the latest (v15.1.0) Unicode data, is smaller than graphemer, and counts graphemes over 6x faster.

Specification compliance is verified with fast-check property-based tests that compare its results against V8's built-in Intl.Segmenter, which is based on ICU.
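
For illustration, here is a minimal sketch of the kind of check described above, assuming a runtime that provides Intl.Segmenter (recent Node.js or a modern browser). It uses a small hand-rolled random generator instead of fast-check so that it has no dependencies, and `candidateCount` is a hypothetical stand-in for the library under test, not unicode-segmenter's actual API.

```javascript
// Reference: count grapheme clusters with the built-in, ICU-backed Intl.Segmenter.
const referenceSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function referenceCount(text) {
    let count = 0;
    for (const _segment of referenceSegmenter.segment(text)) {
        count++;
    }
    return count;
}

// Hypothetical stand-in for the library under test. It delegates to the
// reference here only so that the sketch runs on its own.
function candidateCount(text) {
    return [...referenceSegmenter.segment(text)].length;
}

// Small pool of interesting pieces: ASCII, precomposed and decomposed accents,
// emoji, a ZWJ family sequence, a flag, Hangul, CRLF, and a space.
const POOL = [
    "a",
    "\u00E9",
    "e\u0301",
    "\u{1F44D}",
    "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}",
    "\u{1F1F0}\u{1F1F7}",
    "\uD55C",
    "\r\n",
    " "
];

// Build a random string by concatenating up to maxParts pieces from the pool.
function randomText(maxParts, random = Math.random) {
    const parts = Math.floor(random() * (maxParts + 1));
    let text = "";
    for (let i = 0; i < parts; i++) {
        text += POOL[Math.floor(random() * POOL.length)];
    }
    return text;
}

// Property: for every generated input, the candidate agrees with the reference.
// Returns the first counterexample found, or null if none was found.
function findMismatch(candidate, runs = 1000) {
    for (let i = 0; i < runs; i++) {
        const text = randomText(8);
        if (candidate(text) !== referenceCount(text)) {
            return text;
        }
    }
    return null;
}

console.log(findMismatch(candidateCount)); // null when no counterexample is found
```

A real property-based test would draw from a much wider input space and shrink counterexamples, which is what a library like fast-check provides.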

Is there anything you'd like reviewers to focus on?

netlify[bot] commented 1 month ago

Deploy Preview for docs-eslint canceled.

Latest commit: 524dfc55e7b98a0288cc3d86761467043b07fe3c
Latest deploy log: https://app.netlify.com/sites/docs-eslint/deploys/6620a70bb39388000866b2a5

fasttime commented 1 month ago

Hi @cometkim, thanks for the PR! Could you explain if there are also any functional differences between unicode-segmenter and graphemer? Specifically with regard to innovations in Unicode 15.1, would unicode-segmenter report different results for certain characters?

You may also be interested in joining our discussion about replacing graphemer with Intl.Segmenter in #17835.

nzakas commented 1 month ago

@cometkim thanks. I'm afraid we can't switch to a dependency that has only existed for six days. ESLint is downloaded 120 million times a month and we take which dependencies we use very seriously. While the benchmark performance is impressive, the project itself is so new that for security and stability reasons it can't be included in ESLint at this time.

cometkim commented 1 month ago

OK, that's fair. I wrote it seriously for my own production use, but it is definitely the result of only a few days of work. graphemer (and grapheme-splitter) have simply been around for a long time.

If you are looking for an alternative that is genuinely better in terms of performance, size, and security, please visit the repository and try fuzzing it :)

cometkim commented 1 month ago

> Specifically with regard to innovations in Unicode 15.1, would unicode-segmenter report different results for certain characters?

The difference in Unicode data versions (14.0 vs 15.1) mostly won't change results when dealing with graphemes. The one actual difference is the updated boundary rule GB9c, which I implemented in v0.6.0.
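
To illustrate the kind of input GB9c affects: the rule concerns Indic conjunct sequences, such as a Devanagari consonant followed by a virama and another consonant. The sketch below counts grapheme clusters for one such sequence, using the built-in Intl.Segmenter as a stand-in rather than unicode-segmenter itself. The result depends on the Unicode version behind the runtime's ICU data, so no single expected output is given.

```javascript
// Devanagari KA (U+0915) + VIRAMA (U+094D) + SSA (U+0937): a conjunct sequence.
const conjunct = "\u0915\u094D\u0937";

// Count grapheme clusters using the runtime's built-in segmenter.
function countGraphemes(text) {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return [...segmenter.segment(text)].length;
}

// A segmenter that applies GB9c keeps the whole conjunct together (1 cluster);
// under the older rules a break is allowed before the second consonant (2 clusters).
const clusters = countGraphemes(conjunct);
console.log(`clusters: ${clusters}`);
```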