cometkim / unicode-segmenter

A lightweight and fast, pure JavaScript library for Unicode segmentation
MIT License
37 stars 0 forks

Word/Sentence segmenter #25

Open cometkim opened 2 months ago

cometkim commented 2 months ago

I expected Intl.Segmenter to change its behavior based on the provided locale parameter, using some per-locale dictionary. However, the behavior does not seem to change with the locale, and only the basic algorithm specified by Unicode appears to be implemented.

So there's no reason I couldn't polyfill it completely. It could still be useful for Korean, which, unlike the other CJK languages, has strict word spacing.
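
A rough way to check the locale observation is to run the same text through Intl.Segmenter under several locales and compare the word-like segments. This is only a sketch: `wordsFor` and `sameAcrossLocales` are hypothetical helper names, not part of this library, and the result may differ between engines and ICU versions.

```javascript
// Hypothetical helper: collect the word-like segments of `text` for one locale.
function wordsFor(locale, text) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text))
    .filter((s) => s.isWordLike) // drop whitespace and punctuation segments
    .map((s) => s.segment);
}

// Hypothetical helper: true when every locale yields the same word-like segments.
function sameAcrossLocales(locales, text) {
  const [first, ...rest] = locales.map((l) => JSON.stringify(wordsFor(l, text)));
  return rest.every((r) => r === first);
}

// Korean uses spaces between words, so it is a useful sample for a rule-based segmenter.
const sample = "안녕하세요 세계, hello world";
for (const locale of ["en", "ko", "ja"]) {
  console.log(locale, wordsFor(locale, sample));
}
console.log("identical:", sameAcrossLocales(["en", "ko", "ja"], sample));
```

If the comparison reports identical output for the inputs that matter, that is consistent with the locale not affecting the result for those inputs, but it does not prove it in general.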