AbiWord / enchant

enchant spellchecking library
https://abiword.github.io/enchant/
GNU Lesser General Public License v2.1

Unicode word segmentation for command-line tool #244

Open PanderMusubi opened 4 years ago

PanderMusubi commented 4 years ago

Please offer Unicode word segmentation in the command-line tool, as Nuspell does. It only requires Boost.Locale, which is already needed when building with the Nuspell provider.

rrthomas commented 4 years ago

[Copied text from @PanderMusubi]

The Nuspell command-line tool offers Unicode segmentation of text into words; see https://github.com/nuspell/nuspell/blob/master/src/nuspell/main.cxx#L283

The result is much, much better than simple whitespace segmentation. You can see the difference with this test: https://github.com/nuspell/misc-nuspell/tree/master/segmentation