PanderMusubi opened 4 years ago
[Copied text from @PanderMusubi]
The Nuspell command-line tool segments text into words according to Unicode rules; see https://github.com/nuspell/nuspell/blob/master/src/nuspell/main.cxx#L283
The result is much better than simple whitespace segmentation. You can see the difference with this test: https://github.com/nuspell/misc-nuspell/tree/master/segmentation
Please offer Unicode word segmentation in the command-line tool, as Nuspell does. It only requires Boost.Locale, which you already need when building with the Nuspell provider.