Open GrimPixel opened 5 months ago
Thanks very much, this is how things really should be done. Lute should just be for reading; the segmentation/tokenization should be handled outside of it. The link you gave is very useful, appreciated.
The problem I run into is how to do a "plugin architecture" (#116), as different users/languages will have different requirements. That may be very easy to do, but it may also be brutal! I don't have a good handle on it yet.
The plugin architecture issue of #116 is now done. So if anyone wants to hack on adding new parsers/segmenters, there are notes in the wiki. :-)
Is your feature request related to a problem? Please describe.
MeCab is currently the only segmenter available.
Describe the solution you'd like
Add support for the word segmentation tools listed at https://polyglotclub.com/wiki/Language/Multiple-languages/Culture/Text-Processing-Tools#Word_Segmentation.