Sample data lines for Turkish or English

aalto-speech / morfessor

Morfessor is a tool for unsupervised and semi-supervised morphological segmentation

BSD 2-Clause "Simplified" License


Have you noticed Morfessor FlatCat https://github.com/aalto-speech/flatcat ? It may be more suitable for your needs, if you want to distinguish between stems and suffixes.

Some Turkish data is available from http://morpho.aalto.fi/events/morphochallenge2010/datasets.shtml .

Note that the words have been lowercased and mapped onto Latin characters: the letters specific to the Turkish language are replaced by capital letters. You may need to apply similar transformations to your own data before training or use.
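As a rough illustration of that kind of preprocessing, here is a minimal Python sketch. The exact letter mapping is an assumption (each Turkish-specific letter becomes the capital of its closest ASCII letter), and the names `TURKISH_TO_ASCII`, `turkish_lower`, and `normalize` are hypothetical helpers, not part of Morfessor. Please check the mapping against the dataset itself before relying on it.

```python
# Assumed mapping: each Turkish-specific lowercase letter is replaced by the
# capital of its closest ASCII letter. Verify this against the actual data.
TURKISH_TO_ASCII = str.maketrans({
    "ç": "C",
    "ğ": "G",
    "ı": "I",
    "ö": "O",
    "ş": "S",
    "ü": "U",
})


def turkish_lower(word: str) -> str:
    """Lowercase using Turkish casing rules (I -> ı, İ -> i)."""
    # Python's str.lower() is not locale-aware, so handle the dotted and
    # dotless i explicitly before lowercasing the rest of the word.
    return word.replace("I", "ı").replace("İ", "i").lower()


def normalize(word: str) -> str:
    """Lowercase a word, then map Turkish-specific letters to ASCII capitals."""
    return turkish_lower(word).translate(TURKISH_TO_ASCII)


if __name__ == "__main__":
    for w in ["açıkgörüşlülüğünü", "IŞIK", "İstanbul"]:
        print(w, "->", normalize(w))
```

Lowercasing is done before the mapping so that the only capital letters left in the output are the ones standing in for Turkish-specific characters.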


Sample data lines for Turkish or English #7