Closed ahmetax closed 7 years ago
Have you looked at Morfessor FlatCat (https://github.com/aalto-speech/flatcat)? It may suit your needs better if you want to distinguish between stems and suffixes.
Some Turkish data is available from http://morpho.aalto.fi/events/morphochallenge2010/datasets.shtml .
Note that the words in that data have been lowercased and mapped onto Latin characters: the letters specific to the Turkish alphabet are replaced by capital letters. You may need to transform your data accordingly before training or use.
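As a rough illustration of that kind of transformation, here is a minimal Python sketch. The letter mapping (ç to C, ğ to G, ı to I, ö to O, ş to S, ü to U) is my assumption about the convention, so please check it against the dataset's own description. The helper names `turkish_lower`, `normalize` and `word_counts` are made up for this example and are not part of Morfessor. The output is one word per line preceded by its count, which as far as I know is a word-list layout Morfessor can read, but verify that against the documentation of the version you use.

```python
from collections import Counter

# Assumed mapping of Turkish-specific letters to capital ASCII letters.
# Verify against the dataset description before relying on it.
TURKISH_TO_ASCII = {"ç": "C", "ğ": "G", "ı": "I", "ö": "O", "ş": "S", "ü": "U"}


def turkish_lower(text):
    # Python's str.lower() does not apply Turkish casing rules,
    # so map dotted and dotless capital I explicitly first.
    return text.replace("İ", "i").replace("I", "ı").lower()


def normalize(word):
    # Lowercase with Turkish rules, then replace the special letters.
    lowered = turkish_lower(word)
    return "".join(TURKISH_TO_ASCII.get(ch, ch) for ch in lowered)


def word_counts(lines):
    # Count normalized word forms in an iterable of text lines.
    counts = Counter()
    for line in lines:
        for token in line.split():
            token = token.strip(".,;:!?\"'()")
            if token:
                counts[normalize(token)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample sentences, only for demonstration.
    sample = ["Evlerimizde kitaplar var.", "Kitaplar evlerde."]
    for word, count in sorted(word_counts(sample).items()):
        print(count, word)
```

Handling the two I letters before calling `lower()` matters because the default lowercasing would turn a capital I into a dotted i, which is a different letter in Turkish.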
I want to use Morfessor to split Turkish words into stem + suffixes. I don't have a sample data set, so I have to create a new one for training. Could you give me a few explanatory example lines, in Turkish or English, showing what the data set should contain? Thanks.