What steps will reproduce the problem?
1. Run TMT on UTF-8 encoded texts containing words with accented characters
such as "é" or "à", for example texts in French.
What is the expected output? What do you see instead?
- I would expect the topic words to include words that contain accented letters.
Instead, such words never appear intact in the topic words. They show up cut off
at the accented character: "privé" becomes "priv", "était" becomes "tait", and
"prêt" becomes "pr" (without the final "t").
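The symptom above looks like what an ASCII-only token pattern would produce, though that is only a guess and has not been checked against TMT's source. The following Python sketch illustrates the effect under that assumption. The patterns, the `tokenize` helper, and the minimum token length of 2 are all hypothetical and not taken from TMT or Mallet.

```python
import re
import unicodedata

# Hypothetical token patterns for illustration only; not taken from TMT.
ASCII_TOKEN = re.compile(r"[A-Za-z]+")      # ASCII letters only
UNICODE_TOKEN = re.compile(r"[^\W\d_]+")    # any Unicode letter


def tokenize(text, pattern, min_length=2):
    """Split text into tokens and drop tokens shorter than min_length."""
    # Normalize to NFC so accented letters are single code points.
    text = unicodedata.normalize("NFC", text)
    return [t for t in pattern.findall(text) if len(t) >= min_length]


sample = "privé était prêt"

# The ASCII pattern breaks each word at the accented letter; the stray
# one-letter fragment "t" from "prêt" is then dropped as too short.
ascii_tokens = tokenize(sample, ASCII_TOKEN)

# The Unicode-aware pattern keeps the words intact.
unicode_tokens = tokenize(sample, UNICODE_TOKEN)

print(ascii_tokens)    # ['priv', 'tait', 'pr']
print(unicode_tokens)  # ['privé', 'était', 'prêt']
```

If the cause is something like this, the fragments "priv", "tait" and "pr" would all be explained, including the missing final "t" of "prêt".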
What version of the product are you using? On what operating system?
- I'm using the latest version of TMT on Ubuntu 13.10.
- Note that the procedure works just fine when I use Mallet directly.
Original issue reported on code.google.com by C.Schoech@gmail.com on 9 Dec 2013 at 4:53