MZKMXCV / topic-modeling-tool

Automatically exported from code.google.com/p/topic-modeling-tool

Documents requiring mallet's token-regex option cannot be read due to lack of token-regex parameter in TMT #8

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Input documents in a non-English script, e.g. Greek.
2. Run TMT

What is the expected output? What do you see instead?

Mallet cannot tell where a token starts or stops, so the output is just 
gibberish. I expect the words to be recognised as they are.

What version of the product are you using? On what operating system?

TMT 1.0 on Mac OS 10.9

Please provide any additional information below.

This is easily fixed by adding a token-regex input field under "Advanced options" 
whose value is passed through to Mallet's token-regex option.
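
To illustrate why the token pattern matters for non-Latin scripts, here is a minimal sketch in plain Java. It is not TMT or Mallet code: the class `TokenRegexDemo` and the helper `tokenize` are hypothetical names, and the ASCII-only pattern `\p{Alpha}+` is only a stand-in for a tokenizer that recognises Latin letters, not necessarily the default Mallet uses. The second pattern, `[\p{L}\p{M}]+`, matches Unicode letters and combining marks, which is the kind of value one would presumably type into the proposed field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of TMT or Mallet.
class TokenRegexDemo {

    // Return every substring of text that matches the given token regex.
    static List<String> tokenize(String text, String tokenRegex) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(tokenRegex).matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String greek = "καλημέρα κόσμε";

        // \p{Alpha} is ASCII-only in Java unless UNICODE_CHARACTER_CLASS is set,
        // so this pattern finds no tokens in Greek text.
        System.out.println(tokenize(greek, "\\p{Alpha}+"));

        // \p{L} (any Unicode letter) plus \p{M} (combining marks) finds both words.
        System.out.println(tokenize(greek, "[\\p{L}\\p{M}]+"));
    }
}
```

On the Mallet command line the same pattern is supplied through the `--token-regex` option of the import commands, so the requested field would only need to forward its text to that option unchanged.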

Original issue reported on code.google.com by philipp....@googlemail.com on 10 Dec 2013 at 11:10