KarinaBunyik / Twitter_hidden_topics

Finding Twitter topics that do not appear in other news media

Add bigrams #31

Open KarinaBunyik opened 10 years ago

KarinaBunyik commented 10 years ago

Add bigrams to Mallet. There is an error when bigrams are added: the Swedish characters are not recognized.

KarinaBunyik commented 10 years ago

The problem with my previous attempt was that Mallet uses '_' as the n-gram delimiter by default. I should set it to ' ' (space) when I want bigrams.
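
To make the delimiter point concrete, here is a minimal sketch in plain Python. It is not Mallet's code or API; the `bigrams` helper and its `delimiter` parameter are hypothetical names used only for illustration. It shows how the same token pairs come out differently depending on the joining character.

```python
def bigrams(tokens, delimiter="_"):
    """Join each pair of adjacent tokens with the given delimiter.

    Hypothetical helper for illustration only, not part of Mallet.
    """
    return [delimiter.join(pair) for pair in zip(tokens, tokens[1:])]


tokens = ["topic", "models", "for", "tweets"]

# Underscore-joined bigrams, matching the default described above.
underscore_bigrams = bigrams(tokens)

# Space-joined bigrams, matching the setting I want.
space_bigrams = bigrams(tokens, delimiter=" ")

print(underscore_bigrams)
print(space_bigrams)
```

Any later step that splits on whitespace will treat "topic_models" as one token and "topic models" as two, so the delimiter has to match whatever the downstream code expects.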

KarinaBunyik commented 10 years ago

Bigrams don't seem to work on Unicode characters, so I suppose only English text works with bigrams. Ask Dimitri if he has tried it.
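
One possible cause, which I have not confirmed for Mallet, is a token pattern that only matches ASCII letters. The sketch below is plain Python, not Mallet; the pattern names and the `tokenize` helper are assumptions made for illustration. It shows how an ASCII-only pattern breaks Swedish words apart, while a Unicode-aware pattern keeps them whole.

```python
import re
import unicodedata

# ASCII-only pattern: every non-ASCII letter acts as a token boundary.
ASCII_TOKEN = re.compile(r"[A-Za-z]+")

# Unicode-aware pattern: word characters minus digits and underscore.
UNICODE_TOKEN = re.compile(r"[^\W\d_]+")


def tokenize(text, pattern):
    """Lowercase, normalize to composed form (NFC), and extract tokens."""
    normalized = unicodedata.normalize("NFC", text.lower())
    return pattern.findall(normalized)


text = "Jag älskar smörgåsbord"

ascii_tokens = tokenize(text, ASCII_TOKEN)
unicode_tokens = tokenize(text, UNICODE_TOKEN)

print(ascii_tokens)    # Swedish words are split at å, ä, ö
print(unicode_tokens)  # Swedish words stay intact
```

If the tokens are already broken at this stage, any bigrams built from them will be wrong too, so it is worth checking the tokenization before blaming the bigram step.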