feedbackmine / language_detector

ruby language detection library using n-gram
http://HelpdeskOnTwitter.com

Better detection of Portuguese vs Spanish #3

Open timhaines opened 12 years ago

timhaines commented 12 years ago

I use this library to detect the language of tweets. It's not perfect, but it does pretty well given that it only has 140 characters to work with.

My Spanish-speaking users complain that Portuguese tweets quite often get identified as Spanish. One of them pointed out that the letter ç is used in Portuguese but not in Spanish. ã and ê could also be used to differentiate the two further.
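
To illustrate the idea, here is a rough sketch of a post-processing check that could run on top of whatever the detector returns. The helper name `refine_es_pt`, the constant, and the `"es"`/`"pt"` labels are my own assumptions for illustration and are not part of this library's API; it only looks at the three characters mentioned above.

```ruby
# Hypothetical helper, not part of language_detector.
# Characters named in this issue as used in Portuguese but not in Spanish.
# Both cases are listed explicitly so no case-folding is relied upon.
PORTUGUESE_ONLY_CHARS = /[çÇãÃêÊ]/

# text     - the tweet text (assumed to be a UTF-8 String)
# detected - the label the detector produced (assumed to be "es", "pt", ...)
#
# Returns "pt" when the text was labelled Spanish but contains one of the
# Portuguese-only characters; otherwise returns the label unchanged.
def refine_es_pt(text, detected)
  return detected unless detected == "es"
  text.match?(PORTUGUESE_ONLY_CHARS) ? "pt" : detected
end

puts refine_es_pt("Não esqueça o coração", "es")   # prints "pt"
puts refine_es_pt("Mañana vamos a la playa", "es") # prints "es"
```

This would only help for tweets that happen to contain one of those characters, so it is a tie-breaker rather than a replacement for the n-gram profiles.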