codeaudit / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

New language detector based on token n-grams. #129

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
We already have a language detector (TextCat) based on character n-grams.
It works well in general, but not on very short texts.

For such texts, a token n-gram-based approach might work better.

I am going to work on a version based on the Google Web1T data.
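
To make the idea concrete, here is a minimal sketch of how token n-gram scoring could work. It is not the module that was eventually added: the class name `TokenNgramLanguageDetector`, its methods, and the tiny hand-made counts are all hypothetical stand-ins, and a real version would read its frequencies from the Web1T data. Each language gets a table of token unigram and bigram counts, the text is scored per language by summing add-one smoothed log-probabilities, and the best-scoring language wins.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy token n-gram language detector; all counts are supplied by the caller. */
class TokenNgramLanguageDetector {
    // language -> (n-gram -> count); unigrams and bigrams share one table
    private final Map<String, Map<String, Long>> counts = new HashMap<>();
    // language -> sum of all registered counts
    private final Map<String, Long> totals = new HashMap<>();

    /** Registers an n-gram count for a language (hypothetical helper). */
    void addCount(String language, String ngram, long count) {
        counts.computeIfAbsent(language, k -> new HashMap<>())
              .merge(ngram, count, Long::sum);
        totals.merge(language, count, Long::sum);
    }

    /** Sums add-one smoothed log-probabilities of the text's unigrams and bigrams. */
    double score(String language, String text) {
        Map<String, Long> table = counts.getOrDefault(language, new HashMap<>());
        double total = totals.getOrDefault(language, 0L);
        double vocabulary = table.size() + 1; // +1 reserves mass for unseen n-grams
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        double logProb = 0.0;
        for (int i = 0; i < tokens.length; i++) {
            logProb += Math.log((table.getOrDefault(tokens[i], 0L) + 1.0)
                    / (total + vocabulary));
            if (i + 1 < tokens.length) {
                String bigram = tokens[i] + " " + tokens[i + 1];
                logProb += Math.log((table.getOrDefault(bigram, 0L) + 1.0)
                        / (total + vocabulary));
            }
        }
        return logProb;
    }

    /** Returns the best-scoring language, or "unknown" if none is registered. */
    String detect(String text) {
        String best = "unknown";
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String language : counts.keySet()) {
            double s = score(language, text);
            if (s > bestScore) {
                bestScore = s;
                best = language;
            }
        }
        return best;
    }
}
```

With a handful of counts registered for two languages, `detect("the house")` would pick the language whose table contains "the", "house", and "the house". Because whole tokens are matched, even a two-word input can be decisive, which is the case where character n-gram profiles tend to struggle.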

Original issue reported on code.google.com by torsten....@gmail.com on 8 Apr 2013 at 6:34

GoogleCodeExporter commented 9 years ago
See also: https://issues.apache.org/jira/browse/UIMA-2801

Original comment by richard.eckart on 8 Apr 2013 at 6:45

GoogleCodeExporter commented 9 years ago
The module has been added to core.

Original comment by torsten....@gmail.com on 4 Sep 2013 at 12:22