RangerWolf / language-detection

Automatically exported from code.google.com/p/language-detection

Detector returns different results when called multiple times with the same input #50

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm trying to use the language-detection library to distinguish between English 
and German in very short snippets of text. I noticed that when I run the 
language detector repeatedly on the same input, I sometimes get different 
results from one trial to the next (the very first is usually right). 

Am I using the API incorrectly? Is there anything I can do to always get 
deterministic results? 

I attached a little unit test that reproduces the problem if you set the right 
PROFILE_DIR.

What steps will reproduce the problem?
1. Load profiles, create a detector, append input, detect.
2. Create new detector, append same input again, detect.
3. Repeat this a couple of times. 
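
The steps above can be sketched as a small harness that repeats one detection step and tallies the distinct results. This is only an illustration: `RepeatCheck`, `tally` and `detectOnce` are hypothetical names, and the supplier here is a placeholder. In a real test it would create a new detector, append the input, and return the detected language.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class RepeatCheck {
    // Runs the given detection step `runs` times and counts each distinct result.
    static Map<String, Integer> tally(Supplier<String> detectOnce, int runs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < runs; i++) {
            counts.merge(detectOnce.get(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder supplier (assumption): a real test would build a fresh
        // detector here, append the same input, and return the detected language.
        Supplier<String> detectOnce = () -> "en";
        System.out.println(tally(detectOnce, 10)); // {en=10}
    }
}
```

If the tally ever contains more than one key for the same input, the detection step is not deterministic.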

What is the expected output? What do you see instead?
I expect to consistently get the same result for the same input. 

What version of the product are you using? On what operating system?

<groupId>org.apache.solr</groupId>
<artifactId>solr-langdetect</artifactId>
<version>3.5.0</version>

on Windows 7

Thanks in advance!
Heike

Original issue reported on code.google.com by heiks...@googlemail.com on 28 Jan 2013 at 4:28

Attachments:

GoogleCodeExporter commented 9 years ago
langdetect uses random sampling for variance reduction, so repeated runs on the 
same input can give different results. See the FAQ, which explains how to obtain 
deterministic results.

https://code.google.com/p/language-detection/wiki/FrequentlyAskedQuestion
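
To illustrate why a fixed seed matters, here is a toy sketch using only `java.util.Random`. It is not langdetect's actual algorithm: `SeededSampling` and `noisyScore` are hypothetical names, and the noise scale and trial count are arbitrary. In the library itself, the equivalent step is presumably to set a seed on `DetectorFactory` before creating detectors (I believe via `DetectorFactory.setSeed`, but check the FAQ and the version bundled with your Solr release).

```java
import java.util.Random;

public class SeededSampling {
    // Toy stand-in for a randomized scorer: perturbs a base score with
    // Gaussian noise over several trials and averages the results.
    // This is NOT langdetect's implementation, only an illustration.
    static double noisyScore(double base, Random rng, int trials) {
        double sum = 0.0;
        for (int i = 0; i < trials; i++) {
            sum += base + rng.nextGaussian() * 0.1;
        }
        return sum / trials;
    }

    public static void main(String[] args) {
        // Fixed seed: two fresh generators yield the same noise sequence,
        // so the scores are identical.
        double a = noisyScore(0.5, new Random(0L), 7);
        double b = noisyScore(0.5, new Random(0L), 7);
        System.out.println(a == b); // true

        // No explicit seed: each generator is seeded differently, so the
        // scores will generally differ from run to run.
        double c = noisyScore(0.5, new Random(), 7);
        double d = noisyScore(0.5, new Random(), 7);
        System.out.println(c + " vs " + d);
    }
}
```

For very short input, a small difference in score can be enough to flip the result between two close candidates such as English and German, which would explain the behaviour reported above.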

Original comment by nakatani.shuyo on 24 Jul 2013 at 9:12