andyperlitch / language-detection

Automatically exported from code.google.com/p/language-detection

Wrong detection on ES/RO, DK/NO #33

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

We are testers evaluating your tool as part of our test work. We check
translated content in software for nearly 30 languages.
1. When we try to identify "Ingen ArcSync?-konto?Log ind Opret en konto",
it reports NO as the most probable language (>99.99%) rather than DK.
2. When identifying "Accepar", it reports RO (>99.99%) rather than ES. Could
the detection accuracy be improved in the next version? Thanks for your work.

What is the expected output? What do you see instead?
DK/NO and RO/ES should be distinguished more reliably.

What version of the product are you using? On what operating system?
 langdetect-09-13-2011.zip

Please provide any additional information below.
We have not generated newer profiles from Wikipedia because we could not find
where to get the abstract database files on the Download page. We searched
Google for 'eswiki- -abstract.xml' and 'Wikipedia abstract database files',
but found nothing.
Polyglot3000, which is also a good language detection tool, returns the
correct result for these inputs, though it only provides a GUI on Windows and
no API or source code.

Original issue reported on code.google.com by Preload...@gmail.com on 21 Jan 2012 at 2:59

GoogleCodeExporter commented 9 years ago
langdetect is a language detection library for sufficiently long text and
performs poorly on short text. In particular, single-word detection is almost
always incorrect with our approach.
I don't know how Polyglot3000 does its recognition, but I guess it uses very
large dictionaries that cannot be held in memory. That is not my approach.

I'm researching short-text detection in parallel, but that probably won't be
able to detect the language of a single word either...
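
To illustrate why short input is hard for this kind of detector, here is a minimal sketch that counts how many character n-grams each reported string offers as evidence. It is not langdetect's code: the class `NGramCount` and the method `ngrams` are hypothetical names, and the 1-to-3 character range with no padding or normalization is a simplifying assumption about how an n-gram based detector sees its input.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramCount {
    // Collect every character n-gram of length 1..maxN from the text.
    // Assumption: no padding at word boundaries and no character normalization.
    static List<String> ngrams(String text, int maxN) {
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(text.substring(i, i + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The two inputs quoted in the report above.
        String[] samples = {
            "Accepar",
            "Ingen ArcSync?-konto?Log ind Opret en konto"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\": " + s.length() + " chars, "
                    + ngrams(s, 3).size() + " n-grams");
        }
    }
}
```

Under these assumptions "Accepar" yields only 18 n-grams (7 + 6 + 5). Most of them are likely to be common to several closely related languages, so a statistical detector has very few features to decide on, and a high reported probability on such input should not be read as a reliable result.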

The download location for the Wikipedia abstract files is listed here:
http://code.google.com/p/language-detection/wiki/Tools

Original comment by nakatani.shuyo on 25 Jan 2012 at 2:57