StephenNi / language-detection

Automatically exported from code.google.com/p/language-detection

Detector Enhancement #14

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Hi, I think it would be very handy if Detector had a method like 
getLangsWithProbs that returned a HashMap of languages and their 
probabilities, so that the developer can decide whether to accept a given 
probability.

All that is needed is another sortProbability method that returns a map instead 
of a list.

The thing is that if you get a text in a language that has no profile, or some 
gibberish, it easily satisfies PROB_THRESHOLD, and a developer using this 
library has no chance to see the probability at all. Moreover, PROB_THRESHOLD 
is private final and cannot be set.

Kind regards, Jakub

Original issue reported on code.google.com by liska.jakub on 27 Apr 2011 at 1:09

GoogleCodeExporter commented 8 years ago
Detector has a getProbabilities() method that returns a sorted list of 
languages with their probabilities.
http://code.google.com/p/language-detection/wiki/Tutorial

Is that what you are looking for?

Original comment by nakatani.shuyo on 28 Apr 2011 at 8:37
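
For readers who still want a map and their own acceptance cut-off, here is a minimal sketch of how a sorted list of language/probability pairs could be turned into one on the caller's side. It does not use the library itself: LangProb is a hypothetical stand-in for whatever element type getProbabilities() returns, and toMap and acceptTop are illustrative helper names, not library methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the library's language/probability pair.
class LangProb {
    final String lang;
    final double prob;

    LangProb(String lang, double prob) {
        this.lang = lang;
        this.prob = prob;
    }
}

class LangProbMapSketch {
    // Copies the sorted list into a map that preserves the list's order.
    static Map<String, Double> toMap(List<LangProb> sorted) {
        Map<String, Double> byLang = new LinkedHashMap<>();
        for (LangProb lp : sorted) {
            byLang.put(lp.lang, lp.prob);
        }
        return byLang;
    }

    // Returns the top language only if it reaches the caller's own cut-off,
    // otherwise null. Assumes the list is sorted by descending probability.
    static String acceptTop(List<LangProb> sorted, double minProb) {
        if (sorted.isEmpty() || sorted.get(0).prob < minProb) {
            return null;
        }
        return sorted.get(0).lang;
    }

    public static void main(String[] args) {
        // Made-up values for illustration only.
        List<LangProb> result = new ArrayList<>();
        result.add(new LangProb("en", 0.71));
        result.add(new LangProb("nl", 0.29));
        System.out.println(toMap(result));
        System.out.println(acceptTop(result, 0.9));
    }
}
```

A LinkedHashMap is used here so that iterating the map still yields the most probable language first.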

GoogleCodeExporter commented 8 years ago
The calculated probabilities (from the updateLangProb method) are stored in the 
prob vector and then normalized. In the end most entries come out as prob=0 
(because they are orders of magnitude smaller), and the few that remain sum to 
100%.

So you can't get meaningful probabilities for all languages (well, you can, but 
the value is 0), only for those whose value is highest and of a similar order 
of magnitude.

If you want to see these values before normalization, take a closer look at 
updateLangProb.

Original comment by markowsk...@gmail.com on 28 Apr 2011 at 12:25
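
The effect described above can be illustrated with a small, self-contained sketch. This is not the library's code: normalize is a hypothetical helper and the raw scores are made up. It only shows how dividing by the total leaves scores that are many orders of magnitude smaller at practically 0.

```java
import java.util.Arrays;

class NormalizeSketch {
    // Divides every score by the total so that the results sum to 1.
    static double[] normalize(double[] scores) {
        double sum = 0.0;
        for (double s : scores) {
            sum += s;
        }
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up raw scores: two of similar magnitude, two far smaller.
        double[] raw = {3.0e-5, 1.0e-5, 2.0e-12, 5.0e-20};
        System.out.println(Arrays.toString(normalize(raw)));
    }
}
```

With these inputs the first two entries share almost all of the probability mass, while the last two remain non-zero in principle but are far too small to matter.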

GoogleCodeExporter commented 8 years ago
I'm sorry, I overlooked that getProbabilities() returns List<Languages>; I 
thought it was List<String>. Thank you, guys.

Original comment by liska.jakub on 28 Apr 2011 at 12:32