CLD2Owners / cld2

Compact Language Detector 2
Apache License 2.0

How to interpret the Score value and Reliable flag. #54

Open loretoparisi opened 7 years ago

loretoparisi commented 7 years ago

Given that for a certain text the CLD2 response has a reliable flag and a detection object like:

{
    reliable: true,
    detection: { name: 'ENGLISH', code: 'en', percent: 99, score: 1228 }
}

the detection can be considered reliable according to this flag, i.e. `reliable=true`. What is the range of the `detection.score` value, and what exactly does it mean? I would like to set a [min, max] interval for the score based on the flag: when `reliable=true`, what range do the `detection.score` values fall in, and what is that range when `reliable=false`?

Thanks.

loretoparisi commented 5 years ago

Still wondering how score and percent relate to a probability or accuracy value...

Natalie-Caruana commented 2 years ago

Hi @loretoparisi I just came across this issue. Did you manage to find a solution?

loretoparisi commented 2 years ago

Hello, I don't remember exactly, but I did something else related to this:

https://stackoverflow.com/questions/55186435/converting-language-detection-score-of-cld2-to-cld3-accuracy

And also

https://stats.stackexchange.com/questions/248557/language-detection-with-cld2-with-mixed-inputs-in-long-documents

Natalie-Caruana commented 2 years ago

Thanks for your prompt reply, @loretoparisi. Will check these!

Natalie-Caruana commented 2 years ago

Hi, I looked into these links but without much luck. So the lower the score, the better the prediction, but do you know what the best way would be to flag a prediction as reliable or not reliable? Otherwise I'm finding it hard to get any useful meaning out of this score. I'm using the Python package pycld2.
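
Since the thread never establishes a range for score, one possible workaround is to avoid thresholding it at all and rely on the reliable flag plus percent instead. The sketch below assumes the tuple shape that `pycld2.detect` returns, `(isReliable, textBytesFound, details)`, with each entry of `details` being `(languageName, languageCode, percent, score)`. The names `judge`, `detect_and_judge`, and `Verdict`, and the 90% cutoff, are hypothetical and chosen only for illustration; they are not part of pycld2 or CLD2.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

# Assumed shape of one entry in the `details` tuple from pycld2.detect():
# (languageName, languageCode, percent, score)
Detail = Tuple[str, str, int, float]


class Verdict(NamedTuple):
    code: Optional[str]   # language code of the top guess, or None
    percent: int          # share of the text attributed to that language
    score: float          # passed through for logging only, not thresholded
    accepted: bool        # our own accept/reject decision


def judge(is_reliable: bool, details: Sequence[Detail],
          min_percent: int = 90) -> Verdict:
    """Accept the top language only if CLD2 flags the result as reliable
    and the top language covers at least `min_percent` of the text.

    `min_percent` is an arbitrary, hypothetical threshold; tune it on
    your own data.
    """
    if not details:
        return Verdict(None, 0, 0.0, False)
    _name, code, percent, score = details[0]
    # Assumption: 'un' is the code CLD2 uses for an unknown language.
    accepted = bool(is_reliable) and percent >= min_percent and code != "un"
    return Verdict(code, percent, score, accepted)


def detect_and_judge(text: str, min_percent: int = 90) -> Verdict:
    """Run pycld2 on `text` and apply judge() to its output."""
    import pycld2  # third-party; imported lazily so judge() works without it
    is_reliable, _bytes_found, details = pycld2.detect(text)
    return judge(is_reliable, details, min_percent)
```

With pycld2 installed, `detect_and_judge("some text")` returns a `Verdict` whose `accepted` field can be used as the reliable/not reliable flag, while `score` stays available for inspection without any assumed range.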