CLD2Owners / cld2

Compact Language Detector 2
Apache License 2.0

Add possibility to set MinReliableKeepPercent #36

Open: ghost opened this issue 9 years ago

ghost commented 9 years ago

Originally reported on Google Code with ID 36

What steps will reproduce the problem?
1. Try to detect the language of the attached input file.
2. See that the output is "unknown".

What is the expected output? What do you see instead?
I would expect either 'Persian' or 'Arabic'.

What version of the product are you using? On what operating system?
rev195 on CentOS 7

Please provide any additional information below.

CLD2 returns "unknown" because the reliability is lower than kMinReliableKeepPercent
(defined in compact_lang_det_impl.cc):
static const int kMinReliableKeepPercent = 41;  // Remove lang if reli < this
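To illustrate why a low reliability turns the answer into "unknown", here is a simplified C++ sketch of such a cutoff. It is not CLD2's actual code: the `Candidate` struct and the `TopLanguage` function are hypothetical, and only the constant and its comment come from compact_lang_det_impl.cc. Every candidate whose reliability is below the cutoff is dropped, and if none survive the result is "unknown".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of one candidate language in a result.
// These names are illustrative and are NOT part of the CLD2 API.
struct Candidate {
  std::string language;  // e.g. "PERSIAN"
  int percent;           // share of the input attributed to this language
  int reliability;       // reliability score, 0..100
};

// Same value and comment as the constant quoted from compact_lang_det_impl.cc.
static const int kMinReliableKeepPercent = 41;  // Remove lang if reli < this

// Returns the first candidate that passes the reliability cutoff, or
// "unknown" if every candidate is removed. Assumes `candidates` is already
// sorted by descending `percent`, so the first survivor is the top language.
std::string TopLanguage(const std::vector<Candidate>& candidates) {
  for (const Candidate& c : candidates) {
    assert(c.reliability >= 0 && c.reliability <= 100);
    if (c.reliability >= kMinReliableKeepPercent) {
      return c.language;
    }
  }
  return "unknown";
}
```

With made-up scores such as Persian at reliability 35 and Arabic at 30, both fall under 41 and the sketch returns "unknown", which matches the symptom in this report.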

Would it be acceptable to add a parameter to the DetectLanguageXXX(...) functions
so that callers can set this threshold?
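One way to add such a parameter without breaking existing callers is an overload: the current signature stays and forwards to a new one that takes the threshold explicitly. The sketch below uses hypothetical names (`Candidate`, `TopLanguage`) instead of the real DetectLanguageXXX(...) functions, whose full parameter lists are not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical candidate record; not part of the CLD2 API.
struct Candidate {
  std::string language;
  int percent;
  int reliability;  // 0..100
};

static const int kMinReliableKeepPercent = 41;  // Remove lang if reli < this

// New overload: the caller chooses the reliability cutoff.
// Assumes `candidates` is sorted by descending `percent`.
std::string TopLanguage(const std::vector<Candidate>& candidates,
                        int min_reliable_keep_percent) {
  assert(min_reliable_keep_percent >= 0 && min_reliable_keep_percent <= 100);
  for (const Candidate& c : candidates) {
    if (c.reliability >= min_reliable_keep_percent) {
      return c.language;
    }
  }
  return "unknown";
}

// Existing signature, kept unchanged; it forwards the old default.
std::string TopLanguage(const std::vector<Candidate>& candidates) {
  return TopLanguage(candidates, kMinReliableKeepPercent);
}
```

An overload keeps the original function symbol, so already-compiled callers are unaffected. A defaulted trailing parameter would keep source compatibility but change the function's signature. Either way, every new knob lengthens the parameter list.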

Regards

Reported by William.Tambellini on 2015-06-11 17:07:38


ghost commented 9 years ago
That's a good suggestion. I'd really like to see us consider an alternative scheme where
we use the builder pattern to construct a settings/config object, so that we can keep
the API as stable as possible while accommodating reasonable requests for behavioral
changes like this.
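A builder-based settings object along these lines might look like the following C++ sketch. `DetectorSettings`, its nested `Builder`, and the setter name are hypothetical; the only value taken from CLD2 is the current default of 41. New options would become new fields and setters, so existing call sites would not change.

```cpp
#include <stdexcept>

// Hypothetical immutable settings object; not part of CLD2.
class DetectorSettings {
 public:
  class Builder;

  int min_reliable_keep_percent() const { return min_reliable_keep_percent_; }

 private:
  // Private: instances are only created through Builder::Build().
  explicit DetectorSettings(int min_reliable_keep_percent)
      : min_reliable_keep_percent_(min_reliable_keep_percent) {}

  int min_reliable_keep_percent_;
};

// Collects settings step by step; unset values keep the current behavior.
class DetectorSettings::Builder {
 public:
  // Rejects out-of-range values and returns *this so calls can be chained.
  Builder& SetMinReliableKeepPercent(int percent) {
    if (percent < 0 || percent > 100) {
      throw std::invalid_argument("percent must be in [0, 100]");
    }
    min_reliable_keep_percent_ = percent;
    return *this;
  }

  // A nested class may call the enclosing class's private constructor.
  DetectorSettings Build() const {
    return DetectorSettings(min_reliable_keep_percent_);
  }

 private:
  int min_reliable_keep_percent_ = 41;  // same as kMinReliableKeepPercent
};
```

A caller would write `DetectorSettings::Builder().SetMinReliableKeepPercent(25).Build()` and pass the result to a detection function. Validating in the setter means an invalid settings object can never be built.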

Jason/Dick, what do you think?

Reported by andrewhayden@google.com on 2015-06-11 20:17:48

jasonriesa commented 9 years ago

We are revising CLD2 internally to have a single entry point that takes an options proto. I see no reason why kMinReliableKeepPercent cannot be included as a configurable option. Once that is done and tested thoroughly, we will migrate those changes to the open source version of CLD2 here.
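A single entry point driven by an options message could treat the threshold as an optional field that falls back to the current default when unset. Since the internal proto is not public, the sketch below uses a plain C++17 struct with `std::optional` as a stand-in for proto field presence. `DetectOptions`, `Candidate`, and `ResolveLanguage` are hypothetical, and the text-scoring stage is replaced by pre-scored candidates.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for an options proto; field names are illustrative.
struct DetectOptions {
  // Empty means "not set", so the built-in default applies.
  std::optional<int> min_reliable_keep_percent;
};

// Hypothetical pre-scored candidate; the scoring stage is not modeled.
struct Candidate {
  std::string language;
  int percent;
  int reliability;  // 0..100
};

constexpr int kDefaultMinReliableKeepPercent = 41;

// Single entry point: every behavioral knob arrives through `options`, so a
// new knob extends DetectOptions instead of this signature.
// Assumes `candidates` is sorted by descending `percent`.
std::string ResolveLanguage(const std::vector<Candidate>& candidates,
                            const DetectOptions& options = {}) {
  const int cutoff = options.min_reliable_keep_percent.value_or(
      kDefaultMinReliableKeepPercent);
  for (const Candidate& c : candidates) {
    if (c.reliability >= cutoff) {
      return c.language;
    }
  }
  return "unknown";
}
```

Leaving the field unset reproduces today's behavior, while callers such as the reporter can lower the cutoff explicitly.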