Mimino666 / langdetect

Port of Google's language-detection library to Python.

Changing the prior map #43

Open mparada opened 7 years ago

mparada commented 7 years ago

I have a data set of (very) short texts, each in one of the three main Swiss languages (German, French and Italian). I have already removed all profiles except those three from langdetect/profiles. Still, I get this:

from langdetect import detect_langs
detect_langs("Motorrad")  # expected top language: 'de'

[it:0.9999975589209744]

I am trying to fix this by adding a prior map that reflects the language distribution of the Swiss population: prior_map = {'de':0.65, 'fr':0.25, 'it':0.1}, which should skew the results in the right direction.
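
To make the idea concrete, here is a minimal sketch of how a prior reweights per-language scores under Bayes' rule. The likelihood values are made up for illustration and are not langdetect output; the helper name posterior is hypothetical as well.

```python
def posterior(likelihoods, prior):
    """Combine per-language likelihoods with a prior via Bayes' rule."""
    # Multiply each likelihood by its prior weight (0 if the language has no prior).
    unnormalized = {
        lang: likelihoods[lang] * prior.get(lang, 0.0) for lang in likelihoods
    }
    total = sum(unnormalized.values())
    if total <= 0.0:
        raise ValueError("prior and likelihoods have no overlapping support")
    # Normalize so the posterior sums to 1.
    return {lang: p / total for lang, p in unnormalized.items()}


# Hypothetical likelihoods for a short, ambiguous text (illustrative only).
likelihoods = {"de": 0.30, "fr": 0.25, "it": 0.45}

uniform_prior = {"de": 1 / 3, "fr": 1 / 3, "it": 1 / 3}
swiss_prior = {"de": 0.65, "fr": 0.25, "it": 0.10}

uniform_post = posterior(likelihoods, uniform_prior)
swiss_post = posterior(likelihoods, swiss_prior)

# Pick the most probable language under each prior.
best_uniform = max(uniform_post, key=uniform_post.get)
best_swiss = max(swiss_post, key=swiss_post.get)
print(best_uniform, best_swiss)
```

With these numbers the uniform prior favours Italian while the Swiss prior favours German. Note that a prior only reweights the scores, so a very confident likelihood can still outweigh it.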

I saw that the Detector class has a set_prior_map method, but I am able neither to instantiate it nor to set the prior for all detectors. Any idea?
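
For reference, here is a sketch of what I am trying to achieve, based on my reading of langdetect/detector_factory.py. It assumes that DetectorFactory, PROFILES_DIRECTORY, load_profile, create, append and get_probabilities behave as they appear to in the source, so please check it against the installed version. The helper names _load_factory and detect_langs_with_prior are my own and are not part of the library.

```python
from functools import lru_cache

# Prior roughly reflecting the Swiss language distribution (from the post above).
SWISS_PRIOR = {"de": 0.65, "fr": 0.25, "it": 0.10}


@lru_cache(maxsize=1)
def _load_factory():
    """Build one DetectorFactory and reuse it; loading profiles is slow."""
    # Imported lazily so the expensive setup only happens on first use.
    # Assumption: these names exist in langdetect.detector_factory.
    from langdetect.detector_factory import DetectorFactory, PROFILES_DIRECTORY

    DetectorFactory.seed = 0  # make detection reproducible between runs
    factory = DetectorFactory()
    factory.load_profile(PROFILES_DIRECTORY)
    return factory


def detect_langs_with_prior(text, prior_map=SWISS_PRIOR):
    """Like detect_langs, but sets a prior on each new Detector instance."""
    detector = _load_factory().create()
    # Languages missing from prior_map appear to get a prior of zero.
    detector.set_prior_map(prior_map)
    detector.append(text)
    return detector.get_probabilities()
```

Calling detect_langs_with_prior("Motorrad") would then apply the prior to that single detection. Since a new Detector is created for every call, the prior has to be set each time rather than once globally.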