Mimino666 / langdetect

Port of Google's language-detection library to Python.

Model file format description #47

Open ftyers opened 6 years ago

ftyers commented 6 years ago

I have a question about the model file format. It is basically a JSON-encoded object with a map of n-gram frequencies (for n = 1, 2, 3) in "freq" and a language code in "name". But what is "n_words"? I guess it is the number of words in the training corpus, but what are the three values?
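
One way to probe the question empirically is to sum the "freq" counts by n-gram length and compare the result with "n_words". The sketch below assumes only what is described above (a JSON object with "freq", "n_words" and "name"); the `ngram_totals` helper and the toy profile are made up for illustration and are not part of langdetect. If the three totals match "n_words" on a real profile, that would suggest the values are per-length n-gram counts rather than word counts.

```python
import json
from collections import Counter


def ngram_totals(profile, max_n=3):
    """Sum the frequency counts in profile["freq"], grouped by n-gram length."""
    totals = Counter()
    for ngram, count in profile["freq"].items():
        totals[len(ngram)] += count
    # Return the totals as a list ordered by n-gram length: [1-grams, 2-grams, 3-grams]
    return [totals[n] for n in range(1, max_n + 1)]


# Toy profile with invented numbers, shaped like the format described above.
sample = json.loads(
    '{"freq": {"a": 5, "b": 3, "ab": 2, "ba": 1, "aba": 1},'
    ' "n_words": [8, 3, 1], "name": "xx"}'
)

totals = ngram_totals(sample)
print("totals by length:", totals)
print("n_words:         ", sample["n_words"])
```

To try this on a real profile, replace `sample` with the result of `json.load` on one of the model files.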