I have a question about the file format. It is basically JSON-encoded: "freq" holds a map of n-gram frequencies (for n = 1, 2, 3) and "name" holds a language code. But what is
"n_words"? I guess it is the number of words in the training corpus, but what do the three values represent?
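
One way to test a guess about "n_words" is to sum the counts in "freq" by n-gram length and compare the result with the three values. Below is a minimal sketch, assuming the keys of "freq" are the n-gram strings themselves and that the three values might line up with 1-, 2- and 3-grams. The sample profile, its numbers, and the helper name `totals_by_ngram_length` are made up for illustration and are not taken from a real profile file.

```python
import json
from collections import Counter

# Hypothetical miniature profile; every value here is invented for illustration.
SAMPLE_PROFILE = """
{
  "freq": {"a": 5, "b": 3, "ab": 2, "ba": 1, "aba": 1},
  "n_words": [8, 3, 1],
  "name": "xx"
}
"""


def totals_by_ngram_length(profile):
    """Sum the counts in profile["freq"], grouped by n-gram length in characters."""
    totals = Counter()
    for ngram, count in profile["freq"].items():
        totals[len(ngram)] += count
    # Return the totals for 1-grams, 2-grams and 3-grams, in that order.
    return [totals[n] for n in (1, 2, 3)]


profile = json.loads(SAMPLE_PROFILE)
totals = totals_by_ngram_length(profile)

# Print the language code, the computed totals, and the stored "n_words" side by side.
print(profile["name"], totals, profile["n_words"])
```

If the totals track the three values on a real file, that would suggest "n_words" counts n-grams per length rather than words. The sums may come out lower than the stored values if rare n-grams were dropped from "freq" when the profile was built, so an exact match should not be expected.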