vadimkantorov closed this issue 4 years ago
What was the source data for the http://files.deeppavlov.ai/lang_models/ru_wiyalen_no_punkt.arpa.binary.gz language model? What KenLM arguments were used for the estimation?
Thank you!
Hi @vadimkantorov, the 3-gram model was trained on Russian Wikipedia and a closed news dataset. All punctuation characters were removed beforehand.
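For anyone who wants to prepare a comparable corpus, here is a minimal sketch of the punctuation-removal step. It is an assumption, not the script actually used for this model: the function name `strip_punctuation` is hypothetical, and treating every character in a Unicode "P" category as punctuation is only one possible rule.

```python
import unicodedata


def strip_punctuation(line: str) -> str:
    """Replace each Unicode punctuation character with a space and collapse whitespace.

    Assumption: "punctuation" means any character whose Unicode category
    starts with "P" (this includes hyphens, dashes, and quotation marks).
    """
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in line
    )
    # split() with no arguments drops the extra spaces left by removed characters
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(strip_punctuation("Привет, мир!"))
```

Note that this rule also splits hyphenated words such as "какой-то" into two tokens. The cleaned text could then be passed to KenLM's `lmplz` with `-o 3` to estimate a 3-gram model. The arguments actually used for this model are not stated in this thread.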