NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

KenLM with ASR #3221

Closed dungnguyen98 closed 2 years ago

dungnguyen98 commented 2 years ago

Hi, I'm fine-tuning QuartzNet (a character model) and CTC-Conformer (a BPE model) on a new language, and the results are good. Then I used KenLM with both. I have some questions:

  1. When I downloaded and inspected the "3-gram.pruned.1e-7.arpa" file from this tutorial, I noticed that it is a word-level LM while QuartzNet is character-level, yet it works well. Can you explain how this works?
  2. I trained KenLM for CTC-Conformer (the BPE model) following the ASR language modeling tutorial, and the training succeeded. But when I apply it with beam search to CTC-Conformer, it gives irrelevant results (WER up to 40%), while greedy search gives a WER of ~9%. I don't know why it doesn't work. Can you give me some advice? Thank you!
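On question 1, here is a toy sketch, independent of NeMo's actual decoder, of how a character-level beam search can use a word-level LM: hypotheses are plain character strings, and the LM is consulted each time a space completes a word, with scores combined as acoustic + alpha * LM + beta * word count. The `lm_logprob` function below is a hypothetical stand-in for a real KenLM query (in practice the `kenlm` package's model would be scored with the full word history).

```python
import math

def lm_logprob(word: str) -> float:
    # Hypothetical stand-in for a word-level n-gram lookup; a real
    # decoder would query KenLM here with the preceding word history.
    toy_unigrams = {"the": 0.1, "cat": 0.05, "sat": 0.05}
    return math.log(toy_unigrams.get(word, 1e-6))

def score_hypothesis(chars: str, acoustic_logprob: float,
                     alpha: float = 1.0, beta: float = 0.5) -> float:
    # Combine scores the way CTC beam search rescoring typically does:
    # acoustic log-prob + alpha * LM log-prob + beta * word count.
    words = [w for w in chars.strip().split(" ") if w]
    lm_score = sum(lm_logprob(w) for w in words)
    return acoustic_logprob + alpha * lm_score + beta * len(words)

# Two character hypotheses with similar acoustic scores: the
# word-level LM breaks the tie in favor of in-vocabulary words.
print(score_hypothesis("the cat sat", acoustic_logprob=-12.0))  # higher
print(score_hypothesis("the kat sat", acoustic_logprob=-11.8))  # lower
```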
VahidooX commented 2 years ago

  1. This LM ("3-gram.pruned.1e-7.arpa") is intended for char-level models and does not work with BPE models. KenLM itself only supports word-level LMs. The beam search decoder we use handles this: it takes a word-level KenLM model and incorporates its scores during decoding, much as in the sketch above.

  2. How about first trying that script with decoding_mode=beamsearch or greedy to make sure everything else is OK? Have you played with the alpha and beta parameters? Was encoding_level in train_kenlm.py set to "subword" when you ran it?
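To act on those suggestions, here is a hedged sketch of a small grid search over `beam_alpha`/`beam_beta` with the tutorial's `eval_beamsearch_ngram.py` script; the script path, flag names, and file names below are assumptions and may differ across NeMo versions.

```python
# Hypothetical alpha/beta sweep; adjust paths and flags to match
# the eval_beamsearch_ngram.py in your NeMo version.
import itertools
import subprocess

for alpha, beta in itertools.product([0.5, 1.0, 1.5, 2.0], [0.0, 0.5, 1.0, 1.5]):
    subprocess.run(
        [
            "python", "scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram.py",
            "--nemo_model_file", "conformer_ctc_bpe.nemo",  # assumed checkpoint name
            "--input_manifest", "dev_manifest.json",        # assumed eval manifest
            "--kenlm_model_file", "bpe_3gram.binary",       # LM built by train_kenlm.py
            "--decoding_mode", "beamsearch_ngram",
            "--beam_width", "128",
            "--beam_alpha", str(alpha),
            "--beam_beta", str(beta),
        ],
        check=True,
    )
```

Running the same script once with greedy decoding and once with plain beam search (no n-gram LM) isolates whether the LM itself is what drives the WER up, which is what the suggestion above is getting at.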

leminhnguyen commented 5 months ago

Hi @dungnguyen98, have you successfully trained KenLM with BPE?
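For context, a minimal sketch of the "subword" encoding idea behind training KenLM for BPE models: each token id is shifted by an offset (100 matches the TOKEN_OFFSET default documented for NeMo's n-gram LM scripts) and written as a single unicode character, so the word-oriented KenLM toolchain can model token sequences. The stub tokenizer and the space-separated layout are assumptions for illustration; the real pipeline uses the .nemo model's own tokenizer.

```python
TOKEN_OFFSET = 100  # documented default in NeMo's n-gram LM scripts

class StubTokenizer:
    """Hypothetical stand-in for the model's SentencePiece/BPE tokenizer."""
    def text_to_ids(self, text: str) -> list:
        return [ord(c) % 20 for c in text]  # fake token ids, demo only

def encode_for_kenlm(text: str, tokenizer) -> str:
    ids = tokenizer.text_to_ids(text)
    # One printable unicode char per BPE token; KenLM then treats each
    # token as a "word" and an utterance as a sequence of tokens.
    return " ".join(chr(t + TOKEN_OFFSET) for t in ids)

print(encode_for_kenlm("the cat", StubTokenizer()))
```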