ahkarami closed this issue 2 years ago.
@titu1994 any suggestions here?
You may use the following script to use pretrained models of HF as a neural rescorer with ASR models:
You can find more info here in the docs:
This script does not support MLM-based models like BERT, as they are not efficient to use as LMs. You can find more detail on why in these discussion threads:
https://github.com/NVIDIA/NeMo/discussions/2572
I suggest trying auto-regressive models like gpt2 or transfo-xl-wt103 instead of BERT.
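The efficiency gap comes from how the two model families score a sentence. An auto-regressive LM factorizes the sentence probability with the chain rule, so one left-to-right pass yields the full log-likelihood, whereas an MLM like BERT would need one forward pass per masked position. A minimal sketch with a toy bigram table (the probabilities are hypothetical, purely for illustration):

```python
import math

# Toy auto-regressive LM: P(next_word | prev_word) as a bigram table.
# (Hypothetical probabilities, for illustration only.)
BIGRAM = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.4,
    ("cat", "sat"): 0.6,
    ("sat", "</s>"): 0.9,
}

def sentence_log_likelihood(words):
    """Chain-rule factorization used by auto-regressive LMs:
    log P(w1..wn) = sum_i log P(w_i | w_<i).
    One left-to-right pass gives the whole score, which is why
    GPT-style models are cheap to use for rescoring; an MLM would
    need a separate forward pass per masked position."""
    total = 0.0
    prev = "<s>"
    for w in words + ["</s>"]:
        total += math.log(BIGRAM.get((prev, w), 1e-10))
        prev = w
    return total

print(round(sentence_log_likelihood(["the", "cat", "sat"]), 3))  # -2.226
```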
Thank you very much for your complete explanation.
I have just two more questions:
1- Can one use LSTM-based language models (e.g., AWD-LSTM or ULMFiT) for ASR language modeling in Nemo?
2- I think that, generally, for a specific domain, the transfo-xl-wt103 LM has the best accuracy (compared to N-gram LMs and GPT-2). Am I correct? Also, from the viewpoint of generality, which one is better? For example, if one wants to prepare a semi-general ASR model with an LM that achieves reasonable accuracy across several domains, which one is the better choice?
Best
1- Can one use LSTM-based language models (e.g., AWD-LSTM or ULMFiT) for ASR language modeling in NeMo? You can use any LM that is capable of estimating the likelihood of a sentence, but you may need to update eval_neural_rescorer.py to call it properly.
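Concretely, all the rescoring loop needs from the LM is a batch of candidate transcripts in and one log-likelihood per candidate out. A minimal sketch of such an adapter (the class and method names here are hypothetical, not NeMo's actual interface; the unigram backbone is a stand-in for an AWD-LSTM/ULMFiT forward pass):

```python
import math

class RescorerLM:
    """Hypothetical adapter: a fork of eval_neural_rescorer.py only
    needs candidate transcripts in and a log-likelihood per candidate
    out, regardless of whether the backbone is a Transformer or an LSTM."""
    def score_sentences(self, sentences):
        raise NotImplementedError

class UnigramLM(RescorerLM):
    """Toy backbone; an LSTM LM's forward pass would go here instead."""
    def __init__(self, probs):
        self.probs = probs

    def score_sentences(self, sentences):
        # Sum of per-word log-probs; unseen words get a small floor prob.
        return [sum(math.log(self.probs.get(w, 1e-10)) for w in s.split())
                for s in sentences]

lm = UnigramLM({"hello": 0.2, "world": 0.1})
scores = lm.score_sentences(["hello world", "hello"])
# The shorter, better-supported candidate gets the higher (less negative) score.
```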
2- I think that, generally, for a specific domain, the transfo-xl-wt103 LM has the best accuracy (compared to N-gram LMs and GPT-2). Am I correct? Also, from the viewpoint of generality, which one is better? For example, if one wants to prepare a semi-general ASR model with an LM that achieves reasonable accuracy across several domains, which one is the better choice?
We have a pretrained Transformer (GPT-style) LM for English already here: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/asrlm_en_transformer_large_ls . This model can work better for ASR than transfo-xl-wt103, but it depends on your domain and evaluation set.
Rescoring can be slightly better than an N-gram LM, but not necessarily. The best result is achieved when they are used together. I suggest starting with an N-gram LM, as it is very fast and easy to train and does not increase your inference time significantly. If you want to use rescoring, you need to perform beam search decoding anyway, and adding an N-gram LM to beam search decoding does not increase the inference time significantly either. In our experiments on LibriSpeech, pretrained models like transfo-xl-wt103 were worse than our pretrained model trained on the LibriSpeech LM text corpus. LMs trained on general text may not show very promising results when evaluated on a specific domain.
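The "used together" setup is the usual shallow-fusion/rescoring formula: fuse the acoustic (beam) score with the neural LM score via a tuned weight, plus a length reward so longer hypotheses are not unfairly penalized. A sketch under those assumptions (the parameter names here are illustrative; the exact flags in eval_neural_rescorer.py may differ, and alpha/beta are normally tuned on a dev set):

```python
def rescore(candidates, alpha=0.5, beta=0.1):
    """Pick the beam-search candidate with the best fused score:
    fused = am_score + alpha * lm_score + beta * word_count.
    alpha weights the neural LM, beta is a length reward; both are
    tuned on a dev set (names are illustrative, not NeMo's exact flags)."""
    best = max(
        candidates,
        key=lambda c: c["am"] + alpha * c["lm"] + beta * len(c["text"].split()),
    )
    return best["text"]

# Hypothetical beam-search outputs with made-up scores:
hyps = [
    {"text": "i scream",  "am": -3.0, "lm": -6.0},
    {"text": "ice cream", "am": -3.2, "lm": -2.0},
]
print(rescore(hyps))  # ice cream  (the LM outweighs the slightly worse AM score)
```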
Dear @VahidooX , Thank you for your complete answers. Best
Hi, I have a question. How can one add a word-level pre-trained language model (e.g., BERT or DistilBERT from HuggingFace) to an ASR model (e.g., character-based like QuartzNet or token-based like Citrinet) at inference time? Best