NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

[Question] Is it possible to implement BERT in ASR? #1733

Closed lodm94 closed 3 years ago

lodm94 commented 3 years ago

Hi all, I am working on a QuartzNet model with a 5-gram beam search LM for decoding the CTC matrix. I was wondering whether it would be possible to implement a BERT-based LM on top of my ASR model.

If I simply mask the incorrectly transcribed word and then run BERT, the word would of course be replaced with a new one, but not necessarily the correct one. BERT could replace a misspelled word with a totally different word that fits the context just as well. I guess BERT would need to be fed the audio waveform information to replace the word with the correct one rather than with a different one.

Is this possible? Any suggestions on related work?
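To illustrate the pitfall described above, here is a minimal toy sketch (all vocabulary and scores are invented for illustration, not from any real model): a context-only masked LM picks the most fluent word for the masked slot without any access to the audio, so a misrecognized word can be "corrected" into a plausible but acoustically wrong word.

```python
# Toy stand-in for a masked LM: plausibility of candidates in the
# context "the weather is ___ today". Scores are invented.
CONTEXT_SCORES = {
    "nice": 0.5, "cold": 0.3, "mild": 0.15, "mice": 0.001,
}

def fill_mask(candidates):
    """Return the candidate a context-only LM would pick for the slot."""
    return max(candidates, key=lambda w: CONTEXT_SCORES.get(w, 0.0))

# ASR output: "the weather is mice today" (acoustically close to "mild").
# A context-only LM picks the most fluent word for the slot...
best = fill_mask(["nice", "cold", "mild", "mice"])
print(best)  # "nice" -- fluent, but need not match what was said
```

Because the LM never sees the waveform, it cannot prefer the acoustically closer "mild" over the more probable "nice", which is exactly the failure mode raised in the question.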

okuchaiev commented 3 years ago

Doing this for every single step of the beam search process would be too computationally expensive. Instead, you can re-score the candidate beams produced by beam search, as we did with Transformer-XL in this paper: https://arxiv.org/pdf/1904.03288.pdf
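A minimal sketch of the n-best rescoring idea suggested above: combine each hypothesis's acoustic (beam search) score with an external LM score via a log-linear interpolation. The `alpha`/`beta` weights and the toy unigram LM here are invented placeholders; in practice the LM score would come from a neural LM such as Transformer-XL or BERT, and the weights would be tuned on a dev set.

```python
import math

# Toy unigram LM standing in for a neural LM (probabilities invented).
TOY_UNIGRAM = {"the": 0.2, "cat": 0.1, "sat": 0.1, "on": 0.1,
               "mat": 0.05, "sad": 0.01}

def lm_logprob(sentence, unk=1e-4):
    """Log-probability of a transcript under the toy LM."""
    return sum(math.log(TOY_UNIGRAM.get(w, unk)) for w in sentence.split())

def rescore(nbest, alpha=0.5, beta=0.1):
    """Pick the best hypothesis from an n-best list.

    nbest: list of (transcript, acoustic_logprob) pairs from beam search.
    Final score = acoustic + alpha * LM + beta * word count (length bonus).
    """
    scored = [(text, am + alpha * lm_logprob(text) + beta * len(text.split()))
              for text, am in nbest]
    return max(scored, key=lambda s: s[1])[0]

# The acoustically top hypothesis contains "sad"; the LM prefers "sat".
nbest = [("the cat sad on the mat", -4.0),
         ("the cat sat on the mat", -4.2)]
print(rescore(nbest))  # "the cat sat on the mat"
```

Because rescoring touches only the final n-best list rather than every partial beam, the LM runs once per hypothesis, which keeps the cost independent of the number of beam search steps.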