NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

I have a speech dataset. I want to use pretrained quartznet with language model #1175

Closed: gopesh97 closed this issue 4 years ago

gopesh97 commented 4 years ago

Describe your question

I have a speech dataset. I want to use the pretrained QuartzNet model together with a language model, preferably Transformer-XL or KenLM, as suggested in the blog post. This is how I currently transcribe:

import nemo.collections.asr as nemo_asr

# files: a list of paths to the audio files to transcribe
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
transcripts = quartznet.transcribe(paths2audio_files=files)

I want to know what code changes I need to make to integrate a language model into prediction, and how to train a corresponding Transformer-XL model for use with QuartzNet.


Additional context: NVIDIA RTX 2080 GPU

okuchaiev commented 4 years ago

Currently we aren't planning to port Transformer-XL LM rescoring. However, we do plan to work on adding this kind of Transformer LM: https://github.com/NVIDIA/NeMo/issues/1126

Closing as a duplicate feature request.
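
For readers unfamiliar with the term, "LM rescoring" generally means re-ranking a list of candidate transcripts by combining each candidate's acoustic score with a language model score. The sketch below is only an illustration of that general idea in plain Python, not NeMo's implementation. The function names (`rescore_nbest`, `toy_lm_log_prob`), the weights `alpha` and `beta`, and the toy unigram table are all hypothetical; in practice the language model score would come from a real model such as KenLM, and the n-best list would have to come from a beam search decoder, which this sketch does not cover.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical toy unigram "language model"; a real setup would query
# a trained LM instead of this hand-written table.
TOY_UNIGRAM = {"i": 0.2, "have": 0.2, "a": 0.2, "speech": 0.1, "dataset": 0.1}


def toy_lm_log_prob(text: str) -> float:
    """Sum of per-word log probabilities; unknown words get a small floor."""
    return sum(math.log(TOY_UNIGRAM.get(w, 1e-4)) for w in text.lower().split())


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_log_prob: Callable[[str], float],
    alpha: float = 0.5,  # assumed LM weight
    beta: float = 1.0,   # assumed per-word bonus
) -> List[Tuple[str, float]]:
    """Return (text, combined_score) pairs sorted best first.

    combined = acoustic_score + alpha * lm_log_prob(text) + beta * word_count
    """
    rescored = []
    for text, acoustic_score in nbest:
        n_words = len(text.split())
        total = acoustic_score + alpha * lm_log_prob(text) + beta * n_words
        rescored.append((text, total))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Made-up n-best list: (candidate transcript, acoustic log score).
nbest = [
    ("i have a speach data set", -4.0),  # acoustically preferred, misspelled
    ("i have a speech dataset", -4.5),
]
best_text, best_score = rescore_nbest(nbest, toy_lm_log_prob)[0]
print(best_text)
```

In this made-up example the second candidate wins after rescoring, because the toy language model penalizes the out-of-vocabulary words in the first candidate more than the acoustic score favors it. The weights would normally be tuned on a held-out set.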

gopesh97 commented 4 years ago

Can't I use any language model?