NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Is there a Jupyter tutorial notebook for BERT ASR postprocessing? #1380

Closed: catskillsresearch closed this issue 3 years ago

catskillsresearch commented 3 years ago

BERT ASR postprocessing is described here: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/v0.10.1/nlp/asr-improvement.html

Is there a notebook in the repo that demonstrates this in combination with transfer learning of QuartzNet15x5 to a new language?

It would be very helpful to show how to transfer-learn the BERT postprocessor on top of a QuartzNet that has itself been transfer-learned to a new language.

Also... what happened to all the files referenced in the docs, for example examples/nlp/asr_postprocessor?

After some digging, I found them buried in an old version. I hope this doesn't mean the feature no longer works:

https://github.com/NVIDIA/NeMo/tree/fa68d336cfefbbbb1849ff1b6ef454149f45234d/examples/nlp/asr_postprocessor

catskillsresearch commented 3 years ago

Actually, never mind. I can just train using other text-to-text notebooks, treating ASR output to ground truth as a new language pair.
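
The pairing step itself is trivial; roughly like this (a sketch only; the example lists and file names are placeholders, not from any NeMo example):

```python
# Sketch: write (ASR hypothesis, ground-truth transcript) pairs as a
# parallel corpus, so any text-to-text / NMT notebook can treat them
# as a source/target "language pair".
hypotheses = ["the cat sat on the mat", "helo wold"]    # ASR outputs (placeholder)
references = ["the cat sat on the mat", "hello world"]  # gold transcripts (placeholder)

with open("train.src", "w") as src, open("train.tgt", "w") as tgt:
    for hyp, ref in zip(hypotheses, references):
        src.write(hyp.strip() + "\n")  # "source language": ASR output
        tgt.write(ref.strip() + "\n")  # "target language": ground truth
```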

ShantanuNair commented 3 years ago

Hey, they've explained in other issues (https://github.com/NVIDIA/NeMo/issues/1156, https://github.com/NVIDIA/NeMo/issues/1126) that it hasn't been ported yet, but it looks like they have some workarounds for adding a language model to incorporate context. They might want to consider adding this to their docs, since it seems to pop up often.

> Actually, never mind. I can just train using other text-to-text notebooks, treating ASR output to ground truth as a new language pair.

Also, would you mind elaborating a bit on your process? Did it help with results in your case?

catskillsresearch commented 3 years ago

Hi @ShantanuNair, they had this script in older releases; they just deleted it without explanation.

I will elaborate, with this caveat: It didn't help.

The method was just to take the predicted output from my trained NeMo QuartzNet15x5 ASR net, pair it with the gold (ground-truth) transcript, and train a new net on those pairs. The new net was this Seq2Seq model from Ben Trevett: https://github.com/bentrevett/pytorch-seq2seq/blob/master/6%20-%20Attention%20is%20All%20You%20Need.ipynb
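
Roughly, generating the source side of those pairs looked like this (a sketch, assuming the NeMo 1.x-era Python API; the checkpoint path and manifest name are placeholders, and transcribe()'s exact signature depends on the NeMo version):

```python
import json
import nemo.collections.asr as nemo_asr

# Load a QuartzNet15x5 checkpoint fine-tuned for the new language
# (placeholder path, not an actual released checkpoint).
model = nemo_asr.models.EncDecCTCModel.restore_from("quartznet_new_lang.nemo")

# A standard NeMo manifest is one JSON object per line with
# "audio_filepath" and "text" fields.
audio_files, references = [], []
with open("train_manifest.json") as f:
    for line in f:
        entry = json.loads(line)
        audio_files.append(entry["audio_filepath"])
        references.append(entry["text"])

# NeMo 1.x took paths2audio_files=; newer releases renamed it to audio=.
hypotheses = model.transcribe(paths2audio_files=audio_files)

# Each (hypothesis, reference) pair becomes one training example for
# the seq2seq "afterburner".
```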

I suppose I could have built the equivalent model in NeMo, but they deleted the example. For the NIST exercise I was doing, I needed to train from scratch, and the (deleted) example only showed starting from a pretrained model. I already knew the Ben Trevett code, so I went with that.

The score after adding this "afterburner" net, trained on the mistakes of the ASR net, was only negligibly higher than without it.