NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

ContextNet Pretrained Models #863

Closed · sklocher closed this issue 3 years ago

sklocher commented 4 years ago

Hi there! I stumbled upon the very promising ContextNet, which you have already implemented in NeMo.

Before I start training the model myself, I was wondering how soon we can expect NVIDIA to upload pretrained models for this network?

Thanks in advance!

okuchaiev commented 4 years ago

We are working on it and will release pre-trained checkpoints as soon as we get results comparable to what is described in the ContextNet paper. Experiments are still running, but the paper's results are turning out to be quite hard to reproduce. @titu1994 fyi.
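
For reference, once checkpoints are published, restoring one in NeMo is typically a one-liner via `from_pretrained`. A minimal sketch; the checkpoint name below is a placeholder, not a released model:

```python
# Sketch: loading a pretrained NeMo ASR checkpoint from NGC.
# The model name is a placeholder until ContextNet checkpoints ship.
import nemo.collections.asr as nemo_asr

# See which pretrained checkpoints are currently available.
print(nemo_asr.models.ASRModel.list_available_models())

# Download and restore a checkpoint by name (hypothetical name shown).
model = nemo_asr.models.ASRModel.from_pretrained("stt_en_contextnet_256")

# Transcribe a few 16 kHz mono WAV files.
print(model.transcribe(["sample1.wav", "sample2.wav"]))
```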

alexsounder commented 4 years ago

Hi @okuchaiev, any updates on this? Thanks.

sklocher commented 4 years ago

Hey @okuchaiev and @titu1994, we are presenting our master's thesis in two days. We were interested in trying out ContextNet, but we are happy with the results we got from QuartzNet.

However, we were wondering if you would mind sharing some of your experience with training this network. Feel free to skip any questions you don't want to or can't answer:

  • How many GPU hours did you invest so far?
  • What's the best WER you achieved?
  • Did you contact the authors on how they achieved these results?
  • Are you still trying?
  • Anything else you would like to share?

Thanks in advance!

okuchaiev commented 4 years ago

@sklocher, sorry, I probably saw your question too late. I hope your thesis defense went well :)

"How many GPU hours did you invest so far?"

  • A lot :) Do you mean a specific model, on which datasets? The largest single training run we have done so far, for QuartzNet 15x5, used 512 V100 GPUs for 8 hours (see the sketch below).
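
For a sense of how such a run is configured: NeMo delegates distributed training to PyTorch Lightning, so a 512-GPU job is expressed as nodes × GPUs per node. A minimal sketch with illustrative numbers; exact argument names vary across Lightning versions:

```python
# Sketch: configuring a large data-parallel run with PyTorch Lightning,
# which NeMo uses under the hood. Numbers are illustrative and argument
# names differ between Lightning versions.
import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=8,             # GPUs per node
    num_nodes=64,       # 64 nodes x 8 GPUs = 512 V100s in total
    accelerator="ddp",  # distributed data parallel across all GPUs
    precision=16,       # mixed precision helps on V100s
    max_epochs=100,     # illustrative; the real schedule depends on the model
)
# trainer.fit(model) would then launch the run on each node.
```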

"What's the best WER you achieved?"

"Did you contact the authors on how they achieved these results?"

  • Which authors, of which results? QuartzNet and Jasper are from our team. We do talk (and are always happy to do so) with researchers from other groups/companies.

"Are you still trying?"

  • YES!

"Anything else you would like to share?"

  • Transfer learning works really well! Starting from a good pre-trained model (or even just its encoder, if your target vocabulary differs) will get you better results, especially if you don't have enough training data and/or compute. See the sketch below.
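
To make the transfer-learning point concrete, a minimal NeMo sketch: restore a released QuartzNet checkpoint, swap the decoder vocabulary if yours differs, and fine-tune. The manifest path and labels below are placeholders:

```python
# Sketch: fine-tuning from a pretrained QuartzNet checkpoint in NeMo.
# Manifest path and vocabulary are placeholders.
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr

# Restore a released English QuartzNet 15x5 checkpoint (encoder + decoder).
model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")

# If the target vocabulary differs, replace the decoder head while keeping
# the pretrained encoder weights.
model.change_vocabulary(new_vocabulary=[" ", "a", "b", "c"])  # placeholder labels

# Point the model at your training data (NeMo's JSON-lines manifest format).
model.setup_training_data(train_data_config={
    "manifest_filepath": "train_manifest.json",  # placeholder path
    "sample_rate": 16000,
    "labels": model.decoder.vocabulary,
    "batch_size": 32,
})

# Fine-tune; even a single GPU can go far from a good starting point.
trainer = pl.Trainer(gpus=1, max_epochs=50)
trainer.fit(model)
```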