Closed: sklocher closed this issue 3 years ago
We are working on it. We will release pre-trained checkpoints as soon as we get results comparable to those described in the ContextNet paper. Experiments are still running, but it is turning out to be quite hard to reproduce the paper's results. @titu1994 fyi.
Hi @okuchaiev, any updates on this? Thanks.
Hey @okuchaiev and @titu1994, we present our master's thesis in two days. We were interested in trying out ContextNet, but we're happy with the results we got with QuartzNet.
However, we were wondering if you would mind sharing some of your experience with training this network. Feel free to skip any questions you can't or don't want to answer:
Thanks in advance!
@sklocher, sorry, I probably saw your question too late. I hope your thesis defense went well :)
"How many GPU hours did you invest so far?"
- A lot :) Do you mean a specific model, on a specific dataset? The largest single training run we have done so far, for QuartzNet 15x5, used 512 V100 GPUs for 8 hours.
"What's the best WER you achieved?"
- Again, it depends on the dataset. Check out this paper for numbers on several datasets/languages: https://arxiv.org/abs/2005.04290
"Did you contact the authors on how they achieved these results?"
- Which authors of which results? QuartzNet and Jasper are from our team. We do talk (and are always happy to do so) with researchers from other groups/companies.
"Are you still trying?"
- YES!
"Anything else you would like to share?"
- Transfer learning works really well! Starting from a good pre-trained model (or even just its encoder, if your target vocabulary differs) will get you better results, especially if you don't have enough training data and/or compute.
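To make the encoder-only transfer concrete: the idea is to copy the pre-trained encoder weights into a freshly initialized model while keeping a new decoder head sized for the target vocabulary. Here is a minimal framework-free sketch; the parameter names, sizes, and `init_params`/`transfer_encoder` helpers are hypothetical stand-ins for a real checkpoint's state dict, not NeMo's actual API.

```python
import random

def init_params(vocab_size, seed):
    """Randomly initialize toy parameters: a shared encoder and a
    vocabulary-sized decoder head (sizes are illustrative only)."""
    rng = random.Random(seed)
    return {
        "encoder.weight": [rng.gauss(0, 1) for _ in range(8)],
        "decoder.weight": [rng.gauss(0, 1) for _ in range(vocab_size)],
    }

def transfer_encoder(source, target):
    """Copy only encoder parameters from the source model; keep the
    target's own decoder, since the output vocabularies differ."""
    return {
        name: (source[name] if name.startswith("encoder.") else weights)
        for name, weights in target.items()
    }

# "Pre-trained" source model (e.g. a 29-character English vocabulary)
# and a new target model with a different vocabulary (e.g. 32 characters).
source = init_params(vocab_size=29, seed=0)
target = init_params(vocab_size=32, seed=1)
target = transfer_encoder(source, target)
```

After the transfer, only the small decoder head trains from scratch, which is why this helps most when target-language data or compute is limited.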
Hi there! I stumbled upon the very promising ContextNet model, which you have already implemented in NeMo.
Before I start training the model myself, I was wondering how soon we can expect NVIDIA to upload pretrained models for this network?
Thanks in advance!