NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

Jasper models #305

Closed bmwshop closed 5 years ago

bmwshop commented 5 years ago

We are seeing really interesting results with Jasper models on noisy data. Could you possibly comment on the impact of augmenting the dataset with generated speech, and also on increasing the depth of the model? E.g., if we double the dataset, does it improve WER for the same model, or does it allow us to train a deeper model without overfitting?

borisgin commented 5 years ago

Do you mean adding 2x of synthetic data to the natural dataset? This can hurt your test WER, e.g. when the synthetic set is generated with only one voice, the model will overfit to this speaker.
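To guard against a single TTS voice dominating training, one option is to cap the synthetic share when merging manifests. A minimal sketch, assuming OpenSeq2Seq-style CSV manifest rows; the helper name `mix_manifests` and the cap ratio are illustrative assumptions, not part of the toolkit:

```python
import random

def mix_manifests(natural_rows, synthetic_rows, max_synth_ratio=0.5, seed=0):
    """Combine natural and synthetic utterance rows, capping the synthetic
    share so a single-voice TTS set cannot dominate training.

    max_synth_ratio is the maximum fraction of the final mix that may be
    synthetic (0.5 == at most one synthetic row per natural row).
    """
    rng = random.Random(seed)
    # Solve cap / (len(natural) + cap) <= max_synth_ratio for cap.
    cap = int(len(natural_rows) * max_synth_ratio / (1.0 - max_synth_ratio))
    synth = list(synthetic_rows)
    rng.shuffle(synth)
    mixed = list(natural_rows) + synth[:cap]
    rng.shuffle(mixed)
    return mixed
```

With `max_synth_ratio=0.5` this reproduces the "original + generated == 2x original" setup from the question, while a larger synthetic pool is subsampled rather than used in full.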

bmwshop commented 5 years ago

I meant just one generated voice, so (original dataset + generated) == 2x the size of the original. We are training the largest Jasper model on a combination of Fisher (2,000 hours) + LibriSpeech (1,000 hours) == 3,000 hours. So the question is: [a] should we augment, effectively doubling the dataset to 6,000 hours, and if so, [b] should we make the Jasper model deeper?

GabrielLin commented 5 years ago

Hi @bmwshop, could you please share your training time and what GPU you used for training Jasper? It took me about 8 days to train w2l_large_8gpus_mp on 8x 1080 Ti. So long!

okuchaiev commented 5 years ago

@GabrielLin , w2l_large_8gpus_mp is mixed precision config. It should not work on 1080Ti, unless you change dtype to float32. Is this what you did? (mixed precision is faster than float32, but it uses hardware features available only on Volta and Turing cards)
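The dtype switch mentioned above lives in the config's `base_params` dict. A minimal sketch of downgrading a mixed-precision config for a pre-Volta GPU; the helper `to_float32` is hypothetical, and a plain `"float32"` string stands in for the `tf.float32` object that real OpenSeq2Seq configs pass, to keep the sketch self-contained:

```python
def to_float32(base_params):
    """Return a copy of base_params with mixed precision disabled."""
    params = dict(base_params)
    if params.get("dtype") == "mixed":
        params["dtype"] = "float32"
        # Loss scaling only matters for mixed precision, so drop it if present.
        params.pop("loss_scaling", None)
    return params

# Abbreviated config in the spirit of w2l_large_8gpus_mp (values illustrative).
mp_config = {
    "dtype": "mixed",
    "loss_scaling": "Backoff",
    "batch_size_per_gpu": 32,
}
fp32_config = to_float32(mp_config)
```

Expect float32 training to be slower and to need more GPU memory than the mixed-precision run, so the per-GPU batch size may also have to shrink on an 11 GB 1080 Ti.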

GabrielLin commented 5 years ago

@okuchaiev Thank you very much, that is a great reminder. I had missed this detail before. I will retrain my model.

By the way, is it OK to use pretrained mixed-precision models for inference on a 1080 Ti?

Shujian2015 commented 5 years ago

Hi @bmwshop, do you mind sharing the training time of Jasper 10x5 on Fisher + swbd?