NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

Use a pre-trained Librispeech Jasper model on another dataset #479

Closed ngochuyenluu closed 4 years ago

ngochuyenluu commented 4 years ago

Hello everyone, I intend to train the Jasper model on a dataset about tax and finance of around 1.5 GB. I would like to start from the pre-trained Librispeech checkpoint and continue training, with an n-gram language model and beam search, to save time and effort. As advised in #470, all of the data was preprocessed into clips of around 10-24 s with their transcripts, the learning rate was decreased, the number of epochs was increased, and so on; I tried every hyperparameter adjustment I could think of to improve the model's performance. I also replaced the dataset files in train_params, eval_params, and infer_params with my training, dev, and test files.
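For illustration, a minimal sketch of that kind of config edit, following the layout of the OpenSeq2Seq Jasper example configs (the finance CSV paths and the max_duration value here are placeholders, not the exact settings from this post):

```python
from open_seq2seq.data import Speech2TextDataLayer

# Each section of the config points at its own CSV file; OpenSeq2Seq
# CSVs have wav_filename, wav_filesize, transcript columns.
train_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "dataset_files": ["data/finance/train.csv"],  # placeholder path
        "max_duration": 24.0,  # clips were preprocessed to ~10-24 s
        "shuffle": True,
        # other data_layer_params (vocab_file, input_type, ...) stay as
        # in the original Jasper config
    },
}
eval_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "dataset_files": ["data/finance/dev.csv"],  # placeholder path
        "shuffle": False,
    },
}
infer_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "dataset_files": ["data/finance/test.csv"],  # placeholder path
        "shuffle": False,
    },
}
```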

In the end, the model trained well, but validation was very poor. When I evaluated the pre-trained checkpoint alone beforehand, it gave 0.34 on my validation set, which is better than the result after training on my dataset (around 0.97).

I think my training overwrote the pre-trained checkpoint, so I am now training on my dataset plus Librispeech and validating on my dev file plus Librispeech's dev files. Training resumed from the 3rd epoch, and each epoch takes a long time (on 2 Titan V GPUs); I am still waiting to see whether it improves.
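One way to keep the downloaded checkpoint untouched is to fine-tune into a fresh logdir instead of training in place. A minimal sketch, assuming the load_model transfer-learning option that OpenSeq2Seq accepts in base_params (both paths are placeholders):

```python
base_params = {
    # ... all other Jasper base_params unchanged ...
    "logdir": "experiments/jasper_finance",          # fresh run directory (placeholder)
    "load_model": "checkpoints/jasper_librispeech",  # dir holding the pre-trained checkpoint
}
```

With this setup the pre-trained weights are only read once at startup, so the original checkpoint files are never overwritten.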

Did I make a mistake in how I used the pre-trained model to continue training? Could my training have changed the pre-trained checkpoints? Any suggestions would be welcome. Thank you in advance.

ngochuyenluu commented 4 years ago

Hi, just some information for anyone working on the same approach as me. To continue from the pre-trained model, you should add your training files to the existing Librispeech dataset in train_params, not replace the Librispeech data entirely. Otherwise the model keeps adapting to the new dataset and drifts away from the initial checkpoint, and it overfits, giving poor results on the eval set, because the new training data alone is not sufficient for it.
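In config terms, a minimal sketch of that advice (the finance CSV path is a placeholder; the Librispeech CSV names follow the repo's example configs):

```python
from open_seq2seq.data import Speech2TextDataLayer

train_params = {
    "data_layer": Speech2TextDataLayer,
    "data_layer_params": {
        "dataset_files": [
            # keep the original Librispeech training CSVs ...
            "data/librispeech/librivox-train-clean-100.csv",
            "data/librispeech/librivox-train-clean-360.csv",
            "data/librispeech/librivox-train-other-500.csv",
            # ... and append the new-domain data instead of replacing them
            "data/finance/train.csv",  # placeholder
        ],
        "shuffle": True,
        # other data_layer_params stay as in the original Jasper config
    },
}
```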

Adding the language model and beam search step would require retraining from the beginning, which takes a very long time. So if NVIDIA could publish pre-trained models for the two other variants, the ones with a 6-gram language model and beam search, that would be super great!