flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

model training for Interactive Streaming #594

Closed intouch1233 closed 3 years ago

intouch1233 commented 4 years ago

In this example, the lexicon is built with word pieces, right? I tried to apply that approach to my Thai data, but the training results were very bad with 300 hours of data. Would graphemes work better than word pieces for my data? What I want to ask is: can I use graphemes in this example instead of word pieces? Or do you have any other advice for me?
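For reference, a grapheme lexicon simply maps each word to its character sequence, while a wordpiece lexicon maps each word to learned subword units. A minimal Python sketch of building a grapheme lexicon (file names here are hypothetical, not from this thread):

```python
# Build a grapheme lexicon: each word maps to its individual characters.
# Input: one word per line; output: wav2letter-style lexicon lines of the
# form "word<TAB>c1 c2 c3 ...". Note that for Thai, combining vowel and
# tone marks become separate tokens under this naive split.
with open("words.txt", encoding="utf-8") as fin, \
     open("lexicon.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        word = line.strip()
        if word:
            fout.write(word + "\t" + " ".join(word) + "\n")
```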

My results are below:

[screenshot of training output attached]

Or should I wait longer?

Thank you.

lunixbochs commented 4 years ago

300h is not very much. Which model architecture are you training?

The first problem here is that you're only on epoch 2. Most of the models require at least 10 epochs to converge. You can set reportiters to 0 if you only want to see output once per epoch.
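For example, in your training flags file (a sketch; the exact file and the rest of its contents are yours):

```
# report only at the end of each epoch instead of every N iterations
--reportiters=0
```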

You can use the wpiece command from https://github.com/talonvoice/wav2train to generate a wordpiece model and lexicon/tokens from your training data.
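As far as I know, wpiece is built on SentencePiece; a rough Python equivalent for training a wordpiece model and segmenting words for a lexicon (file names and vocab size are assumptions, not values from this thread):

```python
import sentencepiece as spm

# Train a subword (wordpiece-style) model on the training transcripts,
# one utterance per line. vocab_size is an assumption; tune it for your data.
spm.SentencePieceTrainer.train(
    input="train_transcripts.txt",
    model_prefix="thai_wp",
    vocab_size=8000,
    model_type="unigram",
)

# Segment a word into pieces, e.g. for a lexicon entry.
sp = spm.SentencePieceProcessor(model_file="thai_wp.model")
print(sp.encode("สวัสดี", out_type=str))  # Thai for "hello"
```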

If you need help picking a model, I recommend training the streaming convnet model.

intouch1233 commented 4 years ago

@lunixbochs

The architecture I'm using is TDS with CTC loss.

I have now followed your tutorial for the LibriSpeech data set, prepared my own data in Thai, and monitored the results up to epoch 25. The results are good now, so I'm on the right track, right? I'll keep training and come back if I need help again. Thank you.