AlleyEli closed this issue 5 years ago.
Hi @AlleyEli, we currently read all text files assuming they contain only ASCII characters. We would have to update the C++ code to support training on UTF-8 encoded characters. We will try to get this fixed by next week.
@vineelpratap, is this fixed? If not, roughly how long do you think it will take (days, weeks, months)?
Thanks
@isaacleeai - It is supported. You can give it a try.
@AlleyEli - Can you try the training again now?
@vineelpratap, this problem appears to be triggered by the Dockerfile-CPU setup. I have not seen it in my own environment, so I don't know whether it is caused by the encoding.
@AlleyEli - Can you share the complete output log after you run:
./build/Train train --flagsfile=/root/wav2letter/wav2letter/tutorials/1-chinese_LJSpeech-1.1/train.cfg
I have since switched to another Chinese dataset, thchs30; the dataset structure is the same.
epoch: 1 | lr: 0.100000 | lrcriterion: 0.000000 | runtime: 03:53:05 | bch(ms): 4178.46 | smp(ms): 0.99 | fwd(ms): 918.71 | crit-fwd(ms): 326.81 | bwd(ms): 3138.02 | optim(ms): 78.84 | loss: 40.10400 | train-TER: 83.21 | dev-TER: 75.35 | avg-isz: 916 | avg-tsz: 058 | max-tsz: 077 | hrs: 34.09 | thrpt(sec/sec): 8.78
epoch: 2 | lr: 0.100000 | lrcriterion: 0.000000 | runtime: 05:37:45 | bch(ms): 6054.70 | smp(ms): 0.84 | fwd(ms): 996.83 | crit-fwd(ms): 329.52 | bwd(ms): 4935.51 | optim(ms): 79.62 | loss: 34.14847 | train-TER: 77.45 | dev-TER: 77.67 | avg-isz: 916 | avg-tsz: 058 | max-tsz: 077 | hrs: 34.09 | thrpt(sec/sec): 6.06
epoch: 3 | lr: 0.100000 | lrcriterion: 0.000000 | runtime: 05:27:15 | bch(ms): 5866.72 | smp(ms): 0.83 | fwd(ms): 991.97 | crit-fwd(ms): 331.21 | bwd(ms): 4753.12 | optim(ms): 78.85 | loss: 33.73276 | train-TER: 77.60 | dev-TER: 75.98 | avg-isz: 916 | avg-tsz: 058 | max-tsz: 077 | hrs: 34.09 | thrpt(sec/sec): 6.25
Hi, so it looks like you were able to train the model successfully?
So far, so good.
@AlleyEli, could you please share your training log and test results on AIShell? Do you use a Chinese language model? Thanks.
@AlleyEli, how did your training turn out? After training, I found the error rate is very high: WER 74%, LER 47%. Any advice?
Could we set up a group to discuss this together?
Training on the LibriSpeech dataset runs without errors.
I prepared a Chinese dataset, for example:
train.cfg
The training fails, and I don't know how to fix it: