flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Train for other language errors #167

Closed AlleyEli closed 5 years ago

AlleyEli commented 5 years ago

Training on the LibriSpeech dataset completes without errors.

I prepared a Chinese dataset, e.g.:

~/wav2letter/data/chinese_LJSpeech-1.1/data$ head tokens.txt
|
他
们
走
到
四
马
路
一
家

~/wav2letter/data/chinese_LJSpeech-1.1/data/train$ ls  000000001.*
000000001.id  000000001.tkn  000000001.wav 000000001.wrd

~/wav2letter/data/chinese_LJSpeech-1.1/data/train$ cat 000000001.wrd
企业 依靠 技术 挖潜 增效 他 负责 全厂 产品质量 与 技术培训 成了 厂里 的 大忙人

~/wav2letter/data/chinese_LJSpeech-1.1/data/train$ cat 000000001.tkn
企 业 | 依 靠 | 技 术 | 挖 潜 | 增 效 | 他 | 负 责 | 全 厂 | 产 品 质 量 | 与 | 技 术 培 训 | 成 了 | 厂 里 | 的 | 大 忙 人 

~/wav2letter/data/chinese_LJSpeech-1.1/data/train$ cat 000000001.id
file_id 1
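For reference, the `.tkn` line above can be derived mechanically from the `.wrd` line: each word is split into individual characters separated by spaces, and words are joined with the `|` delimiter token. A minimal sketch of that conversion (the helper name `wrd_to_tkn` is my own, not part of the wav2letter tooling):

```python
def wrd_to_tkn(wrd_line: str) -> str:
    """Turn a space-separated word transcript (.wrd) into a
    character-level token transcript (.tkn): characters are
    space-separated and words are delimited by the '|' token."""
    words = wrd_line.strip().split()
    return " | ".join(" ".join(word) for word in words)

print(wrd_to_tkn("企业 依靠 技术"))  # 企 业 | 依 靠 | 技 术
```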

train.cfg

--datadir=/root/wav2letter/data/chinese_LJSpeech-1.1/
--tokensdir=/root/wav2letter/data/chinese_LJSpeech-1.1/
--rundir=/root/wav2letter/wav2letter/model_Chinese
--archdir=/root/wav2letter/wav2letter/tutorials/1-chinese_LJSpeech-1.1
--train=data/train
--valid=data/dev
--input=wav
--arch=network.arch
--tokens=data/tokens.txt
--criterion=ctc
--lr=0.1
--maxgradnorm=1.0
--replabel=2
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=4
--batchsize=4
--runname=chinese_LJSpeech-1.1_logs
--iter=100
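The token set in `data/tokens.txt` has to cover every character that appears in the `.tkn` files, with the `|` word delimiter included (it is listed first in the `tokens.txt` shown above). A small sketch of how such a list could be collected from transcript lines (this helper is illustrative, not part of the wav2letter tooling):

```python
def build_token_list(tkn_lines):
    """Collect the unique tokens used across an iterable of .tkn
    transcript lines, putting the '|' word delimiter first to
    match the tokens.txt layout shown above."""
    tokens = set()
    for line in tkn_lines:
        tokens.update(line.split())
    tokens.discard("|")
    return ["|"] + sorted(tokens)

toks = build_token_list(["企 业 | 依 靠", "他 们"])
print("\n".join(toks))
```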

Training fails and I don't know how to solve it:

~/wav2letter/wav2letter(master)$ ./build/Train train --flagsfile=/root/wav2letter/wav2letter/tutorials/1-chinese_LJSpeech-1.1/train.cfg
*** Aborted at 1548061081 (unix time) try "date -d @1548061081" if you are using GNU date ***
PC: @     0x7fc083a716f9 mkldnn::impl::get_msec()
*** SIGILL (@0x7fc083a716f9) received by PID 168660 (TID 0x7fc085f4cbc0) from PID 18446744071623350009; stack trace: ***
    @     0x7fc07d238390 (unknown)
    @     0x7fc083a716f9 mkldnn::impl::get_msec()
    @     0x7fc083aef94f mkldnn::impl::cpu::gemm_convolution_fwd_t::pd_t::create_primitive()
    @           0x60e0bc fl::conv2d()
    @           0x5f0716 fl::Conv2D::forward()
    @           0x5fea7f fl::UnaryModule::forward()
    @           0x5ef712 fl::Sequential::forward()
    @           0x45e45b _ZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEERNS0_19FirstOrderOptimizerES9_biE4_clES2_S5_S7_S9_S9_bi.constprop.8772
    @           0x417cc2 main
    @     0x7fc07ac1f830 __libc_start_main
    @           0x45a749 _start
    @                0x0 (unknown)
Illegal instruction (core dumped)
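A SIGILL inside MKL-DNN typically means the library was built for CPU instructions (e.g. AVX/AVX2) that the host CPU, or the CPU exposed to the Docker container, does not support. One way to check which SIMD flags the machine actually reports on Linux is to inspect `/proc/cpuinfo`; a small diagnostic sketch (my own suggestion, not from the wav2letter docs):

```python
def simd_flags(cpuinfo_text: str) -> set:
    """Extract the SIMD-related feature flags from the text of
    /proc/cpuinfo (Linux). Returns the subset of common vector
    instruction sets the CPU reports."""
    wanted = {"sse4_1", "sse4_2", "avx", "avx2", "avx512f"}
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return wanted & set(line.split(":", 1)[1].split())
    return set()

# On a real machine:
# with open("/proc/cpuinfo") as f:
#     print(simd_flags(f.read()))
print(simd_flags("flags\t\t: fpu sse4_2 avx avx2"))
```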
vineelpratap commented 5 years ago

Hi @AlleyEli, we currently read all text files assuming they contain ASCII characters. We will have to update the C++ code to make training work with UTF-8-encoded text. We will try to get this fixed by next week.
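The distinction matters because code that treats each byte as one character silently corrupts multi-byte UTF-8 sequences; CJK characters occupy three bytes each in UTF-8. A small illustration of the difference (Python, purely to show the encoding issue, not the wav2letter C++ code):

```python
text = "企业"
raw = text.encode("utf-8")

# Each Chinese character is 3 bytes in UTF-8, so byte-wise
# "characters" are meaningless fragments:
print(len(text))  # 2 codepoints
print(len(raw))   # 6 bytes

# Slicing on byte boundaries yields fragments that are not
# valid characters on their own:
fragments = [raw[i:i + 1] for i in range(len(raw))]
print(fragments[0])  # b'\xe4' — not decodable alone
```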

isaacleeai commented 5 years ago

@vineelpratap Is this fixed? If not, roughly how long do you think it will take (days, weeks, months...)?

Thanks

vineelpratap commented 5 years ago

@isaacleeai - It is supported; you can give it a try.

@AlleyEli - Can you try the training again now?

AlleyEli commented 5 years ago

@vineelpratap The crash turned out to be triggered by the Dockerfile-CPU build. It does not occur in my own environment, so I don't know whether the text encoding was the cause.

vineelpratap commented 5 years ago

@AlleyEli - Can you share the complete output log after you run ./build/Train train --flagsfile=/root/wav2letter/wav2letter/tutorials/1-chinese_LJSpeech-1.1/train.cfg

AlleyEli commented 5 years ago

I have since switched to another Chinese dataset, thchs30; the dataset structure is the same.

epoch:        1 | lr: 0.100000 | lrcriterion: 0.000000 | runtime: 03:53:05 | bch(ms): 4178.46 | smp(ms): 0.99 | fwd(ms): 918.71 | crit-fwd(ms): 326.81 | bwd(ms): 3138.02 | optim(ms): 78.84 | loss:   40.10400 | train-TER: 83.21 | dev-TER: 75.35 | avg-isz: 916 | avg-tsz: 058 | max-tsz: 077 | hrs:   34.09 | thrpt(sec/sec): 8.78
epoch:        2 | lr: 0.100000 | lrcriterion: 0.000000 | runtime: 05:37:45 | bch(ms): 6054.70 | smp(ms): 0.84 | fwd(ms): 996.83 | crit-fwd(ms): 329.52 | bwd(ms): 4935.51 | optim(ms): 79.62 | loss:   34.14847 | train-TER: 77.45 | dev-TER: 77.67 | avg-isz: 916 | avg-tsz: 058 | max-tsz: 077 | hrs:   34.09 | thrpt(sec/sec): 6.06
epoch:        3 | lr: 0.100000 | lrcriterion: 0.000000 | runtime: 05:27:15 | bch(ms): 5866.72 | smp(ms): 0.83 | fwd(ms): 991.97 | crit-fwd(ms): 331.21 | bwd(ms): 4753.12 | optim(ms): 78.85 | loss:   33.73276 | train-TER: 77.60 | dev-TER: 75.98 | avg-isz: 916 | avg-tsz: 058 | max-tsz: 077 | hrs:   34.09 | thrpt(sec/sec): 6.25
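Each log line above is a `|`-separated list of `key: value` pairs, so metrics such as dev-TER can be pulled out mechanically when tracking convergence across epochs. A small parsing sketch (the helper name is my own, not part of wav2letter):

```python
def parse_epoch_line(line: str) -> dict:
    """Parse one wav2letter training log line of the form
    'epoch: 1 | lr: 0.1 | ... | dev-TER: 75.35 | ...' into a
    key -> string-value dict. Splitting on the first ':' keeps
    values like 'runtime: 03:53:05' intact."""
    fields = {}
    for part in line.split("|"):
        if ":" in part:
            key, value = part.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields

line = ("epoch:        1 | lr: 0.100000 | loss:   40.10400 "
        "| train-TER: 83.21 | dev-TER: 75.35")
print(parse_epoch_line(line)["dev-TER"])  # 75.35
```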
vineelpratap commented 5 years ago

Hi, so it looks like you were able to train the model successfully?

AlleyEli commented 5 years ago

So far so good

GabrielLin commented 5 years ago

@AlleyEli Could you please share your training log and test results on AIShell? Do you use a Chinese language model? Thanks.

Tudou880306 commented 5 years ago

@AlleyEli How did your training turn out? After training I found the error rate is very high (WER 74%, LER 47%). Any advice?

kenxiexiaolong commented 4 years ago

@AlleyEli How did your training turn out? After training I found the error rate is very high (WER 74%, LER 47%). Any advice?

Could we set up a group to discuss this together?