Hi,
If you want, I can take the chime5, how2, vivos and/or yesno recipes!
> Hi,
> If you want, I can take the chime5, how2, vivos and/or yesno recipes!
that is very helpful! thanks!
Added @ruizhilijhu's assignment.
@ftshijt, @Emrys365, @sas91, @YosukeHiguchi, @simpleoier, @ruizhilijhu, how is your progress? You may want to refer to #1497.
Note that I focused only on creating a better system design, and I didn't have much time to check whether it reproduces the previous performance.
Please don't trust all the code yet; it would also be very helpful to check the training procedures. @sw005320 Please add voxforge and wsj; I just put up the recipes. I mainly worked on the RNN architecture, so I haven't seen Transformer results yet.
> Note that I focused only on creating a better system design, and I didn't have much time to check whether it reproduces the previous performance.
Could you let me know where the results could possibly change? I know that refactoring changes the order of initialization so we cannot reproduce exactly the same results, and I have already given up on that kind of reproducibility. But if there are potential algorithmic changes, we need to take some care. Could you list such items, if any?
I think you don't need to worry about the new Trainer itself. Even if there are some bugs, I can follow up on them.
1. There could be issues under espnet2/asr/encoder and decoder. At first I tried to merge the E2E classes of the RNN and Transformer into one class in espnet2. They have different interfaces from each other, so I needed to cut code out of them and unify it under the same interface. I'm not the original committer of either of them, so I might have made some mistakes. That said, there is only a little model-structure code in espnet2, and almost all of it is taken from espnet1, so if you have already read the espnet1 parts, it's O.K.
2. I implemented a BatchSampler-based batching system for PyTorch's DataLoader (see the sketch after this list). I intended it to behave the same as espnet1's batchify, but there could be differences because I implemented it from scratch. Note that I haven't implemented the bin or frame modes yet; actually, I don't understand what they do.
3. ESPnet2 uses on-the-fly text processing. This is one of the special features of espnet2, and it is also potentially dangerous because we can't see the actual state of the tokens without a debugger. I read your tokenization scripts and unified them in espnet2. I'm not sure whether I made some mistakes. I also dropped some features from the original implementation: nchar tokenization and phn mode. (As for phn, I couldn't understand what it does.) The original run.sh may, and is permitted to, perform some task-dependent text processing, and I couldn't follow all of it. On-the-fly text processing forces a different flow in many places, so we need to take special care with it.
4. There is no eps-decay scheduling for Adadelta. We can use an lr-scheduler instead.
5. The feature extractor is different: kaldi-fbank-pitch -> PyTorch-stft + librosa-fbank (I believe this is no problem). The audio data is also normalized to [-1, 1] in espnet2, though Kaldi never does this. The CMVN stats are calculated using the collect-stats mode instead of the Kaldi command. I checked that it produces the same values, so I think it's O.K. (I have read the original Kaldi code in the past.)
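For illustration, here is a minimal sketch of the BatchSampler-based batching mentioned in item 2, assuming a simple sort-by-length strategy. The class name `SortedBatchSampler` and its arguments are hypothetical, not the actual espnet2 API:

```python
import torch
from torch.utils.data import Sampler


class SortedBatchSampler(Sampler):
    """Yield minibatches of utterance indices, sorted by input length.

    Hypothetical sketch only; the real espnet2 samplers are more elaborate.
    """

    def __init__(self, lengths, batch_size, shuffle=True):
        # lengths: one input length per utterance in the dataset
        self.shuffle = shuffle
        # Sort indices by length so each minibatch contains similar-length
        # utterances (less padding, like espnet1's batchify).
        order = sorted(range(len(lengths)), key=lambda i: lengths[i])
        self.batches = [
            order[i:i + batch_size] for i in range(0, len(order), batch_size)
        ]

    def __iter__(self):
        if self.shuffle:
            # Shuffle the order of the batches, not their contents.
            for i in torch.randperm(len(self.batches)).tolist():
                yield self.batches[i]
        else:
            yield from self.batches

    def __len__(self):
        return len(self.batches)
```

Passing it as `DataLoader(dataset, batch_sampler=SortedBatchSampler(lengths, 32))` keeps similarly sized utterances together and reduces padding, which is roughly what espnet1's batchify achieves.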
Thanks! Then all of them seem minor, and we should not see significant performance changes in theory.
If we are interested in sharing/submitting recipes for new corpora, should we focus on v2 from the get-go?
Can anyone tell me how I can use external text resources to train the language model, e.g. in the Librispeech ASR recipe? It seems that only text with utt-ids can be used in espnet2. If I use text without utt-ids, then the first word is cut off here.
@simpleoier
> Can anyone tell me how I can use external text resources to train the language model, e.g. in the Librispeech ASR recipe? It seems that only text with utt-ids can be used in espnet2. If I use text without utt-ids, then the first word is cut off here.
I was waiting for someone to ask me about this.
You can use it by just giving a unique id to each line yourself. This should be done in local/data.sh, e.g. `awk '{ print NR, $0 }' < text`. This is what the wsj example does.
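For illustration, a hypothetical Python equivalent of that one-liner (the file names are placeholders, not the actual recipe layout):

```python
# Prepend a unique utterance id to each line of an external LM text,
# since espnet2 expects "<utt-id> <sentence>" per line.
with open("text") as src, open("lm_text", "w") as dst:
    for n, line in enumerate(src, 1):
        dst.write(f"lm_{n:08d} {line}")
```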
If we are interested in sharing/submitting recipes for new corpora, should we focus on on v2 from the get-go?
My suggestion is to go with v1 for now. The transition from v1 to v2 would not be very difficult.
Can I ask why the lr_scheduler noamlr only supports pytorch >= 1.1.0 now? In espnet1, it also supports 1.0.1, doesn't it?
The Noam lr scheduler is implemented using PyTorch's per-batch-step scheduler behaviour, which was only introduced in 1.1.
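For context, here is a minimal sketch of a Noam schedule built on PyTorch's `_LRScheduler`; the class name and the `model_size`/`warmup_steps` defaults are illustrative, and the actual espnet2 implementation may differ:

```python
from torch.optim.lr_scheduler import _LRScheduler


class NoamLR(_LRScheduler):
    """Noam learning-rate schedule, stepped once per batch (sketch only)."""

    def __init__(self, optimizer, model_size=256, warmup_steps=25000, last_epoch=-1):
        self.model_size = model_size
        self.warmup_steps = warmup_steps
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # last_epoch counts *batches* here, since step() is called per batch.
        step = max(1, self.last_epoch)
        scale = self.model_size ** -0.5 * min(
            step ** -0.5, step * self.warmup_steps ** -1.5
        )
        return [base_lr * scale for base_lr in self.base_lrs]
```

Because `get_lr` depends on how many times `step()` has been called, and `step()` is called once per batch, this relies on the per-batch stepping semantics PyTorch introduced in 1.1.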
Has anyone tested the Transformer with espnet2? I found the results are much worse in the voxforge recipe and different from espnet1, so something is wrong. I couldn't find the reason; I'd like someone to investigate it.
> Has anyone tested the Transformer with espnet2? I found the results are much worse in the voxforge recipe and different from espnet1, so something is wrong. I couldn't find the reason; I'd like someone to investigate it.
@YosukeHiguchi, can you try it with your JSUT setup?
> Has anyone tested the Transformer with espnet2? I found the results are much worse in the voxforge recipe and different from espnet1, so something is wrong.
I did test the Transformer for vivos and the results were bad. I just tried how2 and it's even more extreme:
```
2020-01-25 12:21:51,764 (reporter:183) INFO: 1epoch:train:1-357batch: loss=298.730, loss_att=339279.252, loss_ctc=576.931, acc=0.006, cer=1.089, wer=1.000, cer_ctc=4.385, lr_0=5.000
/home/fboye/espnet/espnet2/train/reporter.py:92: UserWarning: No valid stats found
  warnings.warn("No valid stats found")
2020-01-25 12:23:25,022 (reporter:183) INFO: 1epoch:train:358-714batch: loss=nan, loss_att=43192.529, loss_ctc=nan, acc=0.009, cer=0.911, wer=1.000, cer_ctc=3.269, lr_0=5.000
2020-01-25 12:24:58,882 (reporter:183) INFO: 1epoch:train:715-1071batch: loss=nan, loss_att=87852.724, loss_ctc=nan, acc=0.009, cer=0.914, wer=1.000, cer_ctc=4.070, lr_0=5.000
2020-01-25 12:26:31,414 (reporter:183) INFO: 1epoch:train:1072-1428batch: loss=nan, loss_att=39662.855, loss_ctc=nan, acc=0.010, cer=0.917, wer=1.000, cer_ctc=3.747, lr_0=5.000
2020-01-25 12:28:06,389 (reporter:183) INFO: 1epoch:train:1429-1785batch: loss=nan, loss_att=33076.978, loss_ctc=nan, acc=0.010, cer=0.912, wer=1.000, cer_ctc=4.056, lr_0=5.000
2020-01-25 12:29:36,650 (reporter:183) INFO: 1epoch:train:1786-2142batch: loss=nan, loss_att=30769.789, loss_ctc=nan, acc=0.010, cer=0.900, wer=1.000, cer_ctc=3.685, lr_0=5.000
2020-01-25 12:31:10,302 (reporter:183) INFO: 1epoch:train:2143-2499batch: loss=nan, loss_att=20457.835, loss_ctc=nan, acc=0.010, cer=0.918, wer=1.000, cer_ctc=4.100, lr_0=5.000
...
2020-01-25 12:52:11,296 (x2num:14) WARNING: NaN or Inf found in input tensor.
2020-01-25 12:52:11,297 (x2num:14) WARNING: NaN or Inf found in input tensor.
2020-01-25 12:52:11,297 (x2num:14) WARNING: NaN or Inf found in input tensor.
2020-01-25 12:52:11,297 (x2num:14) WARNING: NaN or Inf found in input tensor.
...
```
> I couldn't find the reason; I'd like someone to investigate it.
Not sure if I can pinpoint the problem but I'll investigate.
I found a bug just now, I'll fix it tomorrow. Thanks.
I just got results for the RNN-based model on Librispeech with Adadelta. There are several points I want to mention.
Here is the training curve. Do you guys have any comments?
Which optimizer did you use? The epsilon-decay one (based on espnet1) or another?
`torch.optim.Adadelta` with hyper-parameters (lr=1.0, rho=0.95, eps=1e-8, weight_decay=0.0)
As for speed, report_cer=True and report_wer=True are set now, but they take quite some time (I forgot I had enabled them for debugging).
Could you test ReduceLROnPlateau? This is one of the differences from v1, and I'd like to compare the results.
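For reference, a minimal usage sketch of `torch.optim.lr_scheduler.ReduceLROnPlateau` with the Adadelta settings quoted above; `model` and `valid_loss` are placeholders:

```python
import torch

# Placeholder model/optimizer; lr/rho/eps mirror the Adadelta settings above.
model = torch.nn.Linear(80, 320)
optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0, rho=0.95, eps=1e-8)

# Halve the lr when the validation metric stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=1
)

for epoch in range(10):
    valid_loss = 1.0  # placeholder: compute the real validation loss here
    # Unlike the per-batch Noam schedule, this is stepped once per epoch
    # with the validation metric.
    scheduler.step(valid_loss)
```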
Hi, I just finished and tested the recipes for aishell, timit and hkust. Here are the results.
It seems the results are much worse than those of the original recipes. Do you have any suggestions on the config?
Environments
- python version: `3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]`
- espnet version: `espnet 0.6.0`
- pytorch version: `pytorch 1.1.0`
- Git hash: `e0fd073a70bcded6a0e6a3587630410a994ccdb8`
  - Commit date: `Sat Jan 11 06:09:24 2020 +0900`
Results on AiShell
asr_train_asr_rnn_fbank_pitch_char
- Encoder: 3-layer VGG-BLSTMP with 1024 units
- Decoder: 2-layer LSTM with 1024 units
CER
| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_asr_rnn_lm_valid.loss.best_asr_model_valid.acc.best | 14326 | 205341 | 90.3 | 9.5 | 0.2 | 0.1 | 9.8 | 57.5 |
| decode_test_decode_asr_rnn_lm_valid.loss.best_asr_model_valid.acc.best | 7176 | 104765 | 89.2 | 10.5 | 0.4 | 0.2 | 11.0 | 60.0 |

WER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_asr_rnn_lm_valid.loss.best_asr_model_valid.acc.best | 14326 | 14326 | 42.5 | 57.5 | 0.0 | 0.0 | 57.5 | 57.5 |
| decode_test_decode_asr_rnn_lm_valid.loss.best_asr_model_valid.acc.best | 7176 | 7176 | 40.0 | 60.0 | 0.0 | 0.0 | 60.0 | 60.0 |

asr_train_asr_transformer_fbank_pitch_char
- Encoder: 12 layers, 2048 units
- Decoder: 6 layers, 2048 units
CER
| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_asr_transformer_lm_valid.loss.best_asr_model_valid.acc.best | 14326 | 205341 | 41.9 | 45.7 | 12.3 | 4.7 | 62.8 | 98.8 |
| decode_test_decode_asr_transformer_lm_valid.loss.best_asr_model_valid.acc.best | 7176 | 104765 | 37.0 | 50.6 | 12.4 | 7.6 | 70.6 | 99.2 |

WER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_asr_transformer_lm_valid.loss.best_asr_model_valid.acc.best | 14326 | 14326 | 1.2 | 98.8 | 0.0 | 0.0 | 98.8 | 98.8 |
| decode_test_decode_asr_transformer_lm_valid.loss.best_asr_model_valid.acc.best | 7176 | 7176 | 0.8 | 99.2 | 0.0 | 0.0 | 99.2 | 99.2 |
Results on TIMIT
asr_train_fbank_pitch_char
- Encoder: 4-layer BLSTM with 320 units
- Decoder: 1-layer LSTM with 300 units
WER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_asr_model_valid.acc.best | 400 | 3239 | 27.6 | 68.0 | 4.4 | 12.6 | 85.0 | 100.0 |
| decode_test_decode_asr_model_valid.acc.best | 192 | 1570 | 24.5 | 69.7 | 5.8 | 10.9 | 86.4 | 100.0 |

CER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_asr_model_valid.acc.best | 400 | 18573 | 73.9 | 16.6 | 9.5 | 8.2 | 34.3 | 100.0 |
| decode_test_decode_asr_model_valid.acc.best | 192 | 9008 | 71.1 | 17.8 | 11.2 | 7.9 | 36.9 | 100.0 |
Results on HKUST
@Emrys365 Hi,
Just looking at aishell and the original config, I would suggest:
```yaml
encoder_conf:
    ...
    use_projection: False
    ...
decoder_conf:
    ...
att_conf:
    adim: 1024
    ...
val_scheduler_criterion:
    - valid
    - acc
best_model_criterion:
-   - valid
    - acc
    - max
```
Edit: Projection layers are used in the original config, but from what I tested, results were better without them, so I'm not sure (and btw, dropout can't be applied if `use_projection=True`, because RNNP is a stack of 1-layer `torch.nn.LSTM` modules and dropout is defined within them; PyTorch's LSTM dropout is only applied between stacked layers, so it is a no-op for a single layer).
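As a standalone check (not espnet code), PyTorch itself warns that the `dropout` argument does nothing for a single-layer LSTM:

```python
import torch

# The dropout argument of nn.LSTM is applied between stacked layers only,
# so with num_layers=1 it has no effect.
lstm = torch.nn.LSTM(input_size=80, hidden_size=320, num_layers=1, dropout=0.3)
# => emits a UserWarning that non-zero dropout expects num_layers > 1

# An RNNP built as a stack of independent 1-layer LSTMs therefore needs an
# explicit nn.Dropout module between the layers to get any dropout at all.
```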
> @Emrys365 Hi,
> Just looking at aishell and the original config, I would suggest:
> ...
> Edit: Projection layers are used in the original config, but from what I tested, results were better without them, so I'm not sure (and btw, dropout can't be applied if `use_projection=True`, because RNNP is a stack of 1-layer `torch.nn.LSTM` modules and dropout is defined within them)
Thank you for your suggestion! I will try this config.
@Emrys365 Thanks! Did you check #1533? Maybe the GRU is used due to a bug, but it doesn't matter much in my experiments. How about trying another init method? I'm investigating the performance difference between 1.0.1 and 1.4.0 now. xavier_uniform and xavier_normal seem better than the chainer init with v1.4.0 on voxforge. I also found that init=None, i.e. the pytorch default, is quite a bit worse in v1.4.0.
We'll probably do a second round to refine the configurations of all recipes, so please send PRs without worrying too much about the results (of course, I'd be glad if you check them).
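For anyone who wants to try the init comparison locally, a rough illustrative helper (the actual option handling in espnet is more involved):

```python
import torch


def init_weights(model: torch.nn.Module, mode: str = "xavier_uniform"):
    """Apply one of the init methods being compared (illustrative only)."""
    for p in model.parameters():
        # Only re-initialize weight matrices; biases keep their defaults.
        if p.dim() > 1:
            if mode == "xavier_uniform":
                torch.nn.init.xavier_uniform_(p)
            elif mode == "xavier_normal":
                torch.nn.init.xavier_normal_(p)
```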
> We'll probably do a second round to refine the configurations of all recipes, so please send PRs without worrying too much about the results (of course, I'd be glad if you check them).
I agree with this direction. Anyway, it is great that the migration to espnet2 seems to be fine for most recipes.
> @Emrys365 Thanks! Did you check #1533? Maybe the GRU is used due to a bug, but it doesn't matter much in my experiments. How about trying another init method? I'm investigating the performance difference between 1.0.1 and 1.4.0 now. xavier_uniform and xavier_normal seem better than the chainer init with v1.4.0 on voxforge. I also found that init=None, i.e. the pytorch default, is quite a bit worse in v1.4.0.
> We'll probably do a second round to refine the configurations of all recipes, so please send PRs without worrying too much about the results (of course, I'd be glad if you check them).
Thanks! I'll check the issue and prepare the initial PR.
Hi all, could you please check the new config in #1601 and redo the experiments? Please carefully check the differences between espnet1 and espnet2.
Please also use pytorch=1.4 and the builtin CTC, i.e. don't use warp-ctc. In my voxforge experiments, there was no big difference between them.
Hi all. I'll be off from github for 2-3 weeks, sorry.
@Emrys365 Hi! I just changed "trans_type" from char to phn to get the PER. I want to know which config you used to get the WER in your results. Does any config other than "trans_type" need to change?
> Hi, I just finished and tested the recipes for aishell, timit and hkust. Here are the results.
> It seems the results are much worse than those of the original recipes. Do you have any suggestions on the config? ....

> @Emrys365 Hi! I just changed "trans_type" from char to phn to get the PER. I want to know which config you used to get the WER in your results. Does any config other than "trans_type" need to change?
@ReinholdM Thanks! You can check the training configs here: AiShell train config (rnn), AiShell train config (transformer).
BTW, I already put the hyperlinks ("train config" in blue) in the post you cited. You could try that config first. I am not very sure whether the parameters need to change for the 'phn' type.
@Emrys365 I want to ask about your config for the WER on TIMIT. I only got 28%~30% when I just changed "trans_type" from char to phn. So I want to know what else you changed to get your WER results on TIMIT. Thanks!
@Emrys365 Hi, I would like to know how to reproduce the performance reported in the README of espnet2/aishell1/asr. I found my performance is far from what is reported in the README, and I used the latest version of ESPnet (tried with both PyTorch 1.1 and 1.4). Thanks a lot.
@tonysy Sorry, I haven't been following the latest ESPnet for some time. I will test my configuration with the new one these days.
@Emrys365 Hi, can you share the commit id, pytorch version, and python version needed to reproduce the performance reported in the README.md? Thanks.
@tonysy Sure, you can check this commit: https://github.com/espnet/espnet/pull/1549 (pytorch 1.1.0 and python 3.7.3).
@tonysy Please provide more information when asking others for help. At the least, please show the acc/loss curves and the WER/CER results.
Hi @tonysy, here are the results I've got with the latest ESPnet and PyTorch v1.1.0:
- Date: `Sat May 2 01:38:30 CST 2020`
- python version: `3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]`
- espnet version: `espnet 0.7.0`
- pytorch version: `pytorch 1.1.0`
- Git hash: `90ea8c788075d4b8b6bd71250268da61309cf2a2`
  - Commit date: `Mon Apr 27 10:41:23 2020 -0400`
CER (RNN)

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_asr_rnn_lm_train_lm_char_valid.loss.best_asr_model_valid.acc.best | 14326 | 205341 | 92.5 | 7.3 | 0.2 | 0.1 | 7.6 | 49.8 |
| decode_test_decode_asr_rnn_lm_train_lm_char_valid.loss.best_asr_model_valid.acc.best | 7176 | 104765 | 91.4 | 8.4 | 0.3 | 0.2 | 8.8 | 53.6 |

CER (Transformer)

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_asr_transformer_lm_train_lm_char_valid.loss.best_asr_model_valid.acc.best | 14326 | 205341 | 81.3 | 16.6 | 2.1 | 0.5 | 19.2 | 72.5 |
| decode_test_decode_asr_transformer_lm_train_lm_char_valid.loss.best_asr_model_valid.acc.best | 7176 | 104765 | 79.1 | 18.3 | 2.7 | 0.9 | 21.8 | 74.5 |
The RNN results are very similar to those in espnet/espnet:master/egs2/aishell/asr1/README.md, and the Transformer results are much better now.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue is closed. Please re-open if needed.
> If we are interested in sharing/submitting recipes for new corpora, should we focus on v2 from the get-go?
> My suggestion is to go with v1 for now. The transition from v1 to v2 would not be very difficult.
> Hi all, could you please check the new config in #1601 and redo the experiments? Please carefully check the differences between espnet1 and espnet2.
Do the files in egs correspond to espnet1 and egs2 to espnet2? Would it be better for me to use espnet1 and the shell scripts in egs?
> If we are interested in sharing/submitting recipes for new corpora, should we focus on v2 from the get-go?
> My suggestion is to go with v1 for now. The transition from v1 to v2 would not be very difficult.

Are there any documents introducing these folders? Thank you!
Hi,
> Do the files in egs correspond to espnet1 and egs2 to espnet2?
Yes, `egs` contains the ESPnet1 recipes while `egs2` contains the ESPnet2 recipes.
> Would it be better for me to use espnet1 and the shell scripts in egs?
Sorry, I'm not sure I understand...
> Are there any documents introducing these folders?
I don't think there is a doc introducing these folders. You may find some information at https://espnet.github.io/espnet/index.html . Each folder name should be self-explanatory, though.
We're thinking of converting the espnet ASR recipes to new espnet (espnet2) ASR recipes (https://github.com/espnet/espnet/tree/v.0.7.0/egs2). The following is the current assignment. I have not finished assigning some recipes; if you volunteer to take them, please let me know!
@ftshijt, @Emrys365, @sas91, @YosukeHiguchi, @simpleoier, thanks a lot for helping with this! This is a tentative assignment. Please let me know if you have any requests about the assignment. Also, if you have any problems, comments on our new design, etc., you may use this issue.