PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Apache License 2.0

Error report: Errors when running "BaiduCN1.2k Model" #212

Closed gangyahaidao closed 6 years ago

gangyahaidao commented 6 years ago

Hello guys, when I test Aishell, everything runs OK, except for the high word error rate.

Then I began testing the "BaiduCN1.2k Model", and something went wrong.

1. I know that the BaiduCN1.2k speech sample rate is 8 kHz, so I changed the sample rate in the files to 8000 Hz. When I speak into the microphone, I get the following errors:

Received utterance[length=94208] from, saved to demo_cache/20180419073019_127.0.0.1.wav.
/usr/local/lib/python2.7/dist-packages/resampy/ FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  if not np.issubdtype(x.dtype, np.float):
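
Note that the two lines above are a Python FutureWarning, a category that by default is printed and does not stop the program. A minimal sketch of that behavior, using only the standard `warnings` module; `legacy_dtype_check` is a hypothetical stand-in for the library line, and its message is abbreviated:

```python
import warnings


def legacy_dtype_check():
    """Hypothetical stand-in for the library line that emits the message."""
    warnings.warn(
        "Conversion of the second argument of issubdtype is deprecated.",
        FutureWarning,
    )
    return "finished"


# Record warnings instead of printing them, to show the call still returns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_dtype_check()

print(result, caught[0].category.__name__)

# Hide only this category of message inside a limited scope.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    quiet_result = legacy_dtype_check()
```

If recognition is failing, the cause is therefore more likely to be elsewhere than in this message.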

When the warm-up file is 8000 Hz, I get the same error:

('Warm-up Test Case %d: %s', 0, u'/home/train/.cache/paddle/dataset/speech/Aishell/data_aishell/wav/test/S0913/BAC009S0913W0152.wav')
/usr/local/lib/python2.7/dist-packages/resampy/ FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  if not np.issubdtype(x.dtype, np.float):

I don't know why; please enlighten me.
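
Since question 1 depends on the files really being 8000 Hz, the sample rate stored in a WAV header can be read back with Python's standard `wave` module. A minimal sketch; `wav_sample_rate`, `write_silence` and the temporary file are illustrative names, not part of the toolkit:

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def write_silence(path, sample_rate, seconds=1):
    """Write a mono 16-bit WAV of silence, used here only as demo input."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


# Demo: create an 8 kHz file in a temporary directory and read its rate back.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_8k.wav")
write_silence(demo_path, 8000)
print(wav_sample_rate(demo_path))
```

Running `wav_sample_rate` on a file saved under demo_cache or on a warm-up file would show whether the audio actually reaching the model has the expected rate.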

2. Question two: I don't know how to run the ASR server with the "BaiduCN1.2k Model". Here is my config file:

start demo server

CUDA_VISIBLE_DEVICES=0 \
python -u deploy/ \
--host_ip='localhost' \
--host_port=8086 \
--num_conv_layers=2 \
--num_rnn_layers=3 \
--rnn_layer_size=1024 \
--alpha=1.15 \
--beta=0.15 \
--cutoff_prob=1.0 \
--cutoff_top_n=40 \
--use_gru=True \
--use_gpu=True \
--share_rnn_weights=False \
--speech_save_dir='demo_cache' \
--warmup_manifest='data/aishell/manifest.test' \
--mean_std_path='models/baidu_zh12k/mean_std.npz' \
--vocab_path='models/baidu_zh12k/vocab.txt' \
--model_path='models/baidu_zh12k/params.tar.gz' \
--lang_model_path='models/lm/zh_giga.no_cna_cmn.prune01244.klm' \
--decoding_method='ctc_beam_search' \
--specgram_type='linear'

After I start up the server, it cannot recognize the wav file correctly. What else do I need to do?

Any help would be much appreciated. Thank you.

frozenfires commented 6 years ago

Question 2: You are running on the GPU, but the parameter is CUDA_VISIBLE_DEVICES=0. It should be set to the parameters of the language model. You can try 0, 1, 2, 4, 4. Or set the parameter use_gpu=False.
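
For context, CUDA_VISIBLE_DEVICES is an environment variable that lists which GPU indices a process is allowed to see; it is separate from the --use_gpu flag. A minimal sketch of reading such a value, assuming plain comma-separated integer indices; `visible_gpu_ids` is a hypothetical helper, not part of the toolkit:

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of integer GPU indices.

    Returns None when the variable is unset (no restriction expressed)
    and an empty list when it is set but empty (no GPU visible).
    Assumes integer indices only; other identifier forms are not handled.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [int(part) for part in raw.split(",") if part.strip()]


print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0"}))
print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0,1"}))
```

With the value 0 used in the command above, only the first GPU is exposed to the server process.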

gangyahaidao commented 6 years ago

Thank you for the quick help. After referring to #116, I made this change to my file: "set cutoff_prob to less than 1.0, for example 0.99". Amazingly, the error disappeared! But the recognition result is very bad. For example, when I said "你好" ("hello"), I got this:

Start Recording ...                                                               Speech[length=36864] Sent.
Recognition Results: 芯
Start Recording ...                                                                 Speech[length=38912] Sent.
Recognition Results: 芯
Start Recording ...                                                                                      Speech[length=49152] Sent.
Recognition Results: 匕

Does this mean that the "BaiduCN1.2k Model" has some training problem, or that it has not been trained well enough?
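
For readers unfamiliar with the flag changed above: in CTC beam search, cutoff_prob and cutoff_top_n usually control how many candidate characters are considered at each time step. The sketch below shows one common form of that pruning. It is an illustration under that assumption, not the toolkit's actual decoder, and `prune_candidates` is a hypothetical name:

```python
def prune_candidates(probs, cutoff_prob=1.0, cutoff_top_n=40):
    """Return the (index, prob) pairs kept for one time step, best first.

    Keeps the most probable entries until their cumulative probability
    reaches cutoff_prob, and never keeps more than cutoff_top_n entries.
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    if cutoff_prob >= 1.0 and cutoff_top_n >= len(ranked):
        return ranked  # pruning is effectively disabled
    kept, cumulative = 0, 0.0
    for _, prob in ranked:
        cumulative += prob
        kept += 1
        if cumulative >= cutoff_prob:
            break
    return ranked[:min(kept, cutoff_top_n)]


# Toy distribution over a 4-symbol vocabulary at one time step.
step_probs = [0.5, 0.25, 0.125, 0.125]
print(prune_candidates(step_probs, cutoff_prob=0.75))
print(len(prune_candidates(step_probs, cutoff_prob=1.0)))
```

Under this scheme, a value of 1.0 with a large vocabulary keeps up to cutoff_top_n symbols per step, while a value slightly below 1.0 drops the low-probability tail.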

frozenfires commented 6 years ago

Set alpha=2.6. This is my config:

start demo server

CUDA_VISIBLE_DEVICES=0 \
python -u deploy/ \
--host_ip='' \
--host_port=6666 \
--beam_size=300 \
--num_conv_layers=2 \
--num_rnn_layers=3 \
--rnn_layer_size=1024 \
--alpha=2.6 \
--beta=5.0 \
--cutoff_prob=0.99 \
--cutoff_top_n=40 \
--use_gru=True \
--use_gpu=False \
--share_rnn_weights=False \
--speech_save_dir='demo_cache' \
--warmup_manifest='data/aishell/manifest.test' \
--mean_std_path='data/aishell/mean_std.npz' \
--vocab_path='data/aishell/vocab.txt' \
--model_path='data/aishell/params.tar.gz' \
--lang_model_path='data/aishell/zh_giga.no_cna_cmn.prune01244.klm' \
--decoding_method='ctc_beam_search' \
--specgram_type='linear'
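
To illustrate why alpha and beta matter: in the Deep Speech 2 paper, beam search ranks each hypothesis by its CTC log-probability plus alpha times the language-model log-probability plus beta times the word count. The sketch below applies that formula to two made-up hypotheses. The numbers are invented for illustration, and whether this toolkit version scores in exactly this way is an assumption:

```python
def lm_weighted_score(ctc_log_prob, lm_log_prob, word_count, alpha, beta):
    """Combine acoustic and language-model evidence for one hypothesis."""
    return ctc_log_prob + alpha * lm_log_prob + beta * word_count


# Two invented hypotheses: a shorter one the acoustic model prefers,
# and a longer one that gains more from the word-count bonus.
hypotheses = {
    "short": {"ctc_log_prob": -4.0, "lm_log_prob": -6.0, "word_count": 2},
    "long": {"ctc_log_prob": -5.0, "lm_log_prob": -7.0, "word_count": 3},
}


def best_hypothesis(alpha, beta):
    """Return the name of the highest-scoring hypothesis."""
    return max(
        hypotheses,
        key=lambda name: lm_weighted_score(
            alpha=alpha, beta=beta, **hypotheses[name]
        ),
    )


print(best_hypothesis(alpha=1.15, beta=0.15))
print(best_hypothesis(alpha=2.6, beta=5.0))
```

In this toy case the two settings pick different winners, which is why values tuned for one acoustic model may not transfer to another.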

frozenfires commented 6 years ago

It was very accurate when I tried it, so I doubt your Chinese pronunciation. -_-!!

gangyahaidao commented 6 years ago

So kind of you, but when I set --alpha=2.6 and --beta=5.0 with the "BaiduCN1.2k Model", I got the result below. I learned my Chinese from a friend who speaks Beijing Mandarin, so it is not bad, I guess! When I say "今天天气怎么样" ("How is the weather today?"), I get this:

Response Time: 3.740669, Transcript: 韦莱兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹
Received utterance[length=110592] from, saved to demo_cache/20180419084354_127.0.0.1.wav.
Response Time: 3.217678, Transcript: 韦莱兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹
Received utterance[length=114688] from, saved to demo_cache/20180419084403_127.0.0.1.wav.
Response Time: 3.348469, Transcript: 韦莱兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹韦兹

So... thank you anyway~ By the way, may I ask what size your model file "baidu_cn1.2k_model.tar.gz" is? Is it 759M? I suspect my file was not downloaded completely.
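
File size is only a coarse check for a complete download. A stronger one is to read the whole archive through, since a truncated .tar.gz normally fails partway. A minimal sketch using only the standard library; `archive_is_readable` and the demo files are illustrative names, not part of the toolkit:

```python
import io
import os
import tarfile
import tempfile
import zlib


def archive_is_readable(path):
    """Return True if every file in a .tar.gz archive can be read to the end."""
    try:
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if member.isfile():
                    extracted = archive.extractfile(member)
                    while extracted.read(1 << 20):  # read in 1 MiB chunks
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False


# Demo: build a small archive, then make a copy cut off halfway.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.tar.gz")
bad_path = os.path.join(tmp_dir, "truncated.tar.gz")

payload = os.urandom(200000)
with tarfile.open(good_path, "w:gz") as archive:
    info = tarfile.TarInfo("params.bin")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

with open(good_path, "rb") as src, open(bad_path, "wb") as dst:
    data = src.read()
    dst.write(data[: len(data) // 2])

print(archive_is_readable(good_path), archive_is_readable(bad_path))
```

If the download page publishes a checksum, comparing against it would be a more direct test than either size or readability.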

frozenfires commented 6 years ago

No, my speech model is the 'Mandarin Aishell Model 151h'. Do you have QQ or WeChat? We can communicate and share easily.

gangyahaidao commented 6 years ago

I downloaded it again and it is 759M, so my model file is complete. But why do I get such strange output, with much lower accuracy (a higher WER) than Aishell? I will try to find out the reason.

gangyahaidao commented 6 years ago

Yes, QQ 736791342, Wang. When I use --alpha=2.6 --beta=5.0, it is a little slower than --alpha=1.15 --beta=0.15, and the accuracy is not much higher.

gangyahaidao commented 6 years ago

This error is solved. I will create a new issue for any other errors, so I am closing this one.

blood0708 commented 5 years ago

@gangyahaidao I ran into the same problem too. May I ask how you finally solved it?

blood0708 commented 5 years ago

@gangyahaidao I also met the same problem. How did you solve it in the end?

sunjunlishi commented 5 years ago

@gangyahaidao How do you use it?

sunjunlishi commented 5 years ago

@gangyahaidao How did you resolve it? Very good!

code-R commented 5 years ago

@gangyahaidao Can you please share the Mandarin pretrained model (BaiduCN1.2k Model)? The available link doesn't work any more.