alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
7.73k stars 1.08k forks

Decoding Crashed in Some Cases #176

Closed (qo6xup6 closed this issue 4 years ago)

qo6xup6 commented 4 years ago

When I try to decode audio with the spk (speaker) feature, the API crashes with the following error. It also crashes with the same error without the spk feature:

    ASSERTION_FAILED (VoskAPI:Init():kaldi-matrix.cc:787) Assertion failed: (rows == 0 && cols == 0)

The full log has been uploaded here: asr_spk_crash_3.log

Models used: vosk-model-en-us-aspire-0.2 and vosk-model-spk-0.3

But I have updated conf/model.conf with these endpoint settings:

    --endpoint.rule2.min-trailing-silence=0.25
    --endpoint.rule3.min-trailing-silence=0.5
    --endpoint.rule4.min-trailing-silence=0.75

qo6xup6 commented 4 years ago

I still have no clue what is causing this crash. This is the log leading up to it:

    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i will ask"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i will ask you"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i will ask you a question"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i will ask you a question in chinese"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i will ask you a question in chinese"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i will ask you a question in chinese"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i will ask you a question in chinese please respond"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i will ask you a question in chinese please respond in chinese"}
    2020-08-14 16:47:04 - asr_server_spk.py:135 - INFO - { "partial" : "and now i will ask you a question in chinese please respond in chinese"}
    2020-08-14 16:47:05 - asr_server_spk.py:135 - INFO - { "result" : [{ "conf" : 1.000000, "end" : 514.530000, "start" : 514.170000, "word" : "and" }, { "conf" : 1.000000, "end" : 515.040000, "start" : 514.560000, "word" : "now" }, { "conf" : 1.000000, "end" : 515.190000, "start" : 515.070000, "word" : "i" }, { "conf" : 1.000000, "end" : 515.430000, "start" : 515.190000, "word" : "will" }, { "conf" : 1.000000, "end" : 515.820000, "start" : 515.430000, "word" : "ask" }, { "conf" : 1.000000, "end" : 516.060000, "start" : 515.820000, "word" : "you" }, { "conf" : 1.000000, "end" : 516.240000, "start" : 516.090000, "word" : "a" }, { "conf" : 1.000000, "end" : 516.810000, "start" : 516.240000, "word" : "question" }, { "conf" : 1.000000, "end" : 516.990000, "start" : 516.840000, "word" : "in" }, { "conf" : 1.000000, "end" : 517.650000, "start" : 516.990000, "word" : "chinese" }, { "conf" : 1.000000, "end" : 518.370000, "start" : 517.980000, "word" : "please" }, { "conf" : 1.000000, "end" : 518.880000, "start" : 518.370000, "word" : "respond" }, { "conf" : 1.000000, "end" : 519.030000, "start" : 518.910000, "word" : "in" }, { "conf" : 1.000000, "end" : 519.486211, "start" : 519.030000, "word" : "chinese" }], "spk" : [-9.822136, -9.428958, -11.654249, -8.152773, -8.194983, -13.960484, -7.391829, -9.642398, -8.538031, -13.262074, -13.246984, -1.422418, -14.761465, -5.442826, -13.678344, -12.126788, -10.435603, -9.167766, -9.177683, -5.168295, -10.574643, -1.164607, -14.012313, -11.660321, -12.400189, -13.308061, -4.195704, -14.075627, -6.335491, -9.836514, -11.276605, -6.782085, -10.853158, -13.963091, -8.771411, -1.898918, -13.493228, -5.886063, -7.709649, -6.099607, -11.923589, -7.874171, -16.074160, -7.321682, -6.146068, -5.728894, -8.037802, -12.261643, -5.579471, -1.383335, -8.356176], "text" : "and now i will ask you a question in chinese please respond in chinese"}
    2020-08-14 16:47:05 - asr_server_spk.py:135 - INFO - { "partial" : ""}
    LOG (VoskAPI:~CachingOptimizingCompiler():nnet-optimize.cc:710) 0.217 seconds taken in nnet3 compilation total (breakdown: 0.215 compilation, 0.00059 optimization, 0 shortcut expansion, 0.000129 checking, 0 computing indexes, 0.00168 misc.) + 0 I/O.
    ASSERTION_FAILED (VoskAPI:Init():kaldi-matrix.cc:787) Assertion failed: (rows == 0 && cols == 0)

nshmyrev commented 4 years ago

Is this after the last update? I think we fixed it recently.

qo6xup6 commented 4 years ago

This happens with the spk vector fix already applied, so it seems to be a different issue. I tried returning false when both dimensions are 0, but it does not seem to help.

In kaldi_recognizer.cc, line 313:

    int num_frames = spk_feature_->NumFramesReady();
    if (num_frames == 0 && spk_feature_->Dim() == 0) return false;
    Matrix<BaseFloat> mfcc(num_frames, spk_feature_->Dim());

Line 337:

    SlidingWindowCmnOptions cmvn_opts;
    if (mfcc.NumRows() == 0 && mfcc.NumCols() == 0) return false;
    Matrix<BaseFloat> features(mfcc.NumRows(), mfcc.NumCols(), kUndefined);
    SlidingWindowCmn(cmvn_opts, mfcc, &features);

nshmyrev commented 4 years ago

You need to combine the checks with OR, not AND:

    if (num_frames == 0 || spk_feature_->Dim() == 0)
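Why OR matters: the assertion in the log says a matrix with no elements must be exactly 0 x 0, so zero frames combined with a nonzero feature dimension is still an invalid shape. The `&&` guard only bails out when both values are zero, which lets (0 frames, nonzero dim) through to the `Matrix<BaseFloat> mfcc(...)` constructor. A minimal, self-contained sketch of that logic follows. `ToyMatrix`, `PassesAndGuard`, `PassesOrGuard` and `BuildsInvalidShape` are hypothetical names; `ToyMatrix` only mimics the check from the log, and it throws instead of crashing so the behavior can be tested. The dimension 40 is just an example value.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for kaldi::Matrix<BaseFloat>: mimics only the check
// reported in the log, "(rows == 0 && cols == 0)" for an empty matrix.
// Sizes are assumed non-negative. The real assertion crashed the API;
// this sketch throws instead so the outcome can be checked.
class ToyMatrix {
 public:
  ToyMatrix(int rows, int cols) {
    if ((rows == 0 || cols == 0) && !(rows == 0 && cols == 0))
      throw std::logic_error("Assertion failed: (rows == 0 && cols == 0)");
    data_.resize(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
  }

 private:
  std::vector<float> data_;
};

// Guard as first tried: skips only when BOTH values are zero.
bool PassesAndGuard(int num_frames, int dim) { return !(num_frames == 0 && dim == 0); }

// Guard as suggested: skips when EITHER value is zero.
bool PassesOrGuard(int num_frames, int dim) { return !(num_frames == 0 || dim == 0); }

// Returns true if `guard` lets through a shape the matrix constructor rejects.
bool BuildsInvalidShape(bool (*guard)(int, int), int num_frames, int dim) {
  if (!guard(num_frames, dim)) return false;  // corresponds to `return false;` in the recognizer
  try {
    ToyMatrix mfcc(num_frames, dim);
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

With 0 frames and a 40-dimensional feature, the AND guard lets the invalid 0 x 40 shape reach the constructor, while the OR guard returns early; both guards behave the same for a normal, non-empty chunk.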

qo6xup6 commented 4 years ago

Thanks for the hint, now it works and no longer crashes on the assertion. However, I still have two audio files (11 minutes long) that fail during decoding. Is audio longer than 10 minutes not supported?

nshmyrev commented 4 years ago

> Does audio length longer than 10 min not supported?

We support audio of arbitrary length, including 10 minutes and more.

As for the crash, you need to provide more information to get help.

qo6xup6 commented 4 years ago

I got no error log with these two audio files, but decoding ends quickly at the very beginning, and I only get a few empty partial results like these:

    2020-08-14 17:58:26 - asr_server_spk.py:135 - INFO - { "partial" : ""}
    2020-08-14 17:58:26 - asr_server_spk.py:135 - INFO - { "partial" : ""}
    2020-08-14 17:59:12 - asr_server_spk.py:135 - INFO - { "partial" : ""}

nshmyrev commented 4 years ago

You'd better try the vosk-api samples first, without the server. It might be a WebSocket protocol issue in your client (if your client doesn't support ping frames).
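For reference, "supporting ping frames" means the following under the WebSocket protocol (RFC 6455): when the server sends a ping (opcode 0x9), the client must reply with a pong (opcode 0xA) carrying the same payload, and every client-to-server frame must be masked with a 4-byte key. Most WebSocket client libraries do this automatically. The sketch below only illustrates the frame-level handling, assuming it receives one complete, unmasked server frame with a short payload (control frames carry at most 125 bytes). `ParsePing` and `MakePong` are hypothetical helper names.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <vector>

// RFC 6455 opcodes for the keep-alive control frames.
constexpr uint8_t kOpPing = 0x9;
constexpr uint8_t kOpPong = 0xA;

// Returns the payload if `frame` is a complete, unmasked ping from the server.
std::optional<std::vector<uint8_t>> ParsePing(const std::vector<uint8_t>& frame) {
  if (frame.size() < 2) return std::nullopt;
  const bool fin = (frame[0] & 0x80) != 0;
  const uint8_t opcode = frame[0] & 0x0F;
  const bool masked = (frame[1] & 0x80) != 0;
  const std::size_t len = frame[1] & 0x7F;  // 7-bit length suffices for control frames
  if (!fin || opcode != kOpPing || masked || len > 125) return std::nullopt;
  if (frame.size() != 2 + len) return std::nullopt;
  return std::vector<uint8_t>(frame.begin() + 2, frame.end());
}

// Builds the client's pong: same payload as the ping, masked with `mask_key`
// because client-to-server frames must be masked. Assumes payload <= 125 bytes.
// A real client should pick a fresh random key for each frame.
std::vector<uint8_t> MakePong(const std::vector<uint8_t>& payload,
                              const std::array<uint8_t, 4>& mask_key) {
  std::vector<uint8_t> out;
  out.push_back(0x80 | kOpPong);                               // FIN bit + opcode
  out.push_back(0x80 | static_cast<uint8_t>(payload.size()));  // MASK bit + length
  out.insert(out.end(), mask_key.begin(), mask_key.end());
  for (std::size_t i = 0; i < payload.size(); ++i)
    out.push_back(payload[i] ^ mask_key[i % 4]);
  return out;
}
```

A client that ignores pings never sends these pong frames, which is the kind of protocol mismatch worth ruling out by first testing with the vosk-api samples directly.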

qo6xup6 commented 4 years ago

Sure, I will look into it further. Thanks for the help with the zero-dimension matrix issue. Could you please push this fix later? Thanks again.