FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model
https://funaudiollm.github.io/

AssertionError: choose a window size 400 that is [2, 0] #30

Closed Hy-1990 closed 3 months ago

Hy-1990 commented 4 months ago

  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/gradio/queueing.py", line 532, in process_events
    response = await route_utils.call_process_api(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/gradio/blocks.py", line 1514, in call_function
    prediction = await anyio.to_thread.run_sync(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
    ^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, args)
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/gradio/utils.py", line 832, in wrapper
    response = f(args, kwargs)
    ^^^^^^^^^^^^^^^^^^
  File "/home/huyi/PycharmProjects/SenseVoice/webui.py", line 167, in model_inference
    merge_vad = True #False if selected_task == "ASR" else True
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 263, in generate
    return self.inference_with_vad(input, input_len=input_len, cfg)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 410, in inference_with_vad
    results = self.inference(
    ^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 300, in inference
    res = model.inference(batch, kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/PycharmProjects/SenseVoice/model.py", line 817, in inference
    speech, speech_lengths = extract_fbank(
    ^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/funasr/utils/load_utils.py", line 173, in extract_fbank
    data, data_len = frontend(data, data_len, kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, *kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(args, kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/funasr/frontends/wav_frontend.py", line 134, in forward
    mat = kaldi.fbank(
    ^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/compliance/kaldi.py", line 591, in fbank
    waveform, window_shift, window_size, padded_window_size = _get_waveform_and_window_properties(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/huyi/anaconda3/envs/tts/lib/python3.11/site-packages/torchaudio/compliance/kaldi.py", line 142, in _get_waveform_and_window_properties
    assert 2 <= window_size <= len(waveform), "choose a window size {} that is [2, {}]".format(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: choose a window size 400 that is [2, 0]

I can see that the cache={} argument is being passed, yet running the same audio a second time raises this error.
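
Editor's note: the assertion on the last traceback frame compares the analysis window length with the number of samples in the waveform. The sketch below reproduces that comparison in plain Python so the numbers in the message are easier to read. It assumes 16 kHz audio and a 25 ms frame, which would give the window size of 400 shown in the error; the function names are illustrative and are not part of torchaudio or FunASR.

```python
# Minimal sketch of the check behind "choose a window size 400 that is [2, 0]".
# Assumptions (not stated in the thread): 16 kHz sample rate, 25 ms frame length.

def window_size_in_samples(sample_rate_hz: int = 16000, frame_length_ms: int = 25) -> int:
    """Number of samples covered by one analysis window (integer arithmetic)."""
    return sample_rate_hz * frame_length_ms // 1000


def waveform_is_long_enough(num_samples: int,
                            sample_rate_hz: int = 16000,
                            frame_length_ms: int = 25) -> bool:
    """Mirror of the asserted condition: 2 <= window_size <= len(waveform)."""
    window_size = window_size_in_samples(sample_rate_hz, frame_length_ms)
    return 2 <= window_size <= num_samples


if __name__ == "__main__":
    print(window_size_in_samples())      # 400
    print(waveform_is_long_enough(0))    # False: an empty segment, as in "[2, 0]"
    print(waveform_is_long_enough(400))  # True: exactly one full window
```

Read this way, "[2, 0]" means the segment handed to the feature extractor contained zero samples.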

LauraGPT commented 4 months ago

Please update to funasr-1.1.2 and try again. We have updated the docs in the README.

Joy-word commented 3 months ago

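I ran into the same problem: the first inference on a given audio file works, but the second and every later run fails. My tests suggest it is related to VAD segmentation. Upgrading to funasr-1.1.2 did not fix it.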

Joy-word commented 3 months ago

Addendum: after removing the merge_vad setting from the arguments to generate, the problem no longer occurs.

LauraGPT commented 3 months ago

> Addendum: after removing the merge_vad setting from the arguments to generate, the problem no longer occurs.

Thanks, I will check it.

LauraGPT commented 3 months ago

Yes, it is a bug. We will fix it soon.

LauraGPT commented 3 months ago

We have fixed the bug: https://github.com/modelscope/FunASR/pull/1940
Please update funasr:

git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./

Joy-word commented 3 months ago

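For some audio files, it seems the assertion error can still occur after merge_vad. I just found an audio file that fails when merge_vad is used: choose a window size 400 that is [2, 160]. Without that parameter it works fine.

I looked at the merge_vad method. The last change added a minimum-length limit of 0; would 400 be a more suitable value? @LauraGPT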

Joy-word commented 3 months ago

Addendum: I printed the segment lists before and after merge_vad. It looks like small segments were added between pairs of existing segments (I bolded some of the short ones). This may be the cause.

before: [[1710, 19500], [19860, 21340], [22710, 25880], [26160, 33670], [33950, 56610], [56890, 105310], [105590, 124290], [124700, 184710], [184710, 192250], [192530, 199050], [199420, 200990], [201470, 204510], [204790, 208200], [208480, 224080], [225950, 227330], [227690, 234480], [236710, 296720], [296720, 334700], [334980, 336680], [337780, 340440], [340720, 342440], [345380, 358990], [363300, 368340], [369390, 396670], [396980, 398180], [399360, 446140], [446630, 458950], [459470, 516550], [517030, 519950], [520300, 523840], [524790, 531640], [532100, 533850], [534160, 542390], [542720, 545910], [546190, 560490], [560800, 564740], [566320, 580430], [580780, 640790], [640790, 665180], [665460, 725470], [725470, 729130], [729490, 734040], [736440, 796450], [796460, 834580], [835060, 848990], [849720, 870460]]

after: [[0, 1710], [1710, 19500], [19500, 33950], [33950, 56610], [56610, 56890], [56890, 105310], [105310, 105590], [105590, 124290], [124290, 124700], [124700, 184710], [184710, 199420], [199420, 208480], [208480, 224080], [224080, 236710], [236710, 296720], [296720, 334700], [334700, 345380], [345380, 358990], [358990, 369390], [369390, 396670], [396670, 399360], [399360, 446140], [446140, 459470], [459470, 516550], [516550, 524790], [524790, 534160], [534160, 546190], [546190, 560800], [560800, 566320], [566320, 580780], [580780, 640790], [640790, 665180], [665180, 665460], [665460, 725470], [725470, 736440], [736440, 796450], [796450, 796460], [796460, 834580], [834580, 848990], [848990, 849720], [849720, 870460]]
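
Editor's note: one way to check this hypothesis is to convert each post-merge segment to a sample count and compare it with the window size of 400 from the assertion message. The sketch below does that for the "after" list, assuming the timestamps are in milliseconds and the audio is 16 kHz (16 samples per millisecond). Neither assumption is stated in the thread, and the helper name is illustrative.

```python
# Sketch: find post-merge segments shorter than one analysis window.
# Assumptions: timestamps are milliseconds, audio is sampled at 16 kHz.

SAMPLES_PER_MS = 16   # assumed: 16 kHz audio
WINDOW_SIZE = 400     # taken from the assertion message

# The "after" list as posted in the comment above.
after = [
    [0, 1710], [1710, 19500], [19500, 33950], [33950, 56610], [56610, 56890],
    [56890, 105310], [105310, 105590], [105590, 124290], [124290, 124700],
    [124700, 184710], [184710, 199420], [199420, 208480], [208480, 224080],
    [224080, 236710], [236710, 296720], [296720, 334700], [334700, 345380],
    [345380, 358990], [358990, 369390], [369390, 396670], [396670, 399360],
    [399360, 446140], [446140, 459470], [459470, 516550], [516550, 524790],
    [524790, 534160], [534160, 546190], [546190, 560800], [560800, 566320],
    [566320, 580780], [580780, 640790], [640790, 665180], [665180, 665460],
    [665460, 725470], [725470, 736440], [736440, 796450], [796450, 796460],
    [796460, 834580], [834580, 848990], [848990, 849720], [849720, 870460],
]


def segments_shorter_than_window(segments, samples_per_ms=SAMPLES_PER_MS,
                                 window_size=WINDOW_SIZE):
    """Return (start_ms, end_ms, num_samples) for segments below the window size."""
    return [
        (start, end, (end - start) * samples_per_ms)
        for start, end in segments
        if (end - start) * samples_per_ms < window_size
    ]


if __name__ == "__main__":
    print(segments_shorter_than_window(after))  # [(796450, 796460, 160)]
```

Under these assumptions, the only offending segment is [796450, 796460], which is 10 ms or 160 samples long. That would match the upper bound of 160 in the reported message.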

secslim commented 3 months ago

> Addendum: after removing the merge_vad setting from the arguments to generate, the problem no longer occurs.

How do I remove the merge_vad parameter?

Joy-word commented 2 months ago

> How do I remove the merge_vad parameter?

Just don't set it.

hjj-lmx commented 2 weeks ago

> > How do I remove the merge_vad parameter?
>
> Just don't set it.

How was this problem resolved in the end?

Joy-word commented 1 week ago

> > > How do I remove the merge_vad parameter?
> >
> > Just don't set it.
>
> How was this problem resolved in the end?

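I modified the source code to suit my needs, changing the merge_vad method. The simplest fix is to change the default value of min_length to 400. If you would rather not modify the source, you can simply drop the merge_vad setting in the model.generate call. According to the author, the purpose of merge_vad is to add back the parts that were lost. In my tests, VAD detection was fairly thorough and very little was dropped by mistake.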
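
Editor's note: for readers who want to keep merged segments but avoid segments too short for feature extraction, the idea can be sketched as a post-processing step. This is not FunASR's merge_vad and not the patch described above. It is a hypothetical helper that folds a too-short segment into the neighbour it touches, assuming millisecond timestamps and a 25 ms minimum (400 samples at 16 kHz).

```python
# Hypothetical helper, NOT part of FunASR: fold segments shorter than
# min_ms into an adjacent, contiguous segment so no audio is dropped.
# Assumptions: timestamps in milliseconds; 25 ms == 400 samples at 16 kHz.

def absorb_short_segments(segments, min_ms=25):
    """Return a new segment list with short contiguous segments merged away."""
    merged = []
    for start, end in segments:
        if merged and merged[-1][1] == start:
            prev_len = merged[-1][1] - merged[-1][0]
            # Merge when either this segment or the previous one is too short.
            if (end - start) < min_ms or prev_len < min_ms:
                merged[-1][1] = end
                continue
        # Non-contiguous short segments are kept unchanged in this sketch.
        merged.append([start, end])
    return merged


if __name__ == "__main__":
    # Excerpt of the "after" list from the thread, around the 10 ms segment.
    excerpt = [[736440, 796450], [796450, 796460], [796460, 834580]]
    print(absorb_short_segments(excerpt))
    # [[736440, 796460], [796460, 834580]]
```

Folding into a neighbour rather than discarding keeps the full time range covered, in line with the stated purpose of merge_vad. The trade-off is that a merged segment can end up slightly longer than any maximum length the merge step was enforcing.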