ai-bot-pro / achatbot

An open source chatbot architecture for voice/vision (and multimodal) assistants that can run locally or remotely. If you run achatbot yourself, you can learn more; star and fork to contribute.
BSD 3-Clause "New" or "Revised" License

torchaudio/compliance/kaldi.py AssertionError: choose a window size 400 that is [2, 342] #62

Closed: weedge closed this issue 1 month ago

weedge commented 1 month ago

error log

2024-09-18 09:52:06,859 - chat-bot - ERROR - /usr/local/lib/python3.10/dist-packages/apipeline/processors/frame_processor.py:179 - push_frame - Uncaught exception in DailyInputTransportProcessor#0: choose a window size 400 that is [2, 342]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/apipeline/processors/frame_processor.py", line 171, in push_frame
    await self._next.process_frame(frame, direction)
  File "/content/achatbot/src/processors/speech/asr/base.py", line 54, in process_frame
    await self.process_audio_frame(frame)
  File "/content/achatbot/src/processors/speech/asr/base.py", line 127, in process_audio_frame
    await self.process_generator(self.run_asr(self._content.read()))
  File "/content/achatbot/src/processors/ai_processor.py", line 71, in process_generator
    async for f in generator:
  File "/content/achatbot/src/processors/speech/asr/asr_processor.py", line 55, in run_asr
    async for segment in self._asr.transcribe_stream(self._session):
  File "/content/achatbot/src/modules/speech/asr/sense_voice_asr.py", line 28, in transcribe_stream
    transcription, _ = await asyncio.to_thread(
  File "/usr/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/content/achatbot/deps/SenseVoice/model.py", line 817, in inference
    speech, speech_lengths = extract_fbank(
  File "/usr/local/lib/python3.10/dist-packages/funasr/utils/load_utils.py", line 173, in extract_fbank
    data, data_len = frontend(data, data_len, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/funasr/frontends/wav_frontend.py", line 134, in forward
    mat = kaldi.fbank(
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/compliance/kaldi.py", line 591, in fbank
    waveform, window_shift, window_size, padded_window_size = _get_waveform_and_window_properties(
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/compliance/kaldi.py", line 142, in _get_waveform_and_window_properties
    assert 2 <= window_size <= len(waveform), "choose a window size {} that is [2, {}]".format(
AssertionError: choose a window size 400 that is [2, 342]
weedge commented 1 month ago

Upstream issue: https://github.com/modelscope/FunASR/issues/1924
Bug fix: https://github.com/modelscope/FunASR/pull/1940

Upgrade funasr to the current latest version: pip install -U funasr
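
For anyone who cannot upgrade right away: the assertion in the traceback fires because the audio chunk (342 samples) is shorter than one fbank window (400 samples). Below is a minimal sketch of a guard that zero-pads a too-short chunk before it reaches the feature extractor. It is not what the FunASR PR does (check the PR for the real fix). The helper names `fbank_window_size` and `pad_to_window` are hypothetical, and the 16 kHz sample rate and 25 ms frame length are assumptions inferred from the window size of 400 in the error.

```python
from typing import List, Sequence


def fbank_window_size(sample_rate: int = 16000, frame_length_ms: int = 25) -> int:
    """Samples in one analysis window (assumed 16 kHz, 25 ms -> 400)."""
    return sample_rate * frame_length_ms // 1000


def pad_to_window(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_length_ms: int = 25,
) -> List[float]:
    """Zero-pad a chunk so it is at least one window long.

    Hypothetical helper: chunks that are already long enough are
    returned unchanged (as a list); shorter ones get trailing zeros.
    """
    window = fbank_window_size(sample_rate, frame_length_ms)
    padded = list(samples)
    if len(padded) < window:
        padded.extend([0.0] * (window - len(padded)))
    return padded


# The failing case from the log: 342 samples against a 400-sample window.
chunk = [0.1] * 342
safe_chunk = pad_to_window(chunk)
```

Dropping chunks that are too short instead of padding them is another option; which one is appropriate depends on how the VAD/ASR pipeline segments audio.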