Closed tjongsma closed 2 months ago
Thanks for reporting this! The input audio appears to have 2 channels, which is why the mel input has the shape [1, 2, 80, 3000] instead of the 3D shape conv1d expects. Converting the audio to mono beforehand should solve the problem. We will also update the code to handle this case.
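For anyone hitting the same error, here is a minimal sketch of the "convert to mono in advance" step. It is not part of simul_whisper: the helper name downmix_to_mono and the file paths are hypothetical, it uses only the Python standard library, and it assumes the input is a 16-bit PCM WAV file. It averages the interleaved channels and leaves the sample rate unchanged.

```python
import array
import sys
import wave


def downmix_to_mono(src_path: str, dst_path: str) -> None:
    """Average all channels of a 16-bit PCM WAV file into one mono channel.

    Hypothetical helper for illustration; assumes 16-bit PCM input.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV files")

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved per frame (L, R, L, R, ...), so average
    # each group of n_channels values into a single mono sample.
    mono = array.array(
        "h",
        (
            sum(samples[i:i + n_channels]) // n_channels
            for i in range(0, len(samples), n_channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())
```

Usage would look like downmix_to_mono("input_stereo.wav", "input_mono.wav"), after which the mono file is passed to the transcription script in place of the original. If the audio is already loaded as a tensor, averaging over the channel dimension should have the same effect.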
Ah thanks for the prompt reply! That should be easy enough to fix, will get on it :)
Hi there,
Using the following code (following the example)
Gives me the error
Traceback (most recent call last):
  File "c:\Users\tjong\Desktop\Audio_transcription_dev\whisper_streaming\simul_whisper\simul_whisper-main\transcribe.py", line 39, in <module>
    new_toks = model.infer(seg, is_last)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\tjong\Desktop\Audio_transcription_dev\whisper_streaming\simul_whisper\simul_whisper-main\simul_whisper\transcriber\simul_whisper.py", line 199, in infer
    encoder_feature = self.model.encoder(mel)
                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tjong\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tjong\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\tjong\Desktop\Audio_transcription_dev\whisper_streaming\simul_whisper\simul_whisper-main\simul_whisper\whisper\model.py", line 166, in forward
    x = F.gelu(self.conv1(x))
               ^^^^^^^^^^^^^
  File "C:\Users\tjong\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tjong\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tjong\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\tjong\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 2, 80, 3000]
Any idea what could be the issue? Thanks in advance!