MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size #141

pk41561 closed this issue 7 months ago

pk41561 commented 7 months ago

Please help me figure out the error below:

Traceback (most recent call last):
  File "C:\Users\PRABKU1\Desktop\Speaker_diarization\diarize.py", line 115, in <module>
    result_aligned = whisperx.align(
                     ^^^^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\whisperx\alignment.py", line 224, in align
    emissions, _ = model(waveform_segment.to(device))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\models\wav2vec2\model.py", line 116, in forward
    x, lengths = self.feature_extractor(waveforms, lengths)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\models\wav2vec2\components.py", line 141, in forward
    x, length = layer(x, length)  # (batch, feature, frame)
                ^^^^^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torchaudio\models\wav2vec2\components.py", line 90, in forward
    x = self.conv(x)
        ^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\PRABKU1\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size
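
For readers who hit the same message: the traceback ends inside the wav2vec2 convolutional feature extractor, which suggests that the `waveform_segment` passed to the alignment model was too short to survive the stack of strided convolutions. The sketch below is plain arithmetic, not the library's code. It assumes the commonly used wav2vec2 base layout of seven conv layers with kernel sizes (10, 3, 3, 3, 3, 2, 2) and strides (5, 2, 2, 2, 2, 2, 2), which may differ from the alignment model actually loaded here. The names `KERNELS`, `STRIDES`, `conv_output_lengths`, and `min_samples` are hypothetical helpers introduced only for illustration.

```python
# ASSUMPTION: kernel sizes and strides of the standard wav2vec2 base
# feature extractor; verify against the alignment model you actually load.
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)


def conv_output_lengths(num_samples):
    """Return the sequence length after each unpadded conv layer.

    Raises ValueError when a layer receives fewer samples than its kernel
    size, which is the condition behind the RuntimeError in the traceback.
    """
    lengths = []
    length = num_samples
    for index, (kernel, stride) in enumerate(zip(KERNELS, STRIDES)):
        if length < kernel:
            raise ValueError(
                f"layer {index}: input size {length} < kernel size {kernel}"
            )
        # Output length of a 1-D convolution with no padding and dilation 1.
        length = (length - kernel) // stride + 1
        lengths.append(length)
    return lengths


def min_samples():
    """Smallest input length that still yields one output frame."""
    needed = 1
    # Walk the layers backwards, inverting the output-length formula.
    for kernel, stride in reversed(list(zip(KERNELS, STRIDES))):
        needed = (needed - 1) * stride + kernel
    return needed


if __name__ == "__main__":
    print(min_samples())               # shortest usable segment, in samples
    print(conv_output_lengths(16000))  # one second of 16 kHz audio
    try:
        conv_output_lengths(300)       # a segment that is too short
    except ValueError as err:
        print(err)
```

Under these assumptions, a segment shorter than 400 samples (25 ms at 16 kHz) shrinks to a single sample before the last kernel-size-2 layer, which matches the sizes in the error message. If that is the cause, filtering out or padding very short segments before alignment would be one thing to try.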