RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size

MyraBaba commented 1 year ago

@MahmoudAshraf97 I get the error below:

python3 diarize.py -a ../whisperX/output.wav
[NeMo W 2023-06-19 11:53:43 optimizers:54] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2023-06-19 11:53:44 experimental:27] Module <class 'nemo.collections.asr.modules.audio_modules.SpectrogramToMultichannelFeatures'> is experimental, not ready for production and is not fully supported. Use at your own risk.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /data/dProjects/whisper-diarization/temp_outputs/htdemucs
Separating track ../whisperX/output.wav
100%|██████████████████████████████████████████████████████████████████████| 602.55/602.55 [00:12<00:00, 48.97seconds/s]
Failed to align segment (" Google."): backtrack failed, resorting to original...
Failed to align segment: duration smaller than 0.02s time precision
Failed to align segment: duration smaller than 0.02s time precision
Traceback (most recent call last):
  File "diarize.py", line 89, in <module>
    result_aligned = whisperx.align(
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/whisperx/alignment.py", line 224, in align
    emissions, _ = model(waveform_segment.to(device))
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/torchaudio/models/wav2vec2/model.py", line 116, in forward
    x, lengths = self.feature_extractor(waveforms, lengths)
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/torchaudio/models/wav2vec2/components.py", line 141, in forward
    x, length = layer(x, length)  # (batch, feature, frame)
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/torchaudio/models/wav2vec2/components.py", line 90, in forward
    x = self.conv(x)
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/data/dProjects/faster-whisper/venFasterWhsiper/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size
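The error is raised inside the wav2vec2 convolutional feature extractor that whisperx.align runs on each segment (the `components.py` frames in the traceback). Every unpadded `Conv1d` needs an input at least as long as its kernel, so a short enough audio segment shrinks to a single sample before one of the kernel-2 layers. Below is a minimal sketch of that arithmetic. It assumes the standard wav2vec2 layer layout: one layer with kernel 10 and stride 5, four with kernel 3 and stride 2, and two with kernel 2 and stride 2. The helper names are hypothetical and not part of whisperx or torchaudio.

```python
from typing import List, Sequence, Tuple

# Assumed wav2vec2 conv feature-extractor layout: (kernel_size, stride) per
# layer, no padding. Whether a given alignment model uses exactly this layout
# is an assumption.
CONV_LAYERS: List[Tuple[int, int]] = [(10, 5)] + [(3, 2)] * 4 + [(2, 2)] * 2


def feature_lengths(
    num_samples: int, layers: Sequence[Tuple[int, int]] = CONV_LAYERS
) -> List[int]:
    """Return the sequence length after each conv layer.

    Raises ValueError at the first layer whose input is shorter than its
    kernel, the same condition behind PyTorch's
    "Kernel size can't be greater than actual input size" error.
    """
    lengths: List[int] = []
    length = num_samples
    for index, (kernel, stride) in enumerate(layers):
        if length < kernel:
            raise ValueError(
                f"layer {index}: input size {length} < kernel size {kernel}"
            )
        # Output length of an unpadded, undilated 1-D convolution.
        length = (length - kernel) // stride + 1
        lengths.append(length)
    return lengths


def min_samples(layers: Sequence[Tuple[int, int]] = CONV_LAYERS) -> int:
    """Smallest input that yields one output frame (the receptive field)."""
    receptive_field = 1
    jump = 1  # product of the strides of all preceding layers
    for kernel, stride in layers:
        receptive_field += (kernel - 1) * jump
        jump *= stride
    return receptive_field
```

Under that assumption, `min_samples()` returns 400, which is 25 ms of audio at 16 kHz. Both `feature_lengths(399)` and `feature_lengths(320)` fail at the last layer with an input of 1 and a kernel of 2, the same sizes as in the error above. Assuming 16 kHz audio, 0.02 s is 320 samples. This is consistent with segments that just pass the "duration smaller than 0.02s" check still being too short for the model.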

sam1am commented 1 year ago

Same issue here on Windows with an RTX 3090.

catyung commented 1 year ago

Could you please confirm which whisperx version you are currently using?

I was getting this error with whisperx v1.0, the version specified in requirements.txt. After I updated to the latest whisperx version, the problem seems to be fixed.

cateyelow commented 1 year ago

Let's upgrade whisperx:

pip install git+https://github.com/m-bain/whisperx.git --upgrade
Then change beam_size from 1 to 7 in diarize.py:

    segments, info = whisper_model.transcribe(
        vocal_target,
        beam_size=7,
        word_timestamps=True,
        language=info.language,
    )
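If upgrading whisperx is not an option, one hypothetical guard is to zero-pad any segment shorter than the feature extractor's minimum input before it reaches the alignment model. The sketch below uses plain Python lists for illustration. `pad_short_segment` and the 400-sample minimum are assumptions, not the whisperx API. The 400-sample figure is the receptive field of the standard wav2vec2 conv layout. This is not necessarily what changed in newer whisperx versions.

```python
from typing import List, Tuple

# Assumed minimum input length (in samples) for a wav2vec2-style feature
# extractor with conv layers (10, 5), 4 x (3, 2), 2 x (2, 2): its receptive field.
MIN_SAMPLES = 400


def pad_short_segment(
    samples: List[float], min_samples: int = MIN_SAMPLES
) -> Tuple[List[float], int]:
    """Right-pad a segment with zeros so the conv stack can process it.

    Returns the (possibly padded) samples and the original length, so the
    caller can still tell how much of the segment is real audio.
    """
    original_length = len(samples)
    if original_length >= min_samples:
        return list(samples), original_length
    padding = [0.0] * (min_samples - original_length)
    return list(samples) + padding, original_length
```

For example, a 320-sample segment comes back as 400 samples with an original length of 320, while segments that are already long enough are returned unchanged.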

TheGermanEngie commented 1 year ago

Is it confirmed that updating whisperx fixes this?

MahmoudAshraf97 / whisper-diarization, issue #56