Camb-ai / MARS5-TTS

MARS5 speech model (TTS) from CAMB.AI
https://www.camb.ai
GNU Affero General Public License v3.0

[BUG] RuntimeError: Calculated padded input size per channel: (6). Kernel size: (7). Kernel size can't be greater than actual input size #76

Open hgftrdw45ud67is8o89 opened 2 months ago

hgftrdw45ud67is8o89 commented 2 months ago
File "....\MARS5-TTS\./mdl\hub\Camb-ai_mars5-tts_master\inference.py", line 291, in tts
    final_audio = self.vocode(final_output).squeeze()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".....\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "....\MARS5-TTS\./mdl\hub\Camb-ai_mars5-tts_master\inference.py", line 158, in vocode
    wav_diffusion = self.vocos.decode(features, bandwidth_id=bandwidth_id)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......
  File "....\Lib\site-packages\torch\nn\modules\conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Calculated padded input size per channel: (6). Kernel size: (7). Kernel size can't be greater than actual input size

I'm not sure what is wrong. I fed it a 5-second wav file and a transcript, but it throws this error.

RF5 commented 2 months ago

This seems to happen when MARS5 fails to generate any output audio and predicts an end-of-sequence token as the first output. Can you double-check that your prompt/reference transcript is accurate for deep clone?
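
One way to read the numbers in the error message is sketched below. It assumes the first convolution in the vocoder uses a kernel of 7 with 3 frames of zero padding on each side; that padding value is an assumption and is not confirmed anywhere in this thread. The helper names are hypothetical and are not part of MARS5 or Vocos.

```python
# Illustrative arithmetic only. The helper names are hypothetical and the
# padding value of 3 is an assumption, not taken from the MARS5 source.

def padded_length(num_frames: int, padding: int) -> int:
    """Length of a 1-D input after zero-padding both ends."""
    return num_frames + 2 * padding


def frames_from_padded(padded: int, padding: int) -> int:
    """Invert padded_length: recover the unpadded frame count."""
    return padded - 2 * padding


def conv1d_accepts(num_frames: int, kernel_size: int = 7, padding: int = 3) -> bool:
    """True when the padded input is at least as long as the kernel."""
    return padded_length(num_frames, padding) >= kernel_size


if __name__ == "__main__":
    # The error reports a padded size of 6 and a kernel size of 7.
    print(frames_from_padded(6, padding=3))  # 0 frames reached the vocoder
    print(conv1d_accepts(0))                 # False: 6 < 7, so conv1d raises
    print(conv1d_accepts(1))                 # True: 7 >= 7
```

Under that assumption, a padded size of 6 corresponds to an input with zero frames, which would be consistent with the model producing no audio at all. Checking that the generated output is non-empty before it is passed to `vocode` would likely surface a clearer error than the one raised inside `conv1d`.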

hgftrdw45ud67is8o89 commented 2 months ago

Maybe it's because it doesn't support Japanese or Chinese? I tried with a new English audio source, but I don't think the output is... human speech. Does MARS5-TTS not support '-' or '~'?