coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Kernel size can't be greater than actual input size with tts_models/fr/mai/tacotron2-DDC #2336

Closed qwertyuu closed 1 year ago

qwertyuu commented 1 year ago

Describe the bug

Model tts_models/fr/mai/tacotron2-DDC

Error:

 > Model input: hello
 > Speaker Idx:
 > Language Idx:
 > Text splitted to sentences.
['hello']
ERROR:server:Exception on /api/tts [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 2525, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1822, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1820, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1796, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/root/TTS/server/server.py", line 193, in tts
    wavs = synthesizer.tts(text, speaker_name=speaker_idx, language_name=language_idx, style_wav=style_wav)
  File "/root/TTS/utils/synthesizer.py", line 278, in tts
    outputs = synthesis(
  File "/root/TTS/tts/utils/synthesis.py", line 213, in synthesis
    outputs = run_model_torch(
  File "/root/TTS/tts/utils/synthesis.py", line 50, in run_model_torch
    outputs = _func(
  File "/usr/local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/TTS/tts/models/tacotron2.py", line 249, in inference
    encoder_outputs = self.encoder.inference(embedded_inputs)
  File "/root/TTS/tts/layers/tacotron/tacotron2.py", line 108, in inference
    o = layer(o)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/TTS/tts/layers/tacotron/tacotron2.py", line 40, in forward
    o = self.convolution1d(x)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Calculated padded input size per channel: (4). Kernel size: (5). Kernel size can't be greater than actual input size
INFO:werkzeug:::ffff:172.17.0.1 - - [10/Feb/2023 08:11:11] "GET /api/tts?text=hello HTTP/1.1" 500 -
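For context, the `RuntimeError` above boils down to a simple size check in the encoder's 1-D convolutions: the (already padded) input length must be at least the kernel size. Here is a plain-Python sketch of that arithmetic (illustrative only, not actual PyTorch code; no torch needed):

```python
def conv1d_output_length(length, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of a 1-D convolution. A result of 0 or less means the
    kernel no longer fits the padded input, which is exactly when PyTorch
    raises "Kernel size can't be greater than actual input size"."""
    padded = length + 2 * padding
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (padded - effective_kernel) // stride + 1

# The traceback reports a padded input size per channel of 4 against a kernel of 5:
print(conv1d_output_length(4, 5))            # → 0, i.e. the kernel does not fit
# A longer input sequence is fine:
print(conv1d_output_length(12, 5, padding=2))  # → 12
```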

To Reproduce

Use the latest main docker image (6cfb590eb21b1adf63a4c01c321452e2a4ee2093)

Launch `python3 TTS/server/server.py --model_name tts_models/fr/mai/tacotron2-DDC`

Go to localhost:5002

Enter any text

Expected behavior

I expect to get audio back, not an error.

Logs

(Identical output and traceback to the one in the bug description above.)

Environment

Using image tagged (6cfb590eb21b1adf63a4c01c321452e2a4ee2093) with built-in commands running on Windows 10 host

Additional context

No response

erogol commented 1 year ago

The model does not work with very short inputs. You can also try adding punctuation at the end. It is not a code issue; it is more about the model architecture.
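Until the server guards against this, callers can pre-process prompts themselves. A minimal sketch along those lines (`prepare_text` and the 8-character minimum are assumptions for illustration, not part of the TTS API):

```python
def prepare_text(text, min_chars=8):
    """Work around short-input failures: strip whitespace, append terminal
    punctuation if missing, and reject inputs that are still too short for
    the encoder's convolution kernels."""
    text = text.strip()
    if not text:
        raise ValueError("empty input")
    if text[-1] not in ".!?":
        text += "."
    if len(text) < min_chars:
        raise ValueError(f"input too short for this model (min {min_chars} chars): {text!r}")
    return text

print(prepare_text("Bonjour tout le monde"))  # → 'Bonjour tout le monde.'
```

The exact minimum length would need to be tuned per model; 8 characters is just a placeholder.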

qwertyuu commented 1 year ago

Perfect. We could pivot this issue toward validating the input, so the user gets a clearer error message. As it stands, the error suggests a problem inside coqui-ai rather than with the format of the input!
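For the record, such validation could look roughly like this on the server side (a hedged sketch; `validate_tts_request` and the 8-character threshold are hypothetical, not existing TTS code):

```python
def validate_tts_request(text, min_chars=8):
    """Hypothetical pre-flight check for /api/tts: return an (http_status,
    payload) pair so the server can answer 400 with a clear message instead
    of crashing with a 500 inside the conv layer."""
    text = text.strip()
    if not text:
        return 400, "Missing or empty 'text' parameter."
    if len(text) < min_chars:
        return 400, f"Input too short for this model; need at least {min_chars} characters."
    return 200, text

print(validate_tts_request("hello"))  # → (400, "Input too short for this model; need at least 8 characters.")
```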

erogol commented 1 year ago

Sure, we can keep the issue open. It is not something I'd deal with immediately, but I hope someone else is open to fixing it with a PR.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

planetMatrix commented 10 months ago

Hello there! I have the same issue and found something similar

https://github.com/SeanNaren/deepspeech.pytorch/issues/362

Could you please share the exact location where the padding value should be changed?

It would be very helpful.

Many thanks

Amith5970 commented 8 months ago

Can you please tell me how this issue was resolved?

Amith5970 commented 8 months ago

> Hello there! I have the same issue and found something similar
>
> SeanNaren/deepspeech.pytorch#362
>
> Could you please share the exact location where the padding value should be changed?
>
> It would be very helpful.
>
> Many thanks

Did you get any reply?

ytlviv commented 3 weeks ago

Does this model not support Chinese? English input is OK, but Chinese input raises this error.