Open jp-x-g opened 8 months ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
Describe the bug
Something in the way input is passed to the speedy_speech model (tts_models/en/ljspeech/speedy-speech) is bugged: short inputs error out. It seems to want inputs of at least a certain length. I've only tested this with single-sentence input; I don't know what it does for other types of input.
To Reproduce
This returns a truly giant stacktrace, terminating in /.local/lib/python3.11/site-packages/torch/nn/modules/conv.py, which gives:
RuntimeError: Calculated padded input size per channel: (4). Kernel size: (7). Kernel size can't be greater than actual input size
Changing the input string to "Testino" (7 characters) or "Testy westy" (11 characters) gives the same stacktrace:
RuntimeError: Calculated padded input size per channel: (11). Kernel size: (13). Kernel size can't be greater than actual input size
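For anyone reading the two errors above, here is a rough sketch of the size check that the message describes. It is not taken from the TTS or PyTorch source; the function name and parameters are made up for illustration, and the only numbers used are the ones printed in the errors (4 vs. 7 and 11 vs. 13). As far as I understand it, the "kernel size" PyTorch reports can be the effective size after dilation, so the actual layer settings in the model may differ.

```python
def conv1d_input_fits(input_len: int, kernel_size: int,
                      padding: int = 0, dilation: int = 1) -> bool:
    """Return True if a 1-D convolution could run on an input of this length.

    Hypothetical helper: mirrors the condition in the error message, i.e. the
    padded input must be at least as long as the (dilated) kernel.
    """
    padded_len = input_len + 2 * padding
    effective_kernel = dilation * (kernel_size - 1) + 1
    return padded_len >= effective_kernel


# Numbers copied from the two RuntimeErrors in this report.
print(conv1d_input_fits(4, 7))    # first error: padded size 4, kernel 7
print(conv1d_input_fits(11, 13))  # second error: padded size 11, kernel 13
print(conv1d_input_fits(13, 13))  # an input of length 13 would pass this check
```

This only explains why the convolution refuses the input; it says nothing about how the text gets turned into a sequence of that length in the first place.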
Adding two characters fixes this, but they can't be whitespace:
"Testy westy" with two extra spaces at the end: no.
"Testy westy" with two extra spaces in the middle: no.
"Testy a-westy": yes.
"Testy westy.." (two periods): no, because it's parsed as "12 characters".
"Testy westy..." (three periods, making it fourteen characters instead of thirteen): yes.
Expected behavior
I am not sure what the optimal method of fixing this is. What I'd do, without any knowledge of what's going on under the hood, is just figure out what length of string it wants (it seems like 13 is the minimum) and just pad out all short speedy_speech inputs with ellipses to get to 13. This is probably a bad idea, and there's probably a better way of doing it.
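A minimal sketch of that padding workaround, to make the idea concrete. Everything here is an assumption: the helper name is hypothetical, the threshold of 13 is only what the tests above suggest, and counting non-whitespace characters is a guess based on the observation that extra spaces did not help. It is not part of the TTS library and would have to be applied to the text before it is handed to the model.

```python
MIN_VISIBLE_CHARS = 13  # assumption: minimum suggested by this report, not a documented limit


def pad_short_input(text: str, min_chars: int = MIN_VISIBLE_CHARS,
                    pad_char: str = ".") -> str:
    """Append pad_char until the text has at least min_chars non-whitespace characters."""
    visible = sum(1 for ch in text if not ch.isspace())
    missing = min_chars - visible
    if missing <= 0:
        return text  # already long enough, leave untouched
    # Drop trailing whitespace first so the padding sits directly after the text.
    return text.rstrip() + pad_char * missing


print(pad_short_input("Testy westy"))
```

For "Testy westy" this produces the same three-period string that worked above, but that is a single data point. "Testy a-westy" and "Testy westy.." both have 12 non-whitespace characters and behave differently, so the real limit probably depends on how the text is processed after this step, and a plain character count may not be reliable.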
Logs
No response
Environment
Additional context
No response