Closed angelhd1999 closed 1 year ago
Hi, this model works for me on the dev branch.
Please install :frog: TTS from the dev branch with the command: pip install git+https://github.com/coqui-ai/TTS.git
and try again.
For me, the command tts --model_name tts_models/es/mai/tacotron2-DDC --text "Los flamencos son aves gregarias altamente especializadas"
generates the following audio:
Hello, thank you for your feedback. I'm now getting the following error. Log:
> Text splitted to sentences.
['Los flamencos son aves gregarias altamente especializadas']
Traceback (most recent call last):
File "C:\...\.conda\envs\aivt\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\...\.conda\envs\aivt\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\...\.conda\envs\aivt\Scripts\tts.exe\__main__.py", line 7, in <module>
File "C:\...\TTS\bin\synthesize.py", line 357, in main
wav = synthesizer.tts(
File "C:\...\TTS\utils\synthesizer.py", line 278, in tts
outputs = synthesis(
File "C:\...\TTS\tts\utils\synthesis.py", line 213, in synthesis
outputs = run_model_torch(
File "C:\...\TTS\tts\utils\synthesis.py", line 50, in run_model_torch
outputs = _func(
File "C:\...\.conda\envs\aivt\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\...\TTS\tts\models\tacotron2.py", line 249, in inference
encoder_outputs = self.encoder.inference(embedded_inputs)
File "C:\...\TTS\tts\layers\tacotron\tacotron2.py", line 108, in inference
o = layer(o)
File "C:\...\.conda\envs\aivt\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\...\TTS\tts\layers\tacotron\tacotron2.py", line 40, in forward
o = self.convolution1d(x)
File "C:\...\.conda\envs\aivt\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\...\.conda\envs\aivt\lib\site-packages\torch\nn\modules\conv.py", line 263, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\...\.conda\envs\aivt\lib\site-packages\torch\nn\modules\conv.py", line 259, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Calculated padded input size per channel: (4). Kernel size: (5). Kernel size can't be greater than actual input size
\...\ stands for what I think are irrelevant parts of the path.
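For context on the RuntimeError in the log: the convolution refuses to run when the padded input is shorter than the kernel. The sketch below reproduces that size check in plain Python. The kernel size of 5 comes from the log; the symmetric padding of 2 is an assumption about this encoder layer, and the function names are hypothetical. Under that assumption, a padded size of 4 would mean the layer received a sequence of length 0.

```python
def padded_length(input_length: int, padding: int) -> int:
    """Length of a 1-D input after symmetric zero padding on both sides."""
    return input_length + 2 * padding


def conv1d_input_ok(input_length: int, kernel_size: int, padding: int) -> bool:
    """Mirror the size check behind the error: padded input must cover the kernel."""
    return padded_length(input_length, padding) >= kernel_size


KERNEL_SIZE = 5  # from the error message in the log
PADDING = 2      # assumption: symmetric padding of (kernel_size - 1) // 2

for length in (0, 1, 10):
    print(length, padded_length(length, PADDING),
          conv1d_input_ok(length, KERNEL_SIZE, PADDING))
```

If the assumption holds, the error points at an empty input sequence reaching the encoder rather than at the sentence itself being too short.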
Can't replicate it so I am closing it.
Try using punctuation at the end for the problem above.
I could reproduce this on Arch Linux. I first tried just pip install tts, but kept getting empty speech files when trying to generate Spanish.
So I tried the proposed fix: pip install git+https://github.com/coqui-ai/TTS.git
I tried both Spanish models available.
Using
tts --model_name tts_models/es/mai/tacotron2-DDC --text "Los flamencos son aves gregarias altamente especializadas"
resulted in:
"RuntimeError: Calculated padded input size per channel: (4). Kernel size: (5). Kernel size can't be greater than actual input size"
Adding punctuation (one dot at the end) allows the program to go through this and finish running, but the generated .wav is still empty.
Using
tts --model_name tts_models/es/css10/vits --text "Los flamencos son aves gregarias altamente especializadas"
generates a correct output.
I'd really like to be able to use tacotron as I like the voice better.
I tried again after uninstalling and reinstalling, addressing the model directly, and deleting and re-downloading the model, but the result is still the same: without punctuation the model fails, and with punctuation it produces an empty file.
@angelhd1999 Did you get it to work?
Edit: I read elsewhere here that the Tacotron2 model doesn't work well with very short inputs, so I tried a longer text. This time the output was not an empty 1-second .WAV but a 2-second .WAV containing only a loud shrill noise. Nothing like the result @Edresson got.
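To make reports like "empty .wav" easier to compare, the duration and peak amplitude of the output file can be measured with the Python standard library. This is a minimal sketch that assumes the output is 16-bit PCM; `wav_stats` is a hypothetical helper name, not part of TTS.

```python
import array
import sys
import wave


def wav_stats(path):
    """Return (duration in seconds, peak absolute sample) for a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    duration = n_frames / rate if rate else 0.0
    peak = max((abs(s) for s in samples), default=0)
    return duration, peak


# Example (file name is illustrative):
# duration, peak = wav_stats("test.wav")
# print(f"{duration:.2f} s, peak {peak}")
```

A peak of 0 means the file is pure silence; a very short duration with a high peak would match the "loud shrill" case.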
using tts_models/es/mai/tacotron2-DDC did not work for me either
same.
same problem here
Hello, same issue with tts_models/fr/mai/tacotron2-DDC.
Hey guys @angelhd1999 @mudomau @aalvarado @deadprogram @jordicor @YA2JA I finally was able to reproduce the error.
It is a requirements issue. You need to install the gruut language packages for the target languages (es or fr in your case). For FR and ES, you can do it with the following command: pip install gruut-lang-es gruut-lang-fr
I used a conda env with Python 3.9.12. I created the test environment with the following commands:
conda create --name tts python=3.9
conda activate tts
pip install git+https://github.com/coqui-ai/TTS.git
# install gruut non-English languages
pip install gruut-lang-cs gruut-lang-de gruut-lang-en gruut-lang-es gruut-lang-fr gruut-lang-it gruut-lang-nl gruut-lang-pt gruut-lang-ru gruut-lang-sv gruut-lang-ar gruut-lang-fa gruut-lang-sw
Then I ran the command:
tts --model_name tts_models/es/mai/tacotron2-DDC --text "Los flamencos son aves gregarias altamente especializadas" --out_path tts-output-py39.wav
Then the output was:
Please let me know if this does not fix the issue.
@erogol Should we add these packages to the requirements to avoid this issue in the future?
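A quick way to see whether the gruut language packages are present in the active environment is to ask Python whether their modules can be found. This sketch assumes that the pip package gruut-lang-xx installs a module named gruut_lang_xx; `missing_gruut_langs` is a hypothetical helper, not part of gruut or TTS.

```python
import importlib.util


def missing_gruut_langs(langs, find_spec=importlib.util.find_spec):
    """Return the language codes whose gruut language module cannot be found.

    Assumption: pip package gruut-lang-xx provides module gruut_lang_xx.
    """
    return [lang for lang in langs if find_spec(f"gruut_lang_{lang}") is None]


if __name__ == "__main__":
    missing = missing_gruut_langs(["es", "fr"])
    if missing:
        # Print the install command for whatever is missing.
        print("pip install " + " ".join(f"gruut-lang-{lang}" for lang in missing))
    else:
        print("gruut language packages for es and fr are installed")
```

Run it inside the same conda env that runs tts, since the check only sees packages installed in the interpreter that executes it.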
@Edresson that was the problem! following your steps works fine! thanks!!
@Edresson can you send a PR adding those to the requirements
Describe the bug
When using the model tts_models/es/mai/tacotron2-DDC with the example phrase "Los flamencos son aves gregarias altamente especializadas, que habitan sistemas salinos de donde obtienen su alimento." (it happens with any phrase), the resulting audio is wrong and less than 1 second long:
https://user-images.githubusercontent.com/51427052/211700222-49bf2b87-711e-4bad-afd5-832cfceae30c.mp4
To Reproduce
I tried the three options I found in the documentation.
Running a single speaker model
from TTS.api import TTS
# Spanish Model 21: tts_models--es--mai--tacotron2-DDC ! Not working
# Spanish Model 22: tts_models--es--css10--vits
model_name = TTS.list_models()[22]
# Init TTS with the target model name
tts = TTS(model_name=model_name, progress_bar=False, gpu=False)
# Run TTS
tts.tts_to_file(text="Los flamencos son aves gregarias altamente especializadas, que habitan sistemas salinos de donde obtienen su alimento.", file_path="test.wav")
tts --text "Los flamencos son aves gregarias altamente especializadas, que habitan sistemas salinos de donde obtienen su alimento." --model_name "tts_models/es/mai/tacotron2-DDC" --out_path testing.wav
docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
python TTS/server/server.py --model_name tts_models/es/mai/tacotron2-DDC
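The Python option above picks the model by its position in the list, which can point at a different model if the list changes between versions. A small sketch of selecting by name instead is shown here; `models_for_language` is a hypothetical helper, and the `available` list is illustrative (in practice it would come from TTS.list_models()).

```python
def models_for_language(model_names, lang):
    """Return the TTS model names whose path contains the given language code."""
    prefix = f"tts_models/{lang}/"
    return [name for name in model_names if name.startswith(prefix)]


# Illustrative list only; the real one would come from TTS.list_models().
available = [
    "tts_models/en/ljspeech/tacotron2-DDC",
    "tts_models/es/mai/tacotron2-DDC",
    "tts_models/es/css10/vits",
    "tts_models/fr/mai/tacotron2-DDC",
]

print(models_for_language(available, "es"))
```

The resulting name can then be passed as model_name, which also makes a bug report unambiguous about which model was used.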
Environment
Additional context
If there is a way to take the male voice of tts_models/es/css10/vits and transform it into a female voice, that could also be an interesting solution.