fungus75 closed this issue 1 year ago
If necessary, you can download the model (Checkpoint) and config.json from here: https://drive.google.com/drive/folders/1bU9ObB1Z30VoT5miTXEW2bDW1EODw-gr?usp=sharing
Found the bug myself: https://github.com/coqui-ai/TTS/pull/2393
YourTTS requires speaker embeddings. Provide speaker_wav for Synthesizer
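For anyone hitting the same error, here is a minimal sketch of the call with a speaker reference, assuming the Synthesizer.tts method accepts the speaker_wav argument named in the fix above ("reference.wav" is a placeholder for any short clip of the target speaker, not a file from the dataset):

import os
from TTS.utils.synthesizer import Synthesizer

MODEL_PATH = "best_model.pth"
CONFIG_PATH = "config.json"
OUT_PATH = "."

s = Synthesizer(MODEL_PATH, CONFIG_PATH, use_cuda=True)

# YourTTS is a multi-speaker model, so the speaker embedding has to come from a reference clip.
wav = s.tts("Hallo ich bin Eric wie geht es euch", speaker_wav="reference.wav")
s.save_wav(wav, os.path.join(OUT_PATH, "test.wav"))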
Describe the bug
I'm currently playing around with the YourTTS model (based on https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py). Training worked well, and TensorBoard was also able to generate the voice outputs from my sample texts.
But I'm not able to use the generated model in a standalone script; I always get this error:
Traceback (most recent call last):
File "/disk1/daten/voice/yourtts/Thorsten_sr16000-DE-YourTTS-Training-March-02-2023_10+05PM-0000000/dotext.py", line 5, in <module>
wav=s.tts("Hallo ich bin Eric wie geht es euch")
File "/home/rene/MachineLearning/Voice/TTS/TTS/utils/synthesizer.py", line 278, in tts
outputs = synthesis(
File "/home/rene/MachineLearning/Voice/TTS/TTS/tts/utils/synthesis.py", line 213, in synthesis
outputs = run_model_torch(
File "/home/rene/MachineLearning/Voice/TTS/TTS/tts/utils/synthesis.py", line 50, in run_model_torch
outputs = _func(
File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/rene/MachineLearning/Voice/TTS/TTS/tts/models/vits.py", line 1159, in inference
o = self.waveform_decoder((z * y_mask)[:, :, : self.max_inference_len], g=g)
File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/rene/MachineLearning/Voice/TTS/TTS/vocoder/models/hifigan_generator.py", line 250, in forward
o = o + self.cond_layer(g)
File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
TypeError: conv1d() received an invalid combination of arguments - got (NoneType, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:
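For context: the NoneType in the error is the conditioning tensor g that vits.py passes into the HiFi-GAN cond_layer, and it stays None when no speaker embedding is supplied. A standalone sketch that reproduces the same TypeError (the channel sizes are made up, not the model's actual ones):

import torch.nn as nn

# Calling a Conv1d with None as input fails inside F.conv1d with the same
# "invalid combination of arguments ... (NoneType, Parameter, Parameter, ...)" message.
cond_layer = nn.Conv1d(512, 512, kernel_size=1)
cond_layer(None)  # raises TypeError: conv1d() received an invalid combination of arguments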
To Reproduce
I tried different variants, but always got the same error.
This script was used:

import os
from TTS.utils.synthesizer import Synthesizer

MODEL_PATH = "best_model.pth"
CONFIG_PATH = "config.json"
OUT_PATH = "."

s = Synthesizer(MODEL_PATH, CONFIG_PATH, use_cuda=True)
wav = s.tts("Hallo ich bin Eric wie geht es euch")
s.save_wav(wav, os.path.join(OUT_PATH, "test.wav"))
I used the ThorstenVoice dataset, resampled down to a 16,000 Hz sampling rate.
I used exactly the same environment for training and synthesizing, with the TTS version from GitHub checked out a month ago.
Expected behavior
Generate a WAV file without crashing.
Logs
No response
Environment
Additional context
No response