Synthesizing with a previously trained YourTTS model does not work

fungus75 commented 1 year ago

Describe the bug

I'm currently experimenting with the YourTTS model (based on https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py). Training worked well, and TensorBoard could also generate voice outputs from my sample texts.

However, I can't use the trained model in a standalone script. I always get this error:

Traceback (most recent call last):
  File "/disk1/daten/voice/yourtts/Thorsten_sr16000-DE-YourTTS-Training-March-02-2023_10+05PM-0000000/dotext.py", line 5, in <module>
    wav=s.tts("Hallo ich bin Eric wie geht es euch")
  File "/home/rene/MachineLearning/Voice/TTS/TTS/utils/synthesizer.py", line 278, in tts
    outputs = synthesis(
  File "/home/rene/MachineLearning/Voice/TTS/TTS/tts/utils/synthesis.py", line 213, in synthesis
    outputs = run_model_torch(
  File "/home/rene/MachineLearning/Voice/TTS/TTS/tts/utils/synthesis.py", line 50, in run_model_torch
    outputs = _func(
  File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/rene/MachineLearning/Voice/TTS/TTS/tts/models/vits.py", line 1159, in inference
    o = self.waveform_decoder((z * y_mask)[:, :, : self.max_inference_len], g=g)
  File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rene/MachineLearning/Voice/TTS/TTS/vocoder/models/hifigan_generator.py", line 250, in forward
    o = o + self.cond_layer(g)
  File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/rene/anaconda3/envs/voice2/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
TypeError: conv1d() received an invalid combination of arguments - got (NoneType, Parameter, Parameter, tuple, tuple, tuple, int), but expected one of:

(Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups) didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)
(Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups) didn't match because some of the arguments have invalid types: (NoneType, Parameter, Parameter, tuple, tuple, tuple, int)
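Reading the traceback from the bottom up: hifigan_generator.py calls self.cond_layer(g), and the first argument that conv1d receives is a NoneType, so the speaker conditioning g was None when the decoder ran. A minimal sketch, assuming only that PyTorch is installed, reproduces the same TypeError by passing None to a standalone nn.Conv1d. The function name and channel sizes are hypothetical placeholders, not the values YourTTS uses.

```python
from typing import Optional

from torch import nn


def reproduce_cond_layer_error() -> Optional[str]:
    """Call a 1x1 conditioning convolution with g=None and return the error text."""
    # Stand-in for the decoder's speaker-conditioning layer; sizes are placeholders.
    cond_layer = nn.Conv1d(in_channels=512, out_channels=64, kernel_size=1)
    g = None  # what the layer receives when no speaker embedding was provided
    try:
        cond_layer(g)
    except TypeError as err:
        return str(err)
    return None  # only reached if PyTorch unexpectedly accepts None


message = reproduce_cond_layer_error()
print(message.splitlines()[0] if message else "no error raised")
```

The point of the sketch is that the failure is not in conv1d itself: a model built with a conditioning layer must be given a speaker embedding at inference time, otherwise None travels down to the first operation that needs a tensor.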

To Reproduce

I tried different variants, but always got the same error.

This script was used:

import os
from TTS.utils.synthesizer import Synthesizer

MODEL_PATH = "best_model.pth"
CONFIG_PATH = "config.json"
OUT_PATH = "."

s = Synthesizer(MODEL_PATH, CONFIG_PATH, use_cuda=True)
wav = s.tts("Hallo ich bin Eric wie geht es euch")
s.save_wav(wav, os.path.join(OUT_PATH, "test.wav"))

I used the ThorstenVoice dataset, resampled to a 16000 Hz sampling rate.

I used exactly the same environment for training and synthesis, with the version from GitHub checked out one month ago.

Expected behavior

Generate a WAV file without a crash.

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3090",
            "NVIDIA GeForce RTX 3090",
            "NVIDIA GeForce RTX 3090",
            "NVIDIA GeForce RTX 3090",
            "NVIDIA GeForce RTX 3090",
            "NVIDIA GeForce RTX 3090"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.13.1+cu117",
        "TTS": "0.10.2",
        "numpy": "1.22.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "",
        "python": "3.10.9",
        "version": "#1 SMP Debian 5.10.162-1 (2023-01-21)"
    }
}

Additional context

No response

fungus75 commented 1 year ago

If necessary, you can download the model (Checkpoint) and config.json from here: https://drive.google.com/drive/folders/1bU9ObB1Z30VoT5miTXEW2bDW1EODw-gr?usp=sharing

fungus75 commented 1 year ago

Found the bug myself: https://github.com/coqui-ai/TTS/pull/2393

erogol commented 1 year ago

YourTTS requires speaker embeddings. Provide speaker_wav to the Synthesizer.
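Following that advice, the reproduction script can pass a reference recording through the speaker_wav argument of Synthesizer.tts(). The sketch below is a hedged variant that fails early with a readable message instead of the conv1d TypeError. It assumes that config.json stores the model settings under "model_args" with a use_d_vector_file flag (these match the VITS argument names in the Coqui code base, but check your own config.json), and that the speaker encoder referenced in the config can be loaded at inference time. The helpers needs_speaker_reference and synthesize are hypothetical, not part of the TTS API.

```python
import json
from typing import Optional


def needs_speaker_reference(config: dict) -> bool:
    """True if the model expects external speaker embeddings (d-vectors).

    Assumption: the flag lives at config["model_args"]["use_d_vector_file"].
    """
    return bool(config.get("model_args", {}).get("use_d_vector_file", False))


def synthesize(
    model_path: str,
    config_path: str,
    text: str,
    out_path: str,
    speaker_wav: Optional[str] = None,
    use_cuda: bool = True,
) -> None:
    """Hypothetical wrapper around Synthesizer that checks for a speaker reference first."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    if needs_speaker_reference(config) and speaker_wav is None:
        raise ValueError(
            "This model is conditioned on speaker embeddings; pass speaker_wav "
            "(a reference recording of the target voice)."
        )

    # Imported here so the config check above works even without TTS installed.
    from TTS.utils.synthesizer import Synthesizer

    synth = Synthesizer(model_path, config_path, use_cuda=use_cuda)
    # The synthesizer derives the speaker embedding from the reference clip.
    wav = synth.tts(text, speaker_wav=speaker_wav)
    synth.save_wav(wav, out_path)
```

For example, synthesize("best_model.pth", "config.json", "Hallo ich bin Eric wie geht es euch", "test.wav", speaker_wav="reference.wav"), where reference.wav is a hypothetical short clip of the target speaker, such as a recording from the ThorstenVoice dataset.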
