coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0
35.29k stars 4.31k forks

After upgrading from 0.17.5 to latest, facing this issue #3037

Closed nto4 closed 1 year ago

nto4 commented 1 year ago

Describe the bug

I upgraded the TTS package from TTS==0.17.5 to the latest version and am now facing this error. What is the problem here?

(cleancoquio_xtts_venv) (base) root@DESKTOP-RUI9N9R:/coquio_ttsx_vish_coqui#  cd /coquio_ttsx_vish_coqui ; /usr/bin/env /coquio_ttsx_vish_coqui/cleancoquio_xtts_venv/bin/python /root/.vscode-server/extensions/ms-python.python-2023.16.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 46633 -- /coquio_ttsx_vish_coqui/code/generate_audio.py
 > tts_models/multilingual/multi-dataset/xtts_v1 is already downloaded.
 > Using model: xtts
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2023.16.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2023.16.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2023.16.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2023.16.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/root/.vscode-server/extensions/ms-python.python-2023.16.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/root/.vscode-server/extensions/ms-python.python-2023.16.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/coquio_ttsx_vish_coqui/code/generate_audio.py", line 35, in <module>
    create_audio_subproc(text, file_path, speaker_wav, language)
  File "/coquio_ttsx_vish_coqui/code/generate_audio.py", line 12, in create_audio_subproc
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1")
  File "/coquio_ttsx_vish_coqui/cleancoquio_xtts_venv/lib/python3.10/site-packages/TTS/api.py", line 81, in __init__
    self.load_tts_model_by_name(model_name, gpu)
  File "/coquio_ttsx_vish_coqui/cleancoquio_xtts_venv/lib/python3.10/site-packages/TTS/api.py", line 186, in load_tts_model_by_name
    self.synthesizer = Synthesizer(
  File "/coquio_ttsx_vish_coqui/cleancoquio_xtts_venv/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 109, in __init__
    self._load_tts_from_dir(model_dir, use_cuda)
  File "/coquio_ttsx_vish_coqui/cleancoquio_xtts_venv/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 164, in _load_tts_from_dir
    self.tts_model.load_checkpoint(config, checkpoint_dir=model_dir, eval=True)
  File "/coquio_ttsx_vish_coqui/cleancoquio_xtts_venv/lib/python3.10/site-packages/TTS/tts/models/xtts.py", line 801, in load_checkpoint
    self.load_state_dict(checkpoint, strict=strict)
  File "/coquio_ttsx_vish_coqui/cleancoquio_xtts_venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Xtts:
        Missing key(s) in state_dict: "hifigan_decoder.waveform_decoder.conv_pre.bias", "hifigan_decoder.waveform_decoder.conv_pre.weight", "hifigan_decoder.waveform_decoder.ups.0.bias", "hifigan_decoder.waveform_decoder.ups.0.weight_g", "hifigan_decoder.waveform_decoder.ups.0.weight_v", "hifigan_decoder.waveform_decoder.ups.1.bias", "hifigan_decoder.waveform_decoder.ups.1.weight_g", "hifigan_decoder.waveform_decoder.ups.1.weight_v", "hifigan_decoder.waveform_decoder.ups.2.bias", "hifigan_decoder.waveform_decoder.ups.2.weight_g", "hifigan_decoder.waveform_decoder.ups.2.weight_v", "hifigan_decoder.waveform_decoder.ups.3.bias", "hifigan_decoder.waveform_decoder.ups.3.weight_g", "hifigan_decoder.waveform_decoder.ups.3.weight_v", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.0.bias", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.0.bias", 
"hifigan_decoder.waveform_decoder.resblocks.1.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.0.bias", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.1.bias", 
"hifigan_decoder.waveform_decoder.resblocks.2.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.0.bias", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.0.bias", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.2.bias", 
"hifigan_decoder.waveform_decoder.resblocks.4.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.0.bias", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.0.bias", 
"hifigan_decoder.waveform_decoder.resblocks.6.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.0.bias", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.1.bias", 
"hifigan_decoder.waveform_decoder.resblocks.7.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.0.bias", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.0.bias", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.2.bias", 
"hifigan_decoder.waveform_decoder.resblocks.9.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.0.bias", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.0.bias", 
"hifigan_decoder.waveform_decoder.resblocks.11.convs1.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.1.bias", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.2.bias", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.2.weight_v", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.0.bias", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.0.weight_g", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.0.weight_v", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.1.bias", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.1.weight_g", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.1.weight_v", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.2.bias", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.2.weight_g", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.2.weight_v", "hifigan_decoder.waveform_decoder.conv_post.weight", "hifigan_decoder.waveform_decoder.cond_layer.weight", "hifigan_decoder.waveform_decoder.cond_layer.bias", "hifigan_decoder.waveform_decoder.conds.0.weight", "hifigan_decoder.waveform_decoder.conds.0.bias", "hifigan_decoder.waveform_decoder.conds.1.weight", "hifigan_decoder.waveform_decoder.conds.1.bias", "hifigan_decoder.waveform_decoder.conds.2.weight", "hifigan_decoder.waveform_decoder.conds.2.bias", "hifigan_decoder.waveform_decoder.conds.3.weight", "hifigan_decoder.waveform_decoder.conds.3.bias", "hifigan_decoder.speaker_encoder.conv1.weight", "hifigan_decoder.speaker_encoder.conv1.bias", "hifigan_decoder.speaker_encoder.bn1.weight", "hifigan_decoder.speaker_encoder.bn1.bias", "hifigan_decoder.speaker_encoder.bn1.running_mean", "hifigan_decoder.speaker_encoder.bn1.running_var", 
"hifigan_decoder.speaker_encoder.layer1.0.conv1.weight", "hifigan_decoder.speaker_encoder.layer1.0.bn1.weight", "hifigan_decoder.speaker_encoder.layer1.0.bn1.bias", "hifigan_decoder.speaker_encoder.layer1.0.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer1.0.bn1.running_var", "hifigan_decoder.speaker_encoder.layer1.0.conv2.weight", "hifigan_decoder.speaker_encoder.layer1.0.bn2.weight", "hifigan_decoder.speaker_encoder.layer1.0.bn2.bias", "hifigan_decoder.speaker_encoder.layer1.0.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer1.0.bn2.running_var", "hifigan_decoder.speaker_encoder.layer1.0.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer1.0.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer1.0.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer1.0.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer1.1.conv1.weight", "hifigan_decoder.speaker_encoder.layer1.1.bn1.weight", "hifigan_decoder.speaker_encoder.layer1.1.bn1.bias", "hifigan_decoder.speaker_encoder.layer1.1.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer1.1.bn1.running_var", "hifigan_decoder.speaker_encoder.layer1.1.conv2.weight", "hifigan_decoder.speaker_encoder.layer1.1.bn2.weight", "hifigan_decoder.speaker_encoder.layer1.1.bn2.bias", "hifigan_decoder.speaker_encoder.layer1.1.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer1.1.bn2.running_var", "hifigan_decoder.speaker_encoder.layer1.1.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer1.1.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer1.1.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer1.1.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer1.2.conv1.weight", "hifigan_decoder.speaker_encoder.layer1.2.bn1.weight", "hifigan_decoder.speaker_encoder.layer1.2.bn1.bias", "hifigan_decoder.speaker_encoder.layer1.2.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer1.2.bn1.running_var", "hifigan_decoder.speaker_encoder.layer1.2.conv2.weight", 
"hifigan_decoder.speaker_encoder.layer1.2.bn2.weight", "hifigan_decoder.speaker_encoder.layer1.2.bn2.bias", "hifigan_decoder.speaker_encoder.layer1.2.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer1.2.bn2.running_var", "hifigan_decoder.speaker_encoder.layer1.2.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer1.2.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer1.2.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer1.2.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer2.0.conv1.weight", "hifigan_decoder.speaker_encoder.layer2.0.bn1.weight", "hifigan_decoder.speaker_encoder.layer2.0.bn1.bias", "hifigan_decoder.speaker_encoder.layer2.0.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer2.0.bn1.running_var", "hifigan_decoder.speaker_encoder.layer2.0.conv2.weight", "hifigan_decoder.speaker_encoder.layer2.0.bn2.weight", "hifigan_decoder.speaker_encoder.layer2.0.bn2.bias", "hifigan_decoder.speaker_encoder.layer2.0.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer2.0.bn2.running_var", "hifigan_decoder.speaker_encoder.layer2.0.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer2.0.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer2.0.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer2.0.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer2.0.downsample.0.weight", "hifigan_decoder.speaker_encoder.layer2.0.downsample.1.weight", "hifigan_decoder.speaker_encoder.layer2.0.downsample.1.bias", "hifigan_decoder.speaker_encoder.layer2.0.downsample.1.running_mean", "hifigan_decoder.speaker_encoder.layer2.0.downsample.1.running_var", "hifigan_decoder.speaker_encoder.layer2.1.conv1.weight", "hifigan_decoder.speaker_encoder.layer2.1.bn1.weight", "hifigan_decoder.speaker_encoder.layer2.1.bn1.bias", "hifigan_decoder.speaker_encoder.layer2.1.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer2.1.bn1.running_var", "hifigan_decoder.speaker_encoder.layer2.1.conv2.weight", "hifigan_decoder.speaker_encoder.layer2.1.bn2.weight", 
"hifigan_decoder.speaker_encoder.layer2.1.bn2.bias", "hifigan_decoder.speaker_encoder.layer2.1.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer2.1.bn2.running_var", "hifigan_decoder.speaker_encoder.layer2.1.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer2.1.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer2.1.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer2.1.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer2.2.conv1.weight", "hifigan_decoder.speaker_encoder.layer2.2.bn1.weight", "hifigan_decoder.speaker_encoder.layer2.2.bn1.bias", "hifigan_decoder.speaker_encoder.layer2.2.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer2.2.bn1.running_var", "hifigan_decoder.speaker_encoder.layer2.2.conv2.weight", "hifigan_decoder.speaker_encoder.layer2.2.bn2.weight", "hifigan_decoder.speaker_encoder.layer2.2.bn2.bias", "hifigan_decoder.speaker_encoder.layer2.2.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer2.2.bn2.running_var", "hifigan_decoder.speaker_encoder.layer2.2.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer2.2.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer2.2.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer2.2.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer2.3.conv1.weight", "hifigan_decoder.speaker_encoder.layer2.3.bn1.weight", "hifigan_decoder.speaker_encoder.layer2.3.bn1.bias", "hifigan_decoder.speaker_encoder.layer2.3.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer2.3.bn1.running_var", "hifigan_decoder.speaker_encoder.layer2.3.conv2.weight", "hifigan_decoder.speaker_encoder.layer2.3.bn2.weight", "hifigan_decoder.speaker_encoder.layer2.3.bn2.bias", "hifigan_decoder.speaker_encoder.layer2.3.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer2.3.bn2.running_var", "hifigan_decoder.speaker_encoder.layer2.3.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer2.3.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer2.3.se.fc.2.weight", 
"hifigan_decoder.speaker_encoder.layer2.3.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer3.0.conv1.weight", "hifigan_decoder.speaker_encoder.layer3.0.bn1.weight", "hifigan_decoder.speaker_encoder.layer3.0.bn1.bias", "hifigan_decoder.speaker_encoder.layer3.0.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer3.0.bn1.running_var", "hifigan_decoder.speaker_encoder.layer3.0.conv2.weight", "hifigan_decoder.speaker_encoder.layer3.0.bn2.weight", "hifigan_decoder.speaker_encoder.layer3.0.bn2.bias", "hifigan_decoder.speaker_encoder.layer3.0.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer3.0.bn2.running_var", "hifigan_decoder.speaker_encoder.layer3.0.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer3.0.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer3.0.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer3.0.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer3.0.downsample.0.weight", "hifigan_decoder.speaker_encoder.layer3.0.downsample.1.weight", "hifigan_decoder.speaker_encoder.layer3.0.downsample.1.bias", "hifigan_decoder.speaker_encoder.layer3.0.downsample.1.running_mean", "hifigan_decoder.speaker_encoder.layer3.0.downsample.1.running_var", "hifigan_decoder.speaker_encoder.layer3.1.conv1.weight", "hifigan_decoder.speaker_encoder.layer3.1.bn1.weight", "hifigan_decoder.speaker_encoder.layer3.1.bn1.bias", "hifigan_decoder.speaker_encoder.layer3.1.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer3.1.bn1.running_var", "hifigan_decoder.speaker_encoder.layer3.1.conv2.weight", "hifigan_decoder.speaker_encoder.layer3.1.bn2.weight", "hifigan_decoder.speaker_encoder.layer3.1.bn2.bias", "hifigan_decoder.speaker_encoder.layer3.1.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer3.1.bn2.running_var", "hifigan_decoder.speaker_encoder.layer3.1.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer3.1.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer3.1.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer3.1.se.fc.2.bias", 
"hifigan_decoder.speaker_encoder.layer3.2.conv1.weight", "hifigan_decoder.speaker_encoder.layer3.2.bn1.weight", "hifigan_decoder.speaker_encoder.layer3.2.bn1.bias", "hifigan_decoder.speaker_encoder.layer3.2.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer3.2.bn1.running_var", "hifigan_decoder.speaker_encoder.layer3.2.conv2.weight", "hifigan_decoder.speaker_encoder.layer3.2.bn2.weight", "hifigan_decoder.speaker_encoder.layer3.2.bn2.bias", "hifigan_decoder.speaker_encoder.layer3.2.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer3.2.bn2.running_var", "hifigan_decoder.speaker_encoder.layer3.2.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer3.2.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer3.2.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer3.2.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer3.3.conv1.weight", "hifigan_decoder.speaker_encoder.layer3.3.bn1.weight", "hifigan_decoder.speaker_encoder.layer3.3.bn1.bias", "hifigan_decoder.speaker_encoder.layer3.3.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer3.3.bn1.running_var", "hifigan_decoder.speaker_encoder.layer3.3.conv2.weight", "hifigan_decoder.speaker_encoder.layer3.3.bn2.weight", "hifigan_decoder.speaker_encoder.layer3.3.bn2.bias", "hifigan_decoder.speaker_encoder.layer3.3.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer3.3.bn2.running_var", "hifigan_decoder.speaker_encoder.layer3.3.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer3.3.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer3.3.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer3.3.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer3.4.conv1.weight", "hifigan_decoder.speaker_encoder.layer3.4.bn1.weight", "hifigan_decoder.speaker_encoder.layer3.4.bn1.bias", "hifigan_decoder.speaker_encoder.layer3.4.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer3.4.bn1.running_var", "hifigan_decoder.speaker_encoder.layer3.4.conv2.weight", 
"hifigan_decoder.speaker_encoder.layer3.4.bn2.weight", "hifigan_decoder.speaker_encoder.layer3.4.bn2.bias", "hifigan_decoder.speaker_encoder.layer3.4.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer3.4.bn2.running_var", "hifigan_decoder.speaker_encoder.layer3.4.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer3.4.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer3.4.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer3.4.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer3.5.conv1.weight", "hifigan_decoder.speaker_encoder.layer3.5.bn1.weight", "hifigan_decoder.speaker_encoder.layer3.5.bn1.bias", "hifigan_decoder.speaker_encoder.layer3.5.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer3.5.bn1.running_var", "hifigan_decoder.speaker_encoder.layer3.5.conv2.weight", "hifigan_decoder.speaker_encoder.layer3.5.bn2.weight", "hifigan_decoder.speaker_encoder.layer3.5.bn2.bias", "hifigan_decoder.speaker_encoder.layer3.5.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer3.5.bn2.running_var", "hifigan_decoder.speaker_encoder.layer3.5.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer3.5.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer3.5.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer3.5.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer4.0.conv1.weight", "hifigan_decoder.speaker_encoder.layer4.0.bn1.weight", "hifigan_decoder.speaker_encoder.layer4.0.bn1.bias", "hifigan_decoder.speaker_encoder.layer4.0.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer4.0.bn1.running_var", "hifigan_decoder.speaker_encoder.layer4.0.conv2.weight", "hifigan_decoder.speaker_encoder.layer4.0.bn2.weight", "hifigan_decoder.speaker_encoder.layer4.0.bn2.bias", "hifigan_decoder.speaker_encoder.layer4.0.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer4.0.bn2.running_var", "hifigan_decoder.speaker_encoder.layer4.0.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer4.0.se.fc.0.bias", 
"hifigan_decoder.speaker_encoder.layer4.0.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer4.0.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer4.0.downsample.0.weight", "hifigan_decoder.speaker_encoder.layer4.0.downsample.1.weight", "hifigan_decoder.speaker_encoder.layer4.0.downsample.1.bias", "hifigan_decoder.speaker_encoder.layer4.0.downsample.1.running_mean", "hifigan_decoder.speaker_encoder.layer4.0.downsample.1.running_var", "hifigan_decoder.speaker_encoder.layer4.1.conv1.weight", "hifigan_decoder.speaker_encoder.layer4.1.bn1.weight", "hifigan_decoder.speaker_encoder.layer4.1.bn1.bias", "hifigan_decoder.speaker_encoder.layer4.1.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer4.1.bn1.running_var", "hifigan_decoder.speaker_encoder.layer4.1.conv2.weight", "hifigan_decoder.speaker_encoder.layer4.1.bn2.weight", "hifigan_decoder.speaker_encoder.layer4.1.bn2.bias", "hifigan_decoder.speaker_encoder.layer4.1.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer4.1.bn2.running_var", "hifigan_decoder.speaker_encoder.layer4.1.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer4.1.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer4.1.se.fc.2.weight", "hifigan_decoder.speaker_encoder.layer4.1.se.fc.2.bias", "hifigan_decoder.speaker_encoder.layer4.2.conv1.weight", "hifigan_decoder.speaker_encoder.layer4.2.bn1.weight", "hifigan_decoder.speaker_encoder.layer4.2.bn1.bias", "hifigan_decoder.speaker_encoder.layer4.2.bn1.running_mean", "hifigan_decoder.speaker_encoder.layer4.2.bn1.running_var", "hifigan_decoder.speaker_encoder.layer4.2.conv2.weight", "hifigan_decoder.speaker_encoder.layer4.2.bn2.weight", "hifigan_decoder.speaker_encoder.layer4.2.bn2.bias", "hifigan_decoder.speaker_encoder.layer4.2.bn2.running_mean", "hifigan_decoder.speaker_encoder.layer4.2.bn2.running_var", "hifigan_decoder.speaker_encoder.layer4.2.se.fc.0.weight", "hifigan_decoder.speaker_encoder.layer4.2.se.fc.0.bias", "hifigan_decoder.speaker_encoder.layer4.2.se.fc.2.weight", 
"hifigan_decoder.speaker_encoder.layer4.2.se.fc.2.bias", "hifigan_decoder.speaker_encoder.torch_spec.0.filter", "hifigan_decoder.speaker_encoder.torch_spec.1.spectrogram.window", "hifigan_decoder.speaker_encoder.torch_spec.1.mel_scale.fb", "hifigan_decoder.speaker_encoder.attention.0.weight", "hifigan_decoder.speaker_encoder.attention.0.bias", "hifigan_decoder.speaker_encoder.attention.2.weight", "hifigan_decoder.speaker_encoder.attention.2.bias", "hifigan_decoder.speaker_encoder.attention.2.running_mean", "hifigan_decoder.speaker_encoder.attention.2.running_var", "hifigan_decoder.speaker_encoder.attention.3.weight", "hifigan_decoder.speaker_encoder.attention.3.bias", "hifigan_decoder.speaker_encoder.fc.weight", "hifigan_decoder.speaker_encoder.fc.bias". 

To Reproduce

from TTS.api import TTS
import sys
import logging
logging.basicConfig(filename='c', level=logging.ERROR)
logger = logging.getLogger('generate_audio_xtts_logger')
logger.setLevel(logging.ERROR)

# from TTS.api import TTS

def create_audio_subproc(text, file_path, speaker_wav, language):

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1")
    tts.to("cuda")
    tts.tts_to_file(text=text, file_path=file_path, speaker_wav=speaker_wav, language=language)

create_audio_subproc(text, file_path, speaker_wav, language)

Pass the text, file path, speaker wav, and language like this:


    text = "Test sentence test one two three" 
    file_path = "/coquio_ttsx_vish_coqui/code/process_dir/test.wav" 
    speaker_wav = "/coquio_ttsx_vish_coqui/code/voices/michael/Michael_2.wav" 
    language = "en"
    create_audio_subproc(text, file_path, speaker_wav, language)

Expected behavior

No response

Logs

No response

Environment

(cleancoquio_xtts_venv) (base) root@DESKTOP-RUI9N9R:/coquio_ttsx_vish_coqui# python collect_env_info.py
{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4070"
        ],
        "available": true,
        "version": "11.7"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.0.1+cu117",
        "TTS": "0.17.7",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.12",
        "version": "#1 SMP Fri Jan 27 02:56:13 UTC 2023"
    }
}

Additional context

No response

gorkemgoknar commented 1 year ago

Check whether model.pth, the config, and the vocab have been downloaded to ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v1 (Linux).

This will print the exact model path:

import os

from TTS.utils.generic_utils import get_user_data_dir

# Build the path of the local XTTS v1 model folder and print it
model_path = os.path.join(get_user_data_dir("tts"), "tts_models--multilingual--multi-dataset--xtts_v1")
print(model_path)

If the folder exists but is empty, remove the tts_models--multilingual--multi-dataset--xtts_v1 folder.
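The check above can be sketched roughly as follows. This is only an illustration: `missing_model_files` is a hypothetical helper, not part of TTS, and the file names `config.json` and `vocab.json` are assumptions (the comment only names model.pth, "config" and "vocab"). It takes the folder path as an argument, so it runs without importing TTS.

```python
import os

# Assumed file names: only "model.pth" is named explicitly in this thread.
EXPECTED_FILES = ("model.pth", "config.json", "vocab.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty in model_dir."""
    # A folder that does not exist is treated as missing every file.
    if not os.path.isdir(model_dir):
        return list(expected)
    missing = []
    for name in expected:
        path = os.path.join(model_dir, name)
        # A zero-byte file is likely a failed or interrupted download.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

Calling `missing_model_files(model_path)` with the path printed above should return an empty list if all three files are present and non-empty.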

Try redownloading the model via the CLI (after accepting the terms). If you load it from code, make sure you agree to the Terms of Service before importing TTS:

import os
# By using XTTS you agree to CPML license https://coqui.ai/cpml
os.environ["COQUI_TOS_AGREED"] = "1"

# After you agree import TTS to trigger download
from TTS.api import TTS
# Other XTTS model stuff

nto4 commented 1 year ago

Deleting the weights solved my problem.
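For anyone landing here later, deleting the cached weights could look roughly like the sketch below. `remove_cached_model` is a hypothetical helper, not a TTS function, and the `tts_models--` prefix check is just an added guard against removing the wrong directory. It assumes the model is downloaded again the next time it is loaded once the folder is gone.

```python
import os
import shutil


def remove_cached_model(model_dir):
    """Delete a cached model folder. Returns True if something was removed."""
    folder_name = os.path.basename(os.path.normpath(model_dir))
    # Guard (assumption): only touch folders that look like TTS model caches.
    if not folder_name.startswith("tts_models--"):
        raise ValueError(f"refusing to delete unexpected folder: {model_dir}")
    if os.path.isdir(model_dir):
        shutil.rmtree(model_dir)
        return True
    return False
```

On Linux the folder mentioned above would be `~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v1`; pass it through `os.path.expanduser` first.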