Closed by caffeinetoomuch 10 months ago
I am encountering the same problem.
Same
+1
Delete tts_models--multilingual--multi-dataset--xtts_v2 folder and let the model download again. Fixed the issue for me.
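For anyone who would rather script that cleanup, here is a minimal sketch. The helper name `remove_cached_model` is hypothetical (it is not part of the TTS package), and the cache root is something you pass in yourself: the path quoted elsewhere in this thread is ~/.local/share/tts on Linux, but other setups may keep models somewhere else.

```python
import shutil
from pathlib import Path

# Folder name taken from the comment above.
MODEL_FOLDER = "tts_models--multilingual--multi-dataset--xtts_v2"


def remove_cached_model(cache_root, folder: str = MODEL_FOLDER) -> bool:
    """Delete the cached model folder; return True if something was removed."""
    target = Path(cache_root) / folder
    if not target.is_dir():
        # Nothing cached under this root, so there is nothing to delete.
        return False
    shutil.rmtree(target)
    return True
```

Calling `remove_cached_model(Path.home() / ".local" / "share" / "tts")` would clear the folder on a setup that uses that location; the next model load should then trigger a fresh download.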
Re-downloading the model fixes the problem, but I have found that the WAV quality with this method is worse than what I get when using the API. I don't know why.
I checked how the TTS API loads the exact same model, and it differs from the code example in the documentation: https://github.com/coqui-ai/TTS/blob/dev/docs/source/models/xtts.md
This is how I managed to load the model without errors:
from pathlib import Path

from TTS.config import load_config
from TTS.tts.models import setup_model as setup_tts_model

# Directory holding the downloaded model files, including config.json
model_dir = Path("/home/user/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2")

# Build the model from its own config, then load the weights from the same directory
config = load_config(model_dir / "config.json")
model = setup_tts_model(config)
model.load_checkpoint(
    config,
    checkpoint_dir=model_dir,
    eval=True,
    # use_deepspeed=True,
)
model.to("cuda")
@Aya-AlJafari can you check the code above? It should have worked.
Re-downloading the model fixes the problem, but I have found that the WAV quality with this method is worse than what I get when using the API. I don't know why.
This issue happens because the loaded model does not use the decoding parameters from config.json. You need to set them manually. Example:
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=audio_path,
    gpt_cond_len=model.config.gpt_cond_len,
    max_ref_length=model.config.max_ref_len,
    sound_norm_refs=model.config.sound_norm_refs,
)
out = model.inference(
    text=tts_text,
    language=lang,
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
    temperature=model.config.temperature,  # Add custom parameters here
    length_penalty=model.config.length_penalty,
    repetition_penalty=model.config.repetition_penalty,
    top_k=model.config.top_k,
    top_p=model.config.top_p,
)
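One way to avoid repeating those five arguments at every call site is to collect them once. This is only a sketch: `decoding_kwargs` is a hypothetical helper, and `demo_config` is a stand-in object with made-up values so the snippet runs without a model loaded. It assumes the config exposes the same attribute names used in the example above.

```python
from types import SimpleNamespace

# Decoding parameters named in the example above.
DECODING_KEYS = ("temperature", "length_penalty", "repetition_penalty", "top_k", "top_p")


def decoding_kwargs(config, **overrides):
    """Collect decoding parameters from a config object, then apply overrides."""
    kwargs = {key: getattr(config, key) for key in DECODING_KEYS}
    kwargs.update(overrides)
    return kwargs


# Stand-in for model.config; these values are invented for illustration only.
demo_config = SimpleNamespace(
    temperature=0.7, length_penalty=1.0, repetition_penalty=2.0, top_k=50, top_p=0.8
)
print(decoding_kwargs(demo_config, temperature=0.6))
```

With a real model this would be passed along as `model.inference(..., **decoding_kwargs(model.config))`, keeping any per-call tweaks as keyword overrides.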
I think the issue was caused by my own mistake of not providing the right config file. After loading the config.json from the fine-tuning checkpoint directory, loading seems to work now for both v1.1 and v2. Before closing the issue, I have some other questions regarding fine-tuning, @Edresson.
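To make that mistake harder to repeat, the config can be resolved from the checkpoint directory itself instead of being passed separately. A minimal sketch, assuming the config file is named config.json and sits in the same directory as the checkpoint; `find_config` is a hypothetical helper, not a TTS function.

```python
from pathlib import Path


def find_config(checkpoint_dir) -> Path:
    """Return the config.json inside checkpoint_dir, or raise if it is absent."""
    checkpoint_dir = Path(checkpoint_dir)
    config_path = checkpoint_dir / "config.json"
    if not config_path.is_file():
        # Fail early rather than silently pairing weights with another config.
        raise FileNotFoundError(f"No config.json found in {checkpoint_dir}")
    return config_path
```

The returned path can then be handed to `load_config`, and the same directory passed as `checkpoint_dir`, so the config and the weights always come from one place.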
@erogol checked and it works.
The code fragment above, as well as my own usage, has started to fail with the error below. It was working fine until about an hour ago.
Unexpected key(s) in state_dict: "hifigan_decoder.waveform_decoder.ups.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.ups.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.ups.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.ups.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.ups.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.ups.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.ups.3.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.ups.3.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.0.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.0.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.0.parametrizations.weight.original1", 
"hifigan_decoder.waveform_decoder.resblocks.1.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.1.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.1.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.2.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.2.convs2.2.parametrizations.weight.original0", 
"hifigan_decoder.waveform_decoder.resblocks.2.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.3.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.3.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.4.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.0.parametrizations.weight.original1", 
"hifigan_decoder.waveform_decoder.resblocks.4.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.4.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.5.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.5.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.6.convs1.2.parametrizations.weight.original0", 
"hifigan_decoder.waveform_decoder.resblocks.6.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.6.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.7.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.7.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.0.parametrizations.weight.original1", 
"hifigan_decoder.waveform_decoder.resblocks.8.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.8.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.8.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.9.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.9.convs2.2.parametrizations.weight.original0", 
"hifigan_decoder.waveform_decoder.resblocks.9.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.10.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.10.convs2.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.0.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.11.convs1.2.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.0.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.0.parametrizations.weight.original1", 
"hifigan_decoder.waveform_decoder.resblocks.11.convs2.1.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.1.parametrizations.weight.original1", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.2.parametrizations.weight.original0", "hifigan_decoder.waveform_decoder.resblocks.11.convs2.2.parametrizations.weight.original1".
>>>
I'm using Python 3.11 and encountering the same problem. I've just reinstalled all necessary packages according to the following versions:
tensorflow==2.17.0rc0
transformers==4.41.2
triton==2.3.1
TTS==0.22.0
torch==2.3.1
torchaudio==2.3.1
tokenizers==0.19.1
tensorboard==2.17.0
tensorboard-data-server==0.7.2
huggingface-hub==0.23.4
scikit-learn==1.5.0
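When comparing environments across reports like this one, it can help to print what is actually installed rather than what was requested. The sketch below uses the standard library's `importlib.metadata`; the package selection is just a subset of the list above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the packages listed above; extend as needed.
PACKAGES = ("TTS", "torch", "torchaudio", "transformers", "tokenizers")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```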
Describe the bug
When loading the model using Xtts.load_checkpoint, an exception is raised: "Error(s) in loading state_dict for Xtts", with missing keys for the GPT embedding weights and a size mismatch on the Mel embedding. I even tried providing the directory that holds the base (v2) model checkpoints and got the same result.
To Reproduce
Expected behavior
Load the checkpoint and run inference without exception.
Logs
Environment
Additional context
No response