erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd-party software via JSON calls.
GNU Affero General Public License v3.0

Finetuned Model - Error: Model folder is missing required files or the folder does not exist. #387

Closed theslipperyCarrot closed 4 hours ago

theslipperyCarrot commented 4 hours ago

Hi! I've been testing for a while, but so far nothing gets generated with my own finetuned voices. At startup I get these errors:

[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode     : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated    : 15th October 2024 at 23:27
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version    : 3.11.0
[AllTalk ENG] PyTorch Version   : 2.2.1
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : Piper Ready
[AllTalk ENG] Load time : 0.00 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
[AllTalk TTS] AllTalk WIKI: https://github.com/erew123/alltalk_tts/wiki
[AllTalk TTS] Errors Help : https://github.com/erew123/alltalk_tts/wiki/Error-Messages-List
[AllTalk TTS]
[AllTalk TTS] Please use Ctrl+C when exiting AllTalk otherwise a
[AllTalk TTS] subprocess may continue running in the background.
[AllTalk TTS]
[AllTalk TTS] AllTalk Server Ready
[AllTalk ENG]
[AllTalk ENG] Swapping TTS Engine. Please wait.
[AllTalk ENG]
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version    : 3.11.0
[AllTalk ENG] PyTorch Version   : 2.2.1
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Warning: Model folder 'Test' is missing required
[AllTalk ENG] Warning: files or the folder does not exist.
[AllTalk ENG] Error: No models for this TTS engine were found to load. Please download a model.
[AllTalk ENG]
[AllTalk ENG] AllTalk Server Ready

When I try to generate audio, I get this error:

[AllTalk GEN] Test
[AllTalk ENG] Error: You currently have no TTS model loaded.
AllTalk [GEN] Error during audio generation: 400: You currently have no TTS model loaded.

I have downloaded the tts_engines.json and replaced the one in \alltalk_tts\system\tts_engines with the downloaded one. But after switching from Piper to XTTS, the error appears again. The finetuned model in the folder is from the xtts-finetune-webui and consists of config.json, model.pth, reference.wav, speakers_xtts.pth and vocab.json.

This is my diagnostics.log file:

diagnostics.log

When I delete the folder of the XTTS voice and replace the tts_engines.json again, I can use the standard XTTS v2.0.2 voice to generate audio. It then generates the same text-to-speech 180 times, and then this error is shown:

[AllTalk TTS] Error: An error occurred. Please see console output.
[AllTalk TTS] Error Details: 500 Server Error: Internal Server Error for url: http://127.0.0.1:7851/api/tts-generate

Is there a way to use XTTS voices that were finetuned with the xtts-finetune-webui in standalone AllTalk? Thanks and best regards! Jonas

erew123 commented 4 hours ago

Hi @theslipperyCarrot

An XTTS model consists of all the following files:

[image: file listing of an XTTS model folder]

The model.pth file is the main file, and if you have finetuned an XTTS model, that is the file that was actually finetuned. However, you still need the other files in the folder. These files are checked for in the code at https://github.com/erew123/alltalk_tts/blob/alltalkbeta/system/tts_engines/xtts/model_engine.py#L256

required_files = ["config.json", "model.pth", "mel_stats.pth", "speakers_xtts.pth", "vocab.json", "dvae.pth"]
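To see which of these files a finetuned model folder still lacks, something like the following minimal sketch could be run against it. This is not AllTalk's own check; the function name `missing_model_files` and the example path are hypothetical, and only the file names come from the list above.

```python
from pathlib import Path

# File names copied from the required_files list quoted above.
REQUIRED_FILES = [
    "config.json",
    "model.pth",
    "mel_stats.pth",
    "speakers_xtts.pth",
    "vocab.json",
    "dvae.pth",
]


def missing_model_files(model_dir):
    """Return the required file names that are not present in model_dir.

    Hypothetical helper for illustration only. If the folder itself does
    not exist, every required file is reported as missing.
    """
    folder = Path(model_dir)
    if not folder.is_dir():
        return list(REQUIRED_FILES)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Placeholder path: point this at your own finetuned model folder.
    for name in missing_model_files("path/to/your/model/folder"):
        print("missing:", name)
```

For a folder that only holds config.json, model.pth, reference.wav, speakers_xtts.pth and vocab.json, this would report mel_stats.pth and dvae.pth as missing.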

Additionally, you will need the files from the same version of the model as your model.pth file. So if it was a 2.0.2 model, you need the 2.0.2 files. If it is 2.0.3, then the 2.0.3 files etc.

These other files are available at https://huggingface.co/coqui/XTTS-v2/tree/main and you can select the version with the dropdown.
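If you prefer direct download links over the web page, a small sketch like this builds them using Hugging Face's standard `resolve` URL pattern. The helper name `xtts_file_url` is hypothetical, and any revision other than `main` (for example `v2.0.2`) is an assumption about how the versions in the dropdown are named, so check the dropdown for the exact name first.

```python
# Repository taken from the link above.
REPO_ID = "coqui/XTTS-v2"


def xtts_file_url(filename, revision="main"):
    """Build a direct-download URL for one file in the XTTS-v2 repository.

    Hypothetical helper: `revision` must match a branch or tag that really
    exists in the repository (see the version dropdown on the page).
    """
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{filename}"


if __name__ == "__main__":
    # The two files an xtts-finetune-webui export typically lacks,
    # according to the file list in this thread.
    for name in ["mel_stats.pth", "dvae.pth"]:
        print(xtts_file_url(name))
```

Pass the revision that matches your model.pth so the extra files come from the same model version.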


Your wav file should be placed in the \alltalk_tts\voices folder (please read the TTS Engine Settings > XTTS TTS > Engine Help page).

Thanks