erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and WAV file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

Long model load time #286

Closed · kalle07 closed 3 months ago

kalle07 commented 3 months ago

Is your feature request related to a problem? Please describe.
Long model load time at startup: 11.29 seconds.

Describe the solution you'd like
I have an NVMe SSD with a 2 GB/s read speed, so that's not the reason.

Describe alternatives you've considered
Maybe the hash is calculated every time?

Additional context

```
16:13:19-650623 INFO Loading the extension "alltalk_tts"
[AllTalk Startup] (AllTalk ASCII-art banner)
[AllTalk Startup] Config file check : No Updates required
[AllTalk Startup] AllTalk startup Mode : Text-Gen-webui mode
[AllTalk Startup] WAV file deletion : Disabled
[AllTalk Startup] DeepSpeed version : 0.14.0+ce78a63
[AllTalk Startup] Model is available : Checking
[AllTalk Startup] Model is available : Checked
[AllTalk Startup] Current Python Version : 3.11.8
[AllTalk Startup] Current PyTorch Version: 2.2.2+cu121
[AllTalk Startup] Current CUDA Version : 12.1
[AllTalk Startup] Current TTS Version : 0.22.0
[AllTalk Startup] Current TTS Version is : Up to date
[AllTalk Startup] AllTalk Github updated : 1st July 2024 at 08:57
[AllTalk Startup] TTS Subprocess : Starting up
[AllTalk Startup]
[AllTalk Startup] AllTalk Settings & Documentation: http://127.0.0.1:7851
[AllTalk Startup]
[AllTalk Model] XTTSv2 Local Loading xttsv2_2.0.2 into cuda
[AllTalk Model] Coqui Public Model License
[AllTalk Model] https://coqui.ai/cpml.txt
[AllTalk Model] Model Loaded in 11.29 seconds.
[AllTalk Model] Ready
```
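As a quick sanity check that disk throughput is not the bottleneck, the raw checkpoint read can be timed on its own, without any deserialisation. A minimal sketch using only the standard library; the model path below is an assumption and should point at the local xttsv2_2.0.2 folder:

```python
# Sketch: time a raw read of the XTTS checkpoint to rule out disk I/O.
# The path is hypothetical - adjust to AllTalk's actual model folder.
import time
from pathlib import Path

model_file = Path("models/xtts/xttsv2_2.0.2/model.pth")  # hypothetical location

t0 = time.perf_counter()
data = model_file.read_bytes()  # raw bytes only, no torch deserialisation
elapsed = time.perf_counter() - t0

gb = len(data) / 1e9
print(f"Read {gb:.2f} GB in {elapsed:.2f} s ({gb / elapsed:.2f} GB/s)")
```

If this finishes in a second or two, the remaining load time is spent in deserialisation and pre-calculation rather than I/O; note that the OS page cache can make repeat runs look much faster than a cold read.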

erew123 commented 3 months ago

Hi @kalle07

That load time is about average/normal for a reasonably modern machine.

This question has been asked before, and I did look into the reason for the longer load time. The answer is yes, a certain amount of pre-calculation occurs when the model loads in, which is why it appears to take as long as it does. It's not the same as loading an LLM model.

Furthermore, this load process is controlled by Coqui's scripts/TTS service, not directly by AllTalk.
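For context, loading XTTSv2 directly through Coqui's documented API makes the individual phases visible and timeable. A minimal sketch, assuming the standard xttsv2_2.0.2 folder layout (the checkpoint directory is hypothetical):

```python
# Sketch: load XTTSv2 directly via Coqui's API and time each phase.
import time

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

ckpt_dir = "models/xtts/xttsv2_2.0.2"  # hypothetical path

t0 = time.perf_counter()
config = XttsConfig()
config.load_json(f"{ckpt_dir}/config.json")  # fast: plain JSON parse
t1 = time.perf_counter()

model = Xtts.init_from_config(config)        # builds the network modules
model.load_checkpoint(config, checkpoint_dir=ckpt_dir, use_deepspeed=False)
t2 = time.perf_counter()

model.cuda()                                 # move weights to the GPU
t3 = time.perf_counter()

print(f"config: {t1 - t0:.2f}s | init+checkpoint: {t2 - t1:.2f}s | cuda: {t3 - t2:.2f}s")
```

If the middle phase dominates, that points at the deserialisation and pre-calculation inside Coqui's loader rather than at disk reads or the GPU transfer.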

Thanks

kalle07 commented 3 months ago

thx, I see. So maybe it can be tuned up?

erew123 commented 3 months ago

Not sure, to be honest. You would have to look through the xtts loader and all its imports:

https://github.com/coqui-ai/TTS/blob/dev/TTS/tts/models/xtts.py

and the config loader and all its imports:

https://github.com/coqui-ai/TTS/blob/dev/TTS/tts/configs/xtts_config.py

to get to exactly what's going on.
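One way to narrow that down without tracing every import by hand is to profile the checkpoint load itself. A sketch using the standard library's cProfile, under the same path assumption as above:

```python
# Sketch: profile Xtts.load_checkpoint to see which calls dominate load time.
import cProfile
import pstats

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

ckpt_dir = "models/xtts/xttsv2_2.0.2"  # hypothetical path

config = XttsConfig()
config.load_json(f"{ckpt_dir}/config.json")
model = Xtts.init_from_config(config)

with cProfile.Profile() as profiler:
    model.load_checkpoint(config, checkpoint_dir=ckpt_dir)

# Show the 20 functions with the highest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```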

Obviously there are interactions with Hugging Face transformers too, so I'm not sure specifically where the pre-calculations come in; e.g. they may be in the calls made within Hugging Face transformers, which would require Hugging Face to look at that. However, I would assume those are as optimised as they get.