ghost opened this issue 1 year ago
Are you using the .sh script or doing it programmatically? (only loading components once vs each time)
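The "load components once" point can be sketched generically: keep a single `TextToSpeech` instance alive for the whole process instead of re-running a script that reloads all the model weights per utterance. The `once` helper below is plain Python; the Tortoise usage in the comments assumes the upstream `tortoise.api.TextToSpeech` class and is not executed here.

```python
import functools

def once(factory):
    """Wrap a heavy constructor so it runs at most once per process.

    Subsequent calls return the cached instance instead of rebuilding it,
    which is the difference between loading model weights once and paying
    that cost on every invocation of a shell script.
    """
    return functools.lru_cache(maxsize=1)(factory)

# Hypothetical usage with Tortoise (assumes the library is installed; not run here):
# from tortoise.api import TextToSpeech
# get_tts = once(TextToSpeech)                # weights load on the first call only
# audio = get_tts().tts_with_preset("Hello", preset="fast")
```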
I've found you can significantly improve the speed by adjusting the advanced options and tuning options. Tweaking the number of autoregressive samples, as well as the amount of text processed in each batch, made a big difference, and then I applied some of the fancy stuff on top to make it sound good, like cond_free.
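As a rough sketch of those knobs: the parameter names below (`num_autoregressive_samples`, `diffusion_iterations`, `cond_free`) exist in the upstream Tortoise `TextToSpeech.tts()` API, but the default values and the helper itself are illustrative assumptions, not the project's own presets. The idea is to collect a custom speed/quality trade-off into one kwargs dict and pass it through.

```python
def make_gen_kwargs(num_autoregressive_samples=64,
                    diffusion_iterations=80,
                    cond_free=True):
    """Collect generation speed/quality knobs into a single kwargs dict.

    Values here are illustrative; tune them against your own latency and
    quality targets.
    """
    return {
        "num_autoregressive_samples": num_autoregressive_samples,  # fewer = faster, less varied candidates
        "diffusion_iterations": diffusion_iterations,              # fewer = faster, noisier audio
        "cond_free": cond_free,                                    # conditioning-free diffusion, usually better quality
    }

# Hypothetical usage (assumes tortoise is installed; not run here):
# from tortoise.api import TextToSpeech
# tts = TextToSpeech()
# audio = tts.tts("Hello there.", voice_samples=samples,
#                 **make_gen_kwargs(num_autoregressive_samples=32,
#                                   diffusion_iterations=50))
```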
Truth be told, I finetuned the original Tortoise TTS autoregressive model on an ASMR voice to get breathy responses. Loved the outputs! But to make the pipeline production-ready, I thought of using tortoise-tts-fast. I installed Python 3.8 in a new conda environment, along with the torch versions mentioned in some of the open/closed issues.
I haven't yet seen any improvement in generation time. Am I doing something wrong? I'm running it on an A10G, and VRAM unnecessarily goes up to 21 GB. Outputs are not great with the 'ultra_fast' preset; I need at least 'fast' or 'very_fast'. Everything runs without errors, and I'm launching it through the Streamlit app.py file. Would anyone like to comment?