Closed: C00reNUT closed this issue 1 year ago
For me there was exactly a 0% speed change, no matter whether I followed the README verbatim or ran my own experiments. Also, a lot of my voices that were fine in the old one didn't work on this one. The ones that did work, though, were better/clearer, so I just kind of shrugged that off: OK, cool, an alternate choice.
(RTX 3060 here)
I was using the original repo tag in the CLI.
The original repo tag still uses the kv_cache speedup features. Try the actual original tortoise repo instead; it will be much slower.
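For context, kv_cache caches the autoregressive model's attention keys/values between decoding steps instead of recomputing them. Here is a minimal sketch for timing it on vs. off, assuming a recent tortoise-tts where `TextToSpeech` exposes a `kv_cache` flag (check `api.py` in your checkout if the signature differs):

```python
# Minimal sketch: timing tortoise-tts with kv_cache off vs. on.
# Assumes a recent tortoise-tts where TextToSpeech accepts a kv_cache flag.
import time

from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

text = "The quick brown fox jumps over the lazy dog."
voice_samples, conditioning_latents = load_voice("best_short_15")

for use_cache in (False, True):
    tts = TextToSpeech(kv_cache=use_cache)
    start = time.time()
    tts.tts_with_preset(
        text,
        voice_samples=voice_samples,
        conditioning_latents=conditioning_latents,
        preset="ultra_fast",
    )
    print(f"kv_cache={use_cache}: {time.time() - start:.1f}s")
```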
Check that your GPU VRAM is being utilized. This could be a PyTorch issue.
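A quick sanity check, using only standard PyTorch calls, to confirm the GPU is actually visible and used from the environment you run tortoise in:

```python
# Quick sanity check that PyTorch sees and can allocate on the GPU.
# Run this in the same environment you launch tortoise from.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # Allocate a tensor and confirm the memory lands on the GPU.
    x = torch.randn(1024, 1024, device="cuda")
    print(f"Allocated: {torch.cuda.memory_allocated() / 1e6:.1f} MB")
```

Watching `nvidia-smi` in a second terminal while a generation runs shows the same thing.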
Yeah, it could be part of the problem. But I tried the original repo, and the average generation time with the same sampling method and vocoder is about the same.
Anyway, I will try PyTorch 2.0 (https://pytorch.org/blog/accelerated-diffusers-pt-20/); there seem to be interesting out-of-the-box speedup possibilities.
Also, https://github.com/nebuly-ai/nebullvm looks promising to me.
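For what it's worth, a large part of the PyTorch 2.0 diffusers speedup comes from the fused `scaled_dot_product_attention` kernel. Below is a minimal sketch of the primitive itself, assuming PyTorch >= 2.0 with CUDA; note that tortoise's attention layers would have to be rewritten to call it, so this only demonstrates the building block:

```python
# Minimal sketch of PyTorch 2.0's fused attention (requires torch >= 2.0).
# This is just the primitive the PT 2.0 blog post benchmarks; it is not
# wired into tortoise out of the box.
import torch
import torch.nn.functional as F

# (batch, heads, sequence length, head dim)
q = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)

# Dispatches to FlashAttention / memory-efficient kernels when possible.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 256, 64])
```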
Hello, thank you for trying to optimize the tortoise library. I am trying to compare the speed of the two implementations, but so far I am getting very similar results in both quality and speed. I use an NVIDIA 3060 with 12 GB of VRAM.
Running the command below takes about 2m 14s.
python scripts/tortoise_tts.py -p ultra_fast -O results/best_short_15/ultra_fast -v best_short_15 <text_short.txt --sampler dpm++2m --diffusion_iterations 30 --vocoder Univnet
Using the same settings but with the original repo tag in the CLI takes about 2m 2s.
python scripts/tortoise_tts.py -p ultra_fast -O results/best_short_15/ultra_fast_original -v best_short_15 <text_short.txt --original_tortoise
Am I missing something? Are there perhaps some flags I should add to speed up the generation?