HobisPL opened this issue 1 year ago.
Note that TorToiSe uses UnivNet, not WaveGlow. I've received the following other suggestions in addition to HiFi-GAN:
All good options.
Why not give us the option to select between them?
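One way to expose that choice is a small name-to-loader registry, so that a config flag or CLI argument picks the vocoder at runtime. This is a hypothetical sketch, not TorToiSe code: `VOCODERS`, `register_vocoder` and `load_vocoder` are illustrative names, and the stub loaders return placeholder callables (using an arbitrary hop length of 256) where real loaders would build UnivNet or BigVGAN and load their checkpoints.

```python
from typing import Callable, Dict, List

# Hypothetical type: a vocoder maps a mel spectrogram (list of frames) to audio samples.
Vocoder = Callable[[List[List[float]]], List[float]]

# Registry mapping a user-facing name to a zero-argument loader function.
VOCODERS: Dict[str, Callable[[], Vocoder]] = {}


def register_vocoder(name: str):
    """Decorator that records a loader under `name` (hypothetical helper)."""
    def wrap(loader: Callable[[], Vocoder]) -> Callable[[], Vocoder]:
        if name in VOCODERS:
            raise ValueError(f"vocoder {name!r} already registered")
        VOCODERS[name] = loader
        return loader
    return wrap


@register_vocoder("univnet")
def _load_univnet() -> Vocoder:
    # Stub: a real loader would build UnivNet and load its checkpoint here.
    # 256 samples per mel frame is an arbitrary placeholder hop length.
    return lambda mel: [0.0] * (len(mel) * 256)


@register_vocoder("bigvgan")
def _load_bigvgan() -> Vocoder:
    # Stub: a real loader would build BigVGAN and load its checkpoint here.
    return lambda mel: [0.0] * (len(mel) * 256)


def load_vocoder(name: str = "univnet") -> Vocoder:
    """Look up and instantiate the vocoder selected by `name` (case-insensitive)."""
    try:
        loader = VOCODERS[name.lower()]
    except KeyError:
        choices = ", ".join(sorted(VOCODERS))
        raise ValueError(f"unknown vocoder {name!r}; choose one of: {choices}") from None
    return loader()
```

Usage would look like `vocoder = load_vocoder("bigvgan")` followed by `audio = vocoder(mel)`. Keeping UnivNet as the default leaves existing behavior unchanged, and adding another vocoder is just one more decorated loader.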
@152334H I got bored of waiting, so I learned Python and did it myself; now it sounds way less shit :D https://github.com/deviandice/tortoise-tts-BigVGAN Make of it what you will. Here's the model: https://disk.yandex.com/d/fOjzTs8HQiFVdg
Wouldn't mind some credit if at all possible, thanks.
See also mrq's adaptation for more inspiration https://git.ecker.tech/mrq/ai-voice-cloning/issues/52
> See also mrq's adaptation
That's literally also my implementation haha
OK, I added BigVGAN, but I'm still not going to change much in this repo in the future.
You should also tell that mrq fella to use submodules/packages more often.
fwiw @deviandice, I copied none of the code from your impl, but either way I added some attribution to the README.
Is there any benchmark or estimate comparing BigVGAN against the previous vocoder?
You can look at the figures of the paper https://www.semanticscholar.org/paper/BigVGAN%3A-A-Universal-Neural-Vocoder-with-Training-Lee-Ping/04f5553934c458305a501d63323f1b841fd5d102
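The paper's figures are one reference point. For a quick check on your own hardware, you could time each vocoder on the same mel spectrogram and report the real-time factor (synthesis time divided by the duration of the audio produced). The helper names below are hypothetical, and the vocoder is treated as any callable that returns a sequence of samples at a known sample rate.

```python
import time
from typing import Callable, Sequence


def real_time_factor(elapsed_s: float, n_samples: int, sample_rate: int) -> float:
    """Synthesis time divided by the duration of the generated audio.

    Values below 1.0 mean the vocoder runs faster than real time.
    """
    if n_samples <= 0 or sample_rate <= 0:
        raise ValueError("need a positive number of samples and sample rate")
    return elapsed_s / (n_samples / sample_rate)


def benchmark(vocoder: Callable[[object], Sequence[float]], mel: object,
              sample_rate: int, runs: int = 5) -> float:
    """Return the best (lowest) real-time factor over `runs` calls of `vocoder`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        audio = vocoder(mel)
        elapsed = time.perf_counter() - start
        best = min(best, real_time_factor(elapsed, len(audio), sample_rate))
    return best
```

On a GPU, PyTorch kernels run asynchronously, so a real harness would call `torch.cuda.synchronize()` before reading the clock. To check the VRAM side of the comparison, `torch.cuda.reset_peak_memory_stats()` before a run and `torch.cuda.max_memory_allocated()` after it report peak allocation.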
> fwiw @deviandice, I copied none of the code from your impl, but either way I added some attribution to the README.
That means a lot, thank you. It's not every day someone cites my work like this.
Have you tried changing the vocoder from WaveGlow to HiFi-GAN? HiFi-GAN is faster and requires less VRAM. Alternatively, you could try adding a different vocoder.