152334H / tortoise-tts-fast

Fast TorToiSe inference (5x or your money back!)
GNU Affero General Public License v3.0
755 stars 176 forks

changing the vocoder #34

Open HobisPL opened 1 year ago

HobisPL commented 1 year ago

Have you tried changing the vocoder from WaveGlow to HiFi-GAN? HiFi-GAN is faster and requires less VRAM. Alternatively, you could try adding a different vocoder.

152334H commented 1 year ago

Note that TorToiSe uses UnivNet, not WaveGlow. I've received the following other suggestions on top of HiFi-GAN:

all good options

deviandice commented 1 year ago

@152334H I got bored of waiting, so I learned Python and did it myself. Now it sounds way less shit :D https://github.com/deviandice/tortoise-tts-BigVGAN Make of it what you will. Here's the model: https://disk.yandex.com/d/fOjzTs8HQiFVdg

Wouldn't mind some credit if at all possible, thanks.

Ryu1845 commented 1 year ago

See also mrq's adaptation for more inspiration https://git.ecker.tech/mrq/ai-voice-cloning/issues/52

deviandice commented 1 year ago

> See also mrq's adaptation

That's literally also my implementation haha

152334H commented 1 year ago

ok i added bigvgan but i am still not going to change much in this repo in the future

you should also tell that mrq fella to use submodules/packages more often


fwiw @deviandice, I copied none of the code from your impl, but either way I added some attribution to the README.

frandmb commented 1 year ago

Is there any benchmark or estimate comparing BigVGAN against the previous vocoder?

Ryu1845 commented 1 year ago

You can look at the figures in the paper: https://www.semanticscholar.org/paper/BigVGAN%3A-A-Universal-Neural-Vocoder-with-Training-Lee-Ping/04f5553934c458305a501d63323f1b841fd5d102

deviandice commented 1 year ago

> fwiw @deviandice, I copied none of the code from your impl, but either way I added some attribution to the README.

That means a lot, thank you. It's not every day someone cites my work like this.