JarodMica / audiobook_maker

GNU General Public License v3.0

Feature request: MaskGCT-TTS #75

Open GalenMarek14 opened 1 week ago

GalenMarek14 commented 1 week ago

GitHub: https://github.com/open-mmlab/Amphion/blob/main/models/tts/maskgct/README.md
Demo Page: https://maskgct.github.io/

This is probably the current SOTA model, and much better than F5-TTS. They just haven't promoted it on Reddit or anywhere else, so it isn't well-known yet. I'm not that good with technical stuff, but here's what I've gathered from my tests:

Pros:
- Like F5, it's a non-autoregressive model, but it's way better, probably because it's a much bigger model (needs 12+ GB VRAM).
- Outputs clearer, higher-quality voices for every reference voice.
- Supports longer reference voices (I tried up to 5 minutes and it worked fine and fast).
- Supports multiple languages, including English, Chinese, Japanese, German, French, and Korean. Languages other than English and Chinese are undertrained because it was trained on the Emilia dataset, but they still sound great.
- Simulates the emotion of the text better: it doesn't just copy the emotion of the reference voice, it follows the emotion of the text more accurately and produces a more natural voice.
- More robust; it can handle tough tongue twisters without errors.
- Can clone harder voices like whispers, which F5 couldn't do. CosyVoice could do this too, but it's slower and lower quality.

Cons:
- Super hard to get working on Win 11 (I needed help from other users to make it work).
- Still a bit wonky on Win 11, with lower-quality outputs compared to the demo page.
- Struggles with predicting duration for non-English languages.
- Generally a bit worse at non-English languages on my local version.
- Can't replicate the demo page examples; for example, with the whisper voice it outputs something between a whisper and a low voice.

Despite all these cons, it's still better than F5, even on my Win 11 setup. I couldn't figure out what the problem is, but maybe you can get it working better :)

JarodMica commented 1 week ago

I'm super aware of this; some 👀 people keep posting YouTube comments about it. I'll take a look at it sometime next week when I get time.

GalenMarek14 commented 1 week ago

Oh, I posted there too, sorry if I spammed a bit. I thought you hadn't seen them because you were so busy.

JarodMica commented 1 week ago

I see a lot of things, but I usually set aside time on one day to respond to comments, so that's all it is. As long as it's not actual spam, it's fine, and this isn't actual spam lol