Closed: underlines closed this issue 1 year ago
I'm under the impression that TTS-Generation-WebUI has a more specific purpose, namely generation, while Audio-WebUI is meant broadly for anything audio related. And with extension support coming soon, anyone will be able to make contributions.
That repo only has vocos (never tried it) and tortoise (which has MRQ's AI voice cloning UI). Maybe it's worth implementing vocos though: https://charactr-platform.github.io/vocos/
Yeah, vocos is indeed something I have thought about implementing. I will also make some contributions to the bark implementation in the 🐸 TTS library, and you can expect those changes to be ported to the bark implementation in audio-webui as well.
Also expect some frankenstein Bark features which might be pretty cool too. They should speed up inference but with a twist. Mainly looking at the coarse model here though. The possibilities are endless.
I tried the PRs in the bark queue and got a speedup for at least the first step. Bark only seems to use about 30% of the GPU, so I wonder if there is a bottleneck somewhere.
I think this is what was meant by people mentioning bark V2: https://github.com/suno-ai/bark/compare/main...dev
Is it just bark with history prompt or something else done to the NPZ?
Bark v2 released quite a while ago, about 2 months ago I believe, and I implemented the new speakers and support for bark v2 shortly after. They use the same type of npz files. This was implemented before the public release of audio-webui.
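For reference, those speaker files are plain numpy archives, so they can be inspected directly. This is a minimal sketch assuming the usual Bark history-prompt layout of `semantic_prompt`, `coarse_prompt`, and `fine_prompt` arrays; the helper name and file path are illustrative, not part of either project:

```python
import numpy as np

def describe_speaker(path):
    """Return the array names and shapes stored in a Bark speaker .npz file.

    Bark "speaker"/history-prompt files are numpy .npz archives that
    typically hold three arrays: semantic_prompt, coarse_prompt, and
    fine_prompt. v1 and v2 speakers share this same format.
    """
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}

# Illustrative usage (path is hypothetical):
# describe_speaker("speakers/v2/en_speaker_0.npz")
```

Since v2 speakers keep this layout, tooling written against v1 `.npz` files generally keeps working after the upgrade.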
That makes sense. They just left it on the dev branch.
You might or might not be aware of rsxdalv/tts-generation-webui which has very similar goals. Maybe it would make sense to combine forces with rsxdalv?