gitmylo / audio-webui

A webui for different audio related Neural Networks
MIT License

[FEATURE REQUEST] Combine development efforts with tts-generation-webui #59

Closed: underlines closed this issue 1 year ago

underlines commented 1 year ago

You might or might not be aware of rsxdalv/tts-generation-webui which has very similar goals. Maybe it would make sense to combine forces with rsxdalv?

gitmylo commented 1 year ago

I'm under the impression that TTS-Generation-WebUI has a more specific purpose, namely generation, while Audio-WebUI is meant for anything audio related. And with extension support coming soon, anyone will be able to make contributions.

Ph0rk0z commented 1 year ago

That repo only has vocos (which I've never tried) and tortoise (which already has MRQ's ai-voice-cloning UI). Maybe it's worth implementing vocos though: https://charactr-platform.github.io/vocos/

gitmylo commented 1 year ago

Yeah, vocos is indeed something I have thought about implementing. I will also make some contributions to the bark implementation in the 🐸 TTS library, and you can expect to see those changes in the bark implementation in audio-webui as well.

Also expect some Frankenstein Bark features which might be pretty cool too. They should speed up inference, but with a twist. I'm mainly looking at the coarse model here though. The possibilities are endless.

Ph0rk0z commented 1 year ago

I tried the PRs in the bark queue and got a speedup for at least the first step. Bark only seems to use about 30% of the GPU, so I wonder if there is a bottleneck somewhere.

Ph0rk0z commented 1 year ago

I think this is what was meant by people mentioning bark V2: https://github.com/suno-ai/bark/compare/main...dev

Is it just bark with a history prompt, or is something else done to the NPZ?

gitmylo commented 1 year ago

Bark v2 was released quite a while ago, about 2 months ago I believe, and I implemented the new speakers and support for bark v2 shortly after. They use the same type of npz files. This was implemented before the public release of audio-webui.
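
For anyone who wants to check what is inside a speaker npz themselves, here is a minimal sketch using plain NumPy. It assumes the usual Bark history prompt layout with `semantic_prompt`, `coarse_prompt` and `fine_prompt` arrays. The helper name `describe_npz` is made up for this example, and the archive is a synthetic stand-in with invented shapes, not a real speaker file.

```python
import io

import numpy as np


def describe_npz(source):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive.

    `source` can be a file path or a file-like object.
    """
    with np.load(source) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


# Synthetic stand-in for a speaker file. The key names follow the assumed
# Bark history prompt layout; the shapes and values here are invented.
buffer = io.BytesIO()
np.savez(
    buffer,
    semantic_prompt=np.zeros(100, dtype=np.int64),
    coarse_prompt=np.zeros((2, 150), dtype=np.int64),
    fine_prompt=np.zeros((8, 150), dtype=np.int64),
)
buffer.seek(0)

layout = describe_npz(buffer)
for name, (shape, dtype) in layout.items():
    print(name, shape, dtype)
```

Pointing `describe_npz` at an old speaker file and a new one and comparing the output should show whether the two really share the same structure.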

Ph0rk0z commented 1 year ago

That makes sense. They just left the dev branch.