gitmylo / audio-webui

A webui for different audio-related neural networks
MIT License
973 stars, 90 forks

[FEATURE REQUEST] separate bark setting to potentially improve long form generation #169

Open · yesbroc opened this issue 8 months ago

yesbroc commented 8 months ago

Basically, because quality degrades as the input gets longer, generating continuations from outputs whose quality is already worsening severely hurts the overall result. I've found this to be true with both Strict long and short.

I believe stitching together multiple independent outputs would keep the quality consistent, instead of continuing from outputs that are already degrading.
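A minimal sketch of the stitching idea, assuming each independently generated clip is already a 1-D NumPy array of float samples. Bark produces audio at 24 kHz. The helper name `stitch_clips` and the default 0.25 s silence gap are illustrative choices, not part of audio-webui.

```python
import numpy as np

SAMPLE_RATE = 24_000  # Bark's output sample rate (24 kHz)

def stitch_clips(clips, gap_seconds=0.25, sample_rate=SAMPLE_RATE):
    """Join independently generated clips, separated by short silences.

    Hypothetical helper: each clip is generated on its own, so quality
    problems in one clip cannot carry over into the next.
    """
    if not clips:
        return np.zeros(0, dtype=np.float32)
    gap = np.zeros(int(round(gap_seconds * sample_rate)), dtype=np.float32)
    pieces = []
    for i, clip in enumerate(clips):
        if i > 0:
            pieces.append(gap)  # silence between consecutive clips
        pieces.append(np.asarray(clip, dtype=np.float32))
    return np.concatenate(pieces)
```

For example, three 1-second clips with the default gap give 3 × 24000 + 2 × 6000 = 84000 samples. A plain silence gap is the simplest join; a short crossfade would be an alternative if the seams sound abrupt.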

gitmylo commented 8 months ago

Yeah, I was actually planning to add something for that: a setting that lets you choose between reusing the same history over and over and looping the history back.

Reusing the same clip over and over causes inconsistencies in emotion and similar qualities. Looping it back gives a more realistic result, but it accumulates noise and slowly degrades.

I'm wondering if there's a way to get the best of both worlds, for example a denoiser that cleans up the audio between loopbacks, which would keep the output consistent without losing quality.
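The two history strategies described above can be sketched as a loop over text chunks. This assumes a hypothetical `generate(text, history)` function that returns the audio for `text` together with the history prompt extracted from that new output; it is not audio-webui's or Bark's actual API, and the mode names `"reuse"` and `"loopback"` are illustrative.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical signature: generate(text, history) -> (audio samples,
# history prompt extracted from the audio it just produced).
GenerateFn = Callable[[str, Any], Tuple[List[float], Any]]

def generate_long(chunks: Sequence[str], initial_history: Any,
                  generate: GenerateFn, mode: str = "reuse") -> List[float]:
    """Generate chunk by chunk using one of two history strategies.

    "reuse":    every chunk is conditioned on the same initial history
                (stable quality, but emotion may not carry over).
    "loopback": each chunk is conditioned on the previous chunk's output
                (more natural continuity, but noise can accumulate).
    """
    if mode not in ("reuse", "loopback"):
        raise ValueError(f"unknown mode: {mode!r}")
    audio: List[float] = []
    history = initial_history
    for chunk in chunks:
        samples, produced = generate(chunk, history)
        audio.extend(samples)
        if mode == "loopback":
            history = produced  # feed this output back in as the next context
    return audio
```

A denoising step, as suggested above, would slot in where `produced` is assigned back to `history`, cleaning the looped-back context before it is reused.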

yesbroc commented 8 months ago

What does 'history' mean in this context?

gitmylo commented 8 months ago

Bark uses a system called "history prompts", which are basically context for the language model; they allow it to retain the same voice. These history prompts are stored in .npz files, each containing three .npy arrays: the semantic, coarse and fine prompts.
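A minimal sketch of reading and writing such a file with NumPy. The array names `semantic_prompt`, `coarse_prompt` and `fine_prompt` follow the keys Bark uses in its voice files. The helper names are hypothetical, and the dummy shapes (2 coarse codebooks, 8 fine codebooks) are an assumption made only so the demo runs; they are not a real voice.

```python
import io
import numpy as np

# Array names stored inside a Bark .npz history prompt.
PROMPT_KEYS = ("semantic_prompt", "coarse_prompt", "fine_prompt")

def save_history_prompt(target, semantic, coarse, fine):
    """Store the three token arrays in one .npz archive (path or file object)."""
    np.savez(target, semantic_prompt=semantic, coarse_prompt=coarse, fine_prompt=fine)

def load_history_prompt(source):
    """Load a .npz history prompt and check that all three arrays are present."""
    with np.load(source) as data:
        missing = [k for k in PROMPT_KEYS if k not in data.files]
        if missing:
            raise KeyError(f"history prompt is missing {missing}")
        # Copy the arrays out before the archive is closed.
        return {k: data[k] for k in PROMPT_KEYS}

if __name__ == "__main__":
    # Dummy tokens standing in for a real voice (shapes are assumptions).
    semantic = np.arange(100, dtype=np.int64)
    coarse = np.zeros((2, 150), dtype=np.int64)
    fine = np.zeros((8, 150), dtype=np.int64)
    buffer = io.BytesIO()
    save_history_prompt(buffer, semantic, coarse, fine)
    buffer.seek(0)
    for name, arr in load_history_prompt(buffer).items():
        print(name, arr.shape)
```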

yesbroc commented 8 months ago

Ahh, OK. What if, for the whole prompt, the first sentence is generated first and its output is used as the history for the rest of the paragraph?
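A sketch of this "anchor on the first sentence" idea, again assuming a hypothetical `generate(text, history)` that returns audio plus the history prompt extracted from that output. Every later sentence is conditioned on the same anchor, so errors do not compound the way they can when each output is looped back.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical signature: generate(text, history) -> (audio, history of that output).
GenerateFn = Callable[[str, Any], Tuple[List[float], Any]]

def generate_anchored(sentences: Sequence[str], initial_history: Any,
                      generate: GenerateFn) -> List[float]:
    """Use the first sentence's output as the history for all later sentences."""
    if not sentences:
        return []
    first_audio, anchor = generate(sentences[0], initial_history)
    audio = list(first_audio)
    for sentence in sentences[1:]:
        samples, _ = generate(sentence, anchor)  # always the same anchor
        audio.extend(samples)
    return audio
```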

Or the webui could detect a "[]" marker and treat it as its own history, used only for that one sentence.
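One way such markup might be parsed, assuming a hypothetical syntax where `[name]` switches the history prompt for the text that follows it, up to the next marker, and an empty `[]` falls back to the default. Neither the syntax nor `split_by_history` exists in audio-webui; each resulting segment would then be generated with the history it names.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical syntax: "[name]" selects the history prompt for the text after it.
MARKER = re.compile(r"\[([^\[\]]*)\]")

def split_by_history(text: str, default: Optional[str] = None
                     ) -> List[Tuple[Optional[str], str]]:
    """Split text into (history name, text) segments at [name] markers."""
    segments: List[Tuple[Optional[str], str]] = []
    current = default
    pos = 0
    for match in MARKER.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((current, chunk))
        current = match.group(1).strip() or default  # "[]" resets to the default
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((current, tail))
    return segments

# split_by_history("Hello there. [announcer] Welcome to the show. [] Back to normal.",
#                  default="narrator")
# -> [("narrator", "Hello there."), ("announcer", "Welcome to the show."),
#     ("narrator", "Back to normal.")]
```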

gitmylo commented 8 months ago

Custom formatting for controlling the prompts could be useful. I'll think about it.

yesbroc commented 8 months ago

Is there also a way to use multiple voices in one prompt?

gitmylo commented 8 months ago

Not currently, but custom formatting could make it possible.