platform-kit closed this issue 1 year ago
Inference steps? I don't know what you're talking about. This is not an iterative model: it takes an input and spits out an output, in one step.
Quality is mostly held back by Bark's inconsistency. Bark can switch speakers on its own, and it can also slowly fade into another speaker; this can be unpredictable.
Because of that, quality greatly depends on how the audio ends, the quality of the audio, the words spoken, and more.
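To illustrate the "one step" point above, here is a minimal, purely hypothetical sketch (no real model is loaded, and the function names are stand-ins, not this repo's API): an iterative sampler exposes a step count you could raise for quality, while a single-pass model has no such knob.

```python
def diffusion_style(x, num_inference_steps=50):
    # Iterative models (e.g. diffusion samplers) loop over a step count,
    # so "more steps" is a tunable quality parameter.
    for _ in range(num_inference_steps):
        x = 0.9 * x  # placeholder refinement step
    return x

def single_pass(x):
    # A non-iterative model like the one discussed here: one forward call,
    # input in, output out. Nothing like `num_inference_steps` exists.
    return 0.9 * x

print(single_pass(1.0))  # one call, done
```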
Oh, I see. I just assumed it was iterative. Thanks for the info.
If I want to train it for TTS purposes to have more consistent quality, with cloned voices of arbitrary sample quality -- is that possible with this model?
Like, would training the core model improve the likelihood of high quality output despite low quality voice cloning samples -- something akin to how 11Labs works?
No, this model cannot be trained on a specific voice; what it does doesn't really interact with the voice itself.
more info can be found here: https://github.com/gitmylo/audio-webui/wiki/how-bark-works
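Roughly, the pipeline described in that wiki works by quantizing audio into Bark-style token arrays and bundling them into a speaker "history prompt". The sketch below is only a hedged illustration of that shape: the function names are hypothetical stand-ins (the real pipeline uses a HuBERT model plus a trained quantizer), and only the three prompt keys mirror Bark's actual speaker-preset format.

```python
def extract_semantic_tokens(wav):
    # In the real pipeline, a HuBERT model + quantizer maps audio features
    # to Bark's semantic token vocabulary. Here we fake it deterministically
    # so the sketch is self-contained.
    return [sample % 10000 for sample in wav]

def build_history_prompt(wav):
    # A Bark speaker prompt bundles semantic, coarse, and fine token arrays
    # (normally saved as an .npz file with these keys).
    semantic = extract_semantic_tokens(wav)
    return {
        "semantic_prompt": semantic,
        "coarse_prompt": [t % 1024 for t in semantic],  # placeholder coarse codes
        "fine_prompt": [t % 1024 for t in semantic],    # placeholder fine codes
    }

prompt = build_history_prompt([0, 1, 2, 3])
print(sorted(prompt))  # → ['coarse_prompt', 'fine_prompt', 'semantic_prompt']
```

The key idea is that cloning never retrains the model: it only produces a new history prompt that conditions generation.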
Interesting. Are you aware of this HG space?
https://huggingface.co/spaces/kevinwang676/Bark-with-Voice-Cloning
It seems to be based on this repo, and succeeds at voice cloning.
I was not aware of it until now, but yes, the code used in that repo is exactly the same as my example code.
In fact, those files are still the originals, including the original credit.
The official ways to run are:
Hey, @gitmylo, great work on this repo.
If I want to increase the quality what's the best way to go about that?
I imagine the number of steps used during both training and inference is stored in a variable somewhere. Can you point me to it?
Or maybe there's another obvious solution?