platform-kit closed this issue 1 year ago
Inference steps? I don't know what you're talking about. This is not an iterative model: it takes an input and spits out an output, in one step.
Quality is mostly held back by Bark's inconsistency. Bark can switch speakers on its own, and it can also slowly fade into another speaker; this can be unpredictable.
Because of that, quality greatly depends on how the audio ends, the quality of the audio, the words spoken, and more.
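To illustrate the "one step" point above, here is a minimal, purely hypothetical sketch (no real model is loaded, and the function names are stand-ins, not this repo's API): an iterative sampler exposes a step count you could raise for quality, while a single-pass model has no such knob.

```python
def diffusion_style(x, num_inference_steps=50):
    # Iterative models (e.g. diffusion samplers) loop over a step count,
    # so "more steps" is a tunable quality parameter.
    for _ in range(num_inference_steps):
        x = 0.9 * x  # placeholder refinement step
    return x

def single_pass(x):
    # A non-iterative model like the one discussed here: one forward call,
    # input in, output out. Nothing like `num_inference_steps` exists.
    return 0.9 * x

print(single_pass(1.0))  # one call, done
```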
Oh, I see. I just assumed it was iterative. Thanks for the info.
If I want to train it for TTS purposes to have more consistent quality, with cloned voices of arbitrary sample quality -- is that possible with this model?
Like, would training the core model improve the likelihood of high quality output despite low quality voice cloning samples -- something akin to how 11Labs works?
No, this model cannot be trained on a specific voice; what it does doesn't really interact with the voice itself.
more info can be found here: https://github.com/gitmylo/audio-webui/wiki/how-bark-works
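Roughly, the pipeline described in that wiki works by quantizing audio into Bark-style token arrays and bundling them into a speaker "history prompt". The sketch below is only a hedged illustration of that shape: the function names are hypothetical stand-ins (the real pipeline uses a HuBERT model plus a trained quantizer), and only the three prompt keys mirror Bark's actual speaker-preset format.

```python
def extract_semantic_tokens(wav):
    # In the real pipeline, a HuBERT model + quantizer maps audio features
    # to Bark's semantic token vocabulary. Here we fake it deterministically
    # so the sketch is self-contained.
    return [sample % 10000 for sample in wav]

def build_history_prompt(wav):
    # A Bark speaker prompt bundles semantic, coarse, and fine token arrays
    # (normally saved as an .npz file with these keys).
    semantic = extract_semantic_tokens(wav)
    return {
        "semantic_prompt": semantic,
        "coarse_prompt": [t % 1024 for t in semantic],  # placeholder coarse codes
        "fine_prompt": [t % 1024 for t in semantic],    # placeholder fine codes
    }

prompt = build_history_prompt([0, 1, 2, 3])
print(sorted(prompt))  # → ['coarse_prompt', 'fine_prompt', 'semantic_prompt']
```

The key idea is that cloning never retrains the model: it only produces a new history prompt that conditions generation.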
Interesting. Are you aware of this HG space?
https://huggingface.co/spaces/kevinwang676/Bark-with-Voice-Cloning
It seems to be based on this repo, and succeeds at voice cloning.
I was not aware of it until now, but yes, the code used in that repo is exactly the same as my example code.
In fact, those files are still the originals, including the original credit.
The official ways to run are:
Hey, @gitmylo, great work on this repo.
If I want to increase the quality what's the best way to go about that?
I imagine the number of steps used during both training and inference is stored in a variable somewhere. Can you point me to it?
Or maybe there's another obvious solution?