Open huydung179 opened 1 year ago
The dataset creation code is up at https://github.com/gitmylo/bark-data-gen
To get the semantics from a voice, you have to use a trained HuBERT quantizer model. See the problem? It can't be improved for a specific voice, because all you could train on is previous outputs.
To understand why it works, you need to understand how bark works: https://github.com/gitmylo/audio-webui/wiki/how-bark-works The quantizer model just converts recognized speech patterns into a format that bark understands and can complete. That is essentially what clones the voice.
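To make the quantizer step concrete, here is a minimal sketch of the core idea: continuous HuBERT-style feature vectors (one per audio frame) get mapped to discrete semantic token ids by nearest-centroid lookup. All names here are illustrative assumptions, not the actual API of the bark-voice-cloning-HuBERT-quantizer repo.

```python
import numpy as np

def quantize_features(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each frame's feature vector to the id of its nearest codebook entry.

    features: (frames, dim) continuous features from a HuBERT-style encoder.
    codebook: (codes, dim) learned centroids; indices serve as semantic tokens.
    (Hypothetical sketch, not the repo's real quantizer.)
    """
    # Broadcast to (frames, codes, dim), then reduce to (frames, codes) distances.
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)  # one discrete semantic token per frame

# Toy example: frames copied straight from the codebook, so lookups are exact.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(3, 4))
features = np.vstack([codebook[2], codebook[0], codebook[0], codebook[1], codebook[2]])
tokens = quantize_features(features, codebook)
print(tokens.tolist())  # [2, 0, 0, 1, 2]
```

Bark can then be prompted with such a token sequence, which is why a quantizer trained to produce bark-compatible tokens effectively clones the voice.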
Dear gitmylo, I also want to know how to create semantic data from wav source files. I have gathered Korean wav files, and I need to create semantic data from them, then pre-train on both the semantic data and the wav files. Could you explain the details? I really appreciate your great work.
If you want to train, you'll need a text dataset in the language you want to train for; for example, you can modify the bark-data-gen code to load text files in another language. Then prepare the dataset and train, as explained in https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer#how-do-i-train-it-myself, and just follow the other steps.
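The "load text files in another language" tweak could look something like the sketch below: gather one prompt per non-empty line from a folder of .txt files (Korean or otherwise), then feed each prompt through bark to produce paired (semantic tokens, audio) examples. The function name and folder layout are assumptions for illustration, not the actual bark-data-gen code.

```python
from pathlib import Path

def load_prompts(text_dir: str, encoding: str = "utf-8") -> list[str]:
    """Collect one prompt per non-empty line from every .txt file in a folder.

    Hypothetical helper: swap this in wherever bark-data-gen normally
    sources its English prompts, so generation runs on your own language.
    """
    prompts = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        for line in path.read_text(encoding=encoding).splitlines():
            line = line.strip()
            if line:  # skip blank lines
                prompts.append(line)
    return prompts
```

Each returned prompt would then go through bark's generation step, and the resulting semantic/audio pairs become the training set for the quantizer.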
If I understood correctly, you used a custom semantic-voice dataset to train your HuBERT model. Can you tell me how to create this dataset, especially how to get the semantics from a voice? Many thanks for this work.