Can you provide a zero-shot clone example?

ex3ndr / supervoice-voicebox

VoiceBox neural network implementation


Can you provide a zero-shot clone example? #12

Open yangdongchao opened 3 months ago

yangdongchao commented 3 months ago

Hi, your project is very interesting, but I find it hard to use a new prompt as reference audio to clone a target voice. I am trying to prepare the prompt with the MFA tools, but I get an error:
'''
.cache/torch/hub/ex3ndr_supervoice-gpt_master/supervoice_gpt/tokenizer.py", line 57, in
    return [self.phoneme_to_id[p] for p in phonemes]
KeyError: '<UNK>'
'''

I cannot understand why this happens; I just used mfa align to prepare the voice prompt. If you could provide a function that directly uses a new prompt voice, it would be very helpful!
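The traceback shows the tokenizer indexing `phoneme_to_id` directly, so any symbol that is absent from that mapping raises a `KeyError`. One plausible cause, not confirmed in this thread, is that the alignment output contains a placeholder for words missing from the aligner's pronunciation dictionary. A minimal sketch of a pre-check that lists such symbols before they reach the tokenizer follows; the vocabulary, the aligned sequence, and the helper name `find_unknown_phonemes` are hypothetical and are not part of supervoice.

```python
from collections import Counter


def find_unknown_phonemes(phonemes, phoneme_to_id):
    """Count the symbols in `phonemes` that have no entry in `phoneme_to_id`."""
    return Counter(p for p in phonemes if p not in phoneme_to_id)


# Hypothetical vocabulary and alignment output, for illustration only.
# The real mapping lives inside the tokenizer named in the traceback.
phoneme_to_id = {"<SIL>": 0, "h": 1, "ə": 2, "l": 3, "oʊ": 4}
aligned = ["h", "ə", "<UNK>", "l", "oʊ", "<UNK>"]

unknown = find_unknown_phonemes(aligned, phoneme_to_id)
if unknown:
    # Report every offending symbol with its count instead of failing on the first one.
    print("Symbols missing from the vocabulary:", dict(unknown))
```

If the check reports symbols, the usual options are to add the missing words to the aligner's dictionary and realign, or to pick a prompt whose transcript is fully covered.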

ex3ndr commented 3 months ago

There is a simple script that creates the built-in voices: https://github.com/ex3ndr/supervoice/blob/master/generate_voices.py

You can try using a similar script to create a new voice.

yangdongchao commented 3 months ago

> There is a simple script that creates the built-in voices: https://github.com/ex3ndr/supervoice/blob/master/generate_voices.py
>
> You can try using a similar script to create a new voice.

Yes, I used this script, but it fails. I do not know whether my MFA version is the wrong one.

yangdongchao commented 3 months ago

> There is a simple script that creates the built-in voices: https://github.com/ex3ndr/supervoice/blob/master/generate_voices.py
>
> You can try using a similar script to create a new voice.

I made some voices based on your code:

wget https://huggingface.co/Dongchao/detect_dataset/resolve/main/new_voice.zip

But I fail to use them as prompts. Can you help check them?