ex3ndr / supervoice

VoiceBox neural network implementation

Colab for Synthesis #6

Open athenasaurav opened 3 months ago

athenasaurav commented 3 months ago

Hello Everyone,

Here is the Colab for synthesis.

yiwei0730 commented 3 months ago

@athenasaurav If I want to use an arbitrary person's voice, how do I obtain the TextGrid alignment?

athenasaurav commented 3 months ago

Hello @yiwei0730

You can use MFA (Montreal Forced Aligner) to do this. Please read this blog.

yiwei0730 commented 3 months ago

Thanks for your reply. So that means running MFA with datasets_align.sh, using the same method as before with FS2?

ex3ndr commented 3 months ago

Yes, you need MFA, but you don't need to align the full dataset; you can run it on just the files from your samples.
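For readers following along, here is a minimal sketch of aligning only a handful of sample files. It assumes each wav has a same-named .txt transcript next to it and that MFA is installed with the pretrained `english_us_arpa` dictionary and acoustic model; the helper names `stage_samples` and `mfa_align_command` are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_samples(wav_paths: Iterable[str], corpus_dir: str) -> List[Path]:
    """Copy each wav and its same-named .txt transcript into a small corpus folder."""
    corpus = Path(corpus_dir)
    corpus.mkdir(parents=True, exist_ok=True)
    staged = []
    for wav in map(Path, wav_paths):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            raise FileNotFoundError(f"missing transcript for {wav}")
        shutil.copy(wav, corpus / wav.name)
        shutil.copy(txt, corpus / txt.name)
        staged.append(corpus / wav.name)
    return staged


def mfa_align_command(
    corpus_dir: str,
    output_dir: str,
    dictionary: str = "english_us_arpa",      # assumption: pretrained model name
    acoustic_model: str = "english_us_arpa",  # assumption: pretrained model name
) -> List[str]:
    """Build the `mfa align` call: corpus, dictionary, acoustic model, output."""
    return ["mfa", "align", str(corpus_dir), dictionary, acoustic_model, str(output_dir)]
```

The command list can then be passed to `subprocess.run(..., check=True)`; the TextGrid files should appear in the output folder, one per staged wav.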

yiwei0730 commented 3 months ago

I'm interested in the model's prompt-free voice generation. Where does it produce or sample the unique voice features when I don't give it an audio prompt? Can you point it out to me? I'd also like to know whether, if I think a generated voice sounds great, I can extract those features again and reuse them to synthesize another sentence in the same voice.

ex3ndr commented 3 months ago

Yes, you can do this: just zero out the prompts (do not provide any) and you will get random voices, which you can later use as prompts.
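To illustrate what "zeroing out" the prompt can look like, here is a small sketch. It assumes a VoiceBox-style infilling setup in which the model receives audio features with the frames to be generated set to zero, plus a mask marking those frames. The function `build_conditioning`, the plain-list feature layout, and the mask convention are illustrative assumptions, not the supervoice API.

```python
from typing import List, Optional, Tuple

Frame = List[float]


def build_conditioning(
    target_frames: int,
    n_mels: int,
    prompt: Optional[List[Frame]] = None,
) -> Tuple[List[Frame], List[bool]]:
    """Return (conditioning, mask); mask is True for frames the model must generate."""
    prompt = prompt or []
    for frame in prompt:
        if len(frame) != n_mels:
            raise ValueError("prompt frame has the wrong number of mel bins")
    # Frames to be generated carry no audio information: all zeros.
    zeros = [[0.0] * n_mels for _ in range(target_frames)]
    conditioning = [list(frame) for frame in prompt] + zeros
    mask = [False] * len(prompt) + [True] * target_frames
    return conditioning, mask


# No prompt: the conditioning is all zeros, so the voice is sampled freely.
free_cond, free_mask = build_conditioning(target_frames=4, n_mels=3)

# Reuse: features of a voice you liked become the prompt for a new sentence.
liked_voice = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # placeholder values
reuse_cond, reuse_mask = build_conditioning(4, 3, prompt=liked_voice)
```

In the second call the first two frames carry the earlier voice and only the remaining four are left for the model to fill in.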

yiwei0730 commented 3 months ago

Good idea, I never thought that the new voice could be used directly as a prompt, hahaha. I noticed that each voice needs four files: TextGrid, pt, txt and wav. The TextGrid is generated from the txt through MFA, and the txt can be generated by speech recognition. Where should the pt file be generated?

ex3ndr commented 3 months ago

MFA is a proxy for text-phoneme pairs. Since gpt takes text and generates phonemes and durations, you will get all you need and can pack it into a similar pt file.
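As a rough sketch of the "phonemes and durations" part, the snippet below converts phone intervals (start and end in seconds, as found in a TextGrid phone tier) into per-phoneme frame counts. The sample rate and hop length defaults are assumptions and must match the model's feature extraction. The actual layout of the pt file is not described in this thread, so the packing step itself is left out.

```python
from typing import List, Tuple

Interval = Tuple[float, float, str]  # (start_seconds, end_seconds, phoneme label)


def intervals_to_frames(
    intervals: List[Interval],
    sample_rate: int = 24000,  # assumption: adjust to the model's sample rate
    hop_length: int = 256,     # assumption: adjust to the model's hop size
) -> List[Tuple[str, int]]:
    """Convert time intervals into (phoneme, number_of_frames) pairs."""
    frames_per_second = sample_rate / hop_length
    result = []
    for start, end, label in intervals:
        if end < start:
            raise ValueError(f"interval ends before it starts: {(start, end, label)}")
        # Round the boundaries, not the lengths, so that durations of
        # back-to-back intervals add up without drift.
        first = round(start * frames_per_second)
        last = round(end * frames_per_second)
        result.append((label, last - first))
    return result
```

The resulting pairs could then be combined with the audio features and saved with `torch.save`, following whatever structure the existing pt files in the repository use.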

yiwei0730 commented 3 months ago

@ex3ndr Do you mean the file is created by this script? https://github.com/ex3ndr/supervoice/blob/master/generate_voices.py

ex3ndr commented 3 months ago

Yes