Open athenasaurav opened 7 months ago
@athenasaurav May I ask: if I want to use an arbitrary person's voice, how do I obtain the TextGrid alignment?
Hello @yiwei0730
You can use MFA to do this. Please read this blog
Thanks for your reply. So that means running MFA via datasets_align.sh, the same way as before with FS2?
Yes, you need MFA, but you don't need to align the full dataset; you can run it on just the files from your samples.
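For anyone following along, here is a minimal sketch of aligning only a small folder of samples by calling the MFA command line from Python. It assumes MFA is installed, that the folder holds each wav next to a transcript file with the same base name, and that the pretrained `english_us_arpa` dictionary and acoustic model have already been downloaded. Those model names and the helper names are my assumptions, not something taken from the repo or its datasets_align.sh.

```python
import subprocess
from pathlib import Path


def build_mfa_align_command(corpus_dir, output_dir,
                            dictionary="english_us_arpa",
                            acoustic_model="english_us_arpa"):
    """Build the `mfa align` argument list for a small sample folder.

    Argument order follows the MFA CLI:
    corpus directory, dictionary, acoustic model, output directory.
    The default model names are assumptions; use whichever models you downloaded.
    """
    return ["mfa", "align", str(corpus_dir), dictionary, acoustic_model, str(output_dir)]


def align_samples(corpus_dir, output_dir):
    """Run MFA on the sample folder; TextGrid files are written to output_dir."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_mfa_align_command(corpus_dir, output_dir), check=True)
```

Calling `align_samples("samples", "aligned")` should leave one TextGrid per wav in `aligned`, without touching the full training dataset.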
I'm interested in the audio-prompt-free automatic voice generation. Where does the model produce or sample the unique voice features when I don't give it an audio prompt? Can you point it out to me? After generation, if I think the voice sounds great, can I extract those features again and reuse them to synthesize another sentence in the same voice?
Yes you can do this, just zero out required prompts (do not provide any) and you will get random voices which you can later use as prompt.
Good idea, I never thought that the new voice could be used directly as a prompt, hahaha. I noticed that each voice needs four files: TextGrid, pt, txt, and wav. The TextGrid is generated from the txt through MFA, and the txt can be produced by speech recognition. Where should the pt file be generated?
MFA is a proxy for getting text-phoneme pairs. Since the GPT takes text and generates phonemes and durations, you get everything you need and can pack it into a similar pt file.
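To make the "phonemes and durations" part concrete, here is a rough sketch that reads the phone tier of a long-format TextGrid (the kind MFA writes) into (phoneme, duration in seconds) pairs using only the standard library. The tier name `phones`, the function name, and the sample data are assumptions for illustration. The thread does not describe the exact layout of the pt file, so this stops at the pairs and does not try to reproduce that format.

```python
import re

# One interval in a long-format TextGrid: xmin, xmax, and a quoted label.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*xmin = ([\d.eE+-]+)\s*xmax = ([\d.eE+-]+)\s*text = "([^"]*)"'
)


def read_phone_durations(textgrid_text, tier_name="phones"):
    """Return [(label, duration_seconds), ...] for the named interval tier."""
    # Each tier starts with an "item [n]:" header; the chunk before the first
    # header is the file preamble and has no tier name.
    for tier in re.split(r'item \[\d+\]:', textgrid_text):
        name = re.search(r'name = "([^"]*)"', tier)
        if name and name.group(1) == tier_name:
            return [(text, float(xmax) - float(xmin))
                    for xmin, xmax, text in INTERVAL_RE.findall(tier)]
    raise ValueError(f"tier {tier_name!r} not found")


# Tiny hand-written TextGrid used only to exercise the parser.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.75
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0
        xmax = 0.75
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.25
            text = "HH"
        intervals [2]:
            xmin = 0.25
            xmax = 0.75
            text = "AY1"
'''

pairs = read_phone_durations(SAMPLE)
```

Silence intervals come back with an empty label, so filter or map them as your pipeline expects before packing anything into a pt file.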
@ex3ndr Do you mean the file is created by this script? https://github.com/ex3ndr/supervoice/blob/master/generate_voices.py
Yes
Hello Everyone,
Here is the Colab for synthesis.