Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License

where to import my voice audio? #408

Open pnwseeker opened 5 years ago

pnwseeker commented 5 years ago

Reading over the steps to preprocess the data and start training, I can't figure out where I'm supposed to insert my recorded voice, i.e. my utterances or embeddings as they're called. All of these commands take a dataset root, but where does the audio of my voice go so that it trains a new model based on my voice? It's all terribly confusing.

I recorded my voice, and it's in a WAV file. What do I do with that file?

Ananas120 commented 5 years ago

You can place your files wherever you want. All you need to do is write a build(..) function like the one in datasets/preprocessor.py, replacing basename, wav_file and text with your own values. Good luck!
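Concretely, a minimal sketch of such a custom build function might look like this, loosely modeled on build_from_path in this repo's datasets/preprocessor.py (the _process_utterance signature and the |-separated metadata.csv layout follow the LJSpeech convention used there; treat both as assumptions to check against the repo and adjust to your data):

```python
import os
from concurrent.futures import ProcessPoolExecutor
from functools import partial

# Reuse the repo's feature extraction for mel/linear spectrograms.
from datasets.preprocessor import _process_utterance


def build_from_path(hparams, input_dir, mel_dir, linear_dir, wav_dir,
                    n_jobs=4, tqdm=lambda x: x):
    """Hand each (basename, wav_file, text) triple to _process_utterance.

    Assumes an LJSpeech-style metadata.csv whose lines look like:
        basename|raw text|normalized text
    """
    executor = ProcessPoolExecutor(max_workers=n_jobs)
    futures = []
    with open(os.path.join(input_dir, 'metadata.csv'), encoding='utf-8') as f:
        for line in f:
            basename, _, text = line.strip().split('|')
            wav_file = os.path.join(input_dir, 'wavs', basename + '.wav')
            futures.append(executor.submit(partial(
                _process_utterance, mel_dir, linear_dir, wav_dir,
                basename, wav_file, text, hparams)))
    return [future.result() for future in tqdm(futures)
            if future.result() is not None]
```

The three values mentioned above are exactly the basename, wav_file and text arguments here; everything downstream (the spectrogram extraction) stays unchanged.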

pnwseeker commented 5 years ago

> You can place your files wherever you want. All you need to do is write a build(..) function like the one in datasets/preprocessor.py, replacing basename, wav_file and text with your own values. Good luck!

Thanks for your reply. So I just train on all the datasets first using the commands, and then swap in the basename/WAV file of my voice to create a model that sounds like me? What do you mean by replacing text with my values? I thought all I needed was a few seconds of my recorded voice to create a clone of it. Also, where do I insert text to make my cloned voice speak what I typed?

Ananas120 commented 5 years ago

If you want to train the model, you must have the audio (an audio file) and the text spoken in that audio. In my dataset, for example, I have two directories:

  • wavs
    – file1.wav
    – file2.wav
    – ...
  • text
    – file1.txt
    – file2.txt
    – ...

And in the function build_from_path(..): basename comes from the filename under wavs/, wav_file is e.g. file1.wav, and text is the text inside file1.txt.
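Iterating that paired layout in code might look like the sketch below (an illustration of feeding such a layout into the repo's pipeline, not code from the repo; the wavs/ and text/ directory names come from the comment above):

```python
import os


def iter_utterances(dataset_root):
    """Yield (basename, wav_file, text) triples from paired wavs/ and text/ dirs."""
    wav_dir = os.path.join(dataset_root, 'wavs')
    txt_dir = os.path.join(dataset_root, 'text')
    for fname in sorted(os.listdir(wav_dir)):
        if not fname.endswith('.wav'):
            continue
        basename = os.path.splitext(fname)[0]
        wav_file = os.path.join(wav_dir, fname)
        # Each file1.wav is paired with a file1.txt holding its transcript.
        with open(os.path.join(txt_dir, basename + '.txt'), encoding='utf-8') as f:
            text = f.read().strip()
        yield basename, wav_file, text


# Each triple can then be handed to _process_utterance inside a custom
# build_from_path, as in the sketch earlier in this thread.
```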

pnwseeker commented 5 years ago

> If you want to train the model, you must have the audio (an audio file) and the text spoken in that audio. In my dataset, for example, I have two directories: wavs (file1.wav, file2.wav, ...) and text (file1.txt, file2.txt, ...). [...]

Thanks for your help, but I have abandoned this project. It seems I totally misunderstood its capabilities from all the hype in the articles that led me to google this repo. My understanding was that you could take only a few seconds of audio from a speaker and Tacotron-2 would then let you synthesize that speaker's voice from text input. It appears this is not the case: instead you need a huge dataset with hundreds or thousands of utterances and aligned text to synthesize a voice. I will check back in a few years to see if anyone has managed to develop this to the point where it is usable at an intermediate level.

JasonWei512 commented 5 years ago

> Thanks for your help, but I have abandoned this project. [...] My understanding was that you could take only a few seconds of audio from a speaker and Tacotron-2 would then let you synthesize that speaker's voice from text input. [...]

https://github.com/CorentinJ/Real-Time-Voice-Cloning 👆 This might be what you want (I haven't tried it myself, though).
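For reference, that repo implements SV2TTS-style cloning: a speaker encoder turns a few seconds of audio into an embedding, a synthesizer conditioned on that embedding generates spectrograms from text, and a vocoder turns them into audio. A rough usage sketch follows, adapted from memory of that repo's demo_cli.py; the module paths, model paths and function names are assumptions to verify against that repo:

```python
from pathlib import Path

import soundfile as sf

# Module layout as in CorentinJ/Real-Time-Voice-Cloning (assumed; see demo_cli.py there).
from encoder import inference as encoder
from synthesizer.inference import Synthesizer
from vocoder import inference as vocoder

# Load the three pretrained models (the paths are hypothetical placeholders).
encoder.load_model(Path("encoder/saved_models/pretrained.pt"))
synthesizer = Synthesizer(Path("synthesizer/saved_models/pretrained/pretrained.pt"))
vocoder.load_model(Path("vocoder/saved_models/pretrained/pretrained.pt"))

# A few seconds of the target speaker suffice to compute the speaker embedding.
wav = encoder.preprocess_wav(Path("my_voice.wav"))
embed = encoder.embed_utterance(wav)

# Condition the synthesizer on that embedding, then vocode the spectrogram.
specs = synthesizer.synthesize_spectrograms(["Type what the cloned voice should say."], [embed])
generated = vocoder.infer_waveform(specs[0])
sf.write("cloned.wav", generated, synthesizer.sample_rate)
```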