PaddlePaddle / Parakeet

PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow and Parallel WaveGAN)

How to get speech from text #17

Closed shock-wave007 closed 4 years ago

shock-wave007 commented 4 years ago

Can anyone tell me how to use this library to convert text to speech?

I have cloned the library, the pretrained models, and the LJSpeech dataset.

Q0. Can I use this library to convert text to speech?
Q1. If yes, can you tell me the step-by-step commands?

I am new to Python and to AI/ML/DL, but I have a basic understanding of programming.

Thanks in advance

iclementine commented 4 years ago

You can cd to the examples/ directory to try each example. For the text-to-speech task, you can try deepvoice3, transformerTTS, and fastspeech. Just follow the README.

shock-wave007 commented 4 years ago

I tried wavenet and ran into 2 problems:

  1. I don't know how to input a text sentence to synthesize.
  2. It takes forever to generate audio; I have an i5-8300H, a GTX 1050 4 GB, and 16 GB of RAM.

iclementine commented 4 years ago

Wavenet is a vocoder, which turns a spectrogram (more specifically, a mel spectrogram) into audio. So wavenet does not directly transform text into audio; you have to use another text-to-spectrogram model first.
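The two-stage pipeline described above can be sketched as follows. All names here are hypothetical placeholders, not Parakeet's actual API; the toy stand-ins only exist so the sketch runs end to end.

```python
# Hypothetical two-stage TTS pipeline: text -> mel spectrogram -> waveform.
# The function names are placeholders, not Parakeet's actual API.
def text_to_speech(text, acoustic_model, vocoder):
    mel = acoustic_model(text)   # e.g. deepvoice3 / transformerTTS / fastspeech
    audio = vocoder(mel)         # e.g. wavenet / clarinet / waveflow
    return audio

# Toy stand-ins so the sketch is runnable (shapes are made up):
fake_acoustic = lambda text: [[0.0] * 80 for _ in text]  # one mel frame per char
fake_vocoder = lambda mel: [0.0] * (len(mel) * 256)      # ~256 samples per frame

print(len(text_to_speech("hello", fake_acoustic, fake_vocoder)))  # → 1280
```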

As wavenet is an autoregressive model, it generates sample points one by one, sequentially. An audio file with a sample rate of 22050 Hz has 22050 sample points per second. Since this is a simple implementation of wavenet without special optimization, in our practice it takes 4 to 5 hours to synthesize 10 seconds of audio, so what you observed is expected. If you want a fast vocoder, you can try clarinet or waveflow, which are comparable in quality but much faster.

shock-wave007 commented 4 years ago

Thank you very much for your help and knowledge.

I can't find in the clarinet or waveflow documentation how to input the text to be converted to speech.

iclementine commented 4 years ago

Also, clarinet and waveflow are vocoders. We have implemented 3 TTS models, deepvoice3, transformerTTS, and fastspeech; they have a text-to-spectrogram part and a vocoder part. (See the README and synthesize.py in the corresponding folder under examples/. For example, the synthesize.py in deepvoice3 can take as a parameter a text file, one sentence per line, to synthesize.) But for simplicity, and to keep the focus on the text-to-spectrogram part, some of them currently use only Griffin-Lim, a simple vocoder that is not based on a neural network.
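For reference, Griffin-Lim is a classical phase-reconstruction algorithm: given only a magnitude spectrogram, it iterates between the time and frequency domains to estimate a plausible phase. The following is a minimal NumPy/SciPy sketch, not Parakeet's implementation, and it works on a linear magnitude spectrogram (a real TTS pipeline would first map the mel spectrogram back to a linear one):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=32, nperseg=512):
    """Recover a waveform from a magnitude spectrogram by iteratively
    re-estimating the phase. Minimal sketch, not Parakeet's code."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        _, x = istft(mag * phase, nperseg=nperseg)      # to time domain
        _, _, spec = stft(x, nperseg=nperseg)           # back to frequency
        spec = spec[:, :mag.shape[1]]                   # align frame count
        if spec.shape[1] < mag.shape[1]:
            spec = np.pad(spec, ((0, 0), (0, mag.shape[1] - spec.shape[1])))
        phase = np.exp(1j * np.angle(spec))             # keep phase, drop magnitude
    _, x = istft(mag * phase, nperseg=nperseg)
    return x

# Demo: take the magnitude of a test tone's spectrogram and invert it.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
_, _, Z = stft(tone, nperseg=512)
audio = griffin_lim(np.abs(Z))
print(audio.shape)
```

Because the phase is only estimated, Griffin-Lim output tends to sound noticeably worse than a neural vocoder's, which is why it is described here as a stopgap.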

We are now working on integrating our text-to-spectrogram models with neural vocoders. When that is done, we will release new examples.

iclementine commented 4 years ago

Create a text file with one sentence per line, and pass that file as the input of synthesize.py. You can run python synthesize.py --help to see detailed usage.

examples/deepvoice3, examples/transformertts, and examples/fastspeech all include a synthesize.py. If you have any problems using them, please let us know. Thank you.
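The workflow described above might look like this. The file name sentences.txt and the example sentences are made up, and the synthesize.py invocation is left as a comment because its exact flags depend on the model directory (check its README and --help output):

```shell
# Write an input file with one sentence per line (name is arbitrary).
printf '%s\n' \
  "Parakeet is a parallel text to speech toolkit." \
  "It is built on top of PaddlePaddle." > sentences.txt

wc -l sentences.txt   # two input sentences

# Then, from e.g. examples/deepvoice3 (flags vary per model):
#   python synthesize.py --help      # list the available options
#   and pass sentences.txt as the input text file
```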

shock-wave007 commented 4 years ago

Thanks for your help; I was able to test deepvoice3. AI/ML/DL stuff is hard (for me and my PC).

It's nice but hard; I think I will leave this to EXPERT people like you. :)