MlWoo / LPCNet

Efficient neural speech synthesis
BSD 3-Clause "New" or "Revised" License

dataset and preprocessing for tacotron2 + lpcnet #3

Open alokprasad opened 5 years ago

alokprasad commented 5 years ago

  1. Do we need to train LPCNet with the LJSpeech dataset or with 16k-LP7?
  2. Do we need to train both LPCNet and Tacotron2 with the same dataset?
  3. Do we need to run Tacotron-2/preprocess.py, or just use ./header_removal.sh (which produces pcm files) and ./feature_extract.sh (which produces f32 files)? train.py takes wav and npy files, so how will it take pcm and f32 files?

MlWoo commented 5 years ago
  1. Every dataset has its own data distribution.
  2. Using the same dataset is recommended.
  3. I think LPCNet trained on 16k-LP7 will recover the waveform from the audio features predicted by Tacotron2, no matter which dataset Tacotron2 was trained on. However, training the two models on different datasets may hurt quality. See points 1 and 2.
  4. Refer to issue #4.

Additionally, T2 (Tacotron2) may need a large dataset, so training both models on a large dataset such as LJSpeech is a good idea.

lyz04551 commented 5 years ago

Should I retrain Tacotron2 + LPCNet, and train LPCNet using GTA mode?

MlWoo commented 5 years ago

@lyz04551 I do not suggest using GTA mode unless you have read the LPCNet code closely. You need to figure out where the audio starts and ends, and any other transformation introduced by LPCNet's feature extraction.

lyz04551 commented 5 years ago

Thank you for your reply. Have you encountered any problems with the volume of the synthesized audio? I use alokprasad's LPCTron code with the same basic parameters and the training method you recommended, retrained on a Chinese dataset. Do you see a similar volume problem? (image attached)

MlWoo commented 5 years ago

@lyz04551 You could normalize the audio before feature extraction.

lyz04551 commented 5 years ago

Could you explain in more detail? Do you mean normalizing the amplitude?

MlWoo commented 5 years ago

@lyz04551 Yes, rescale the audio volume. You can refer to the Tacotron2 preprocessing for how to do this.
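
For readers following along, the rescaling suggested above could look roughly like the sketch below. It is not taken from this repository: the function name `rescale_peak` and the default target of 0.999 of full scale are assumptions for illustration, and it assumes 16-bit PCM samples already loaded into a NumPy array.

```python
import numpy as np


def rescale_peak(samples: np.ndarray, rescaling_max: float = 0.999) -> np.ndarray:
    """Peak-normalize int16 PCM so the loudest sample sits at rescaling_max of full scale.

    Hypothetical helper: the name and the 0.999 default are illustrative assumptions.
    """
    # Work in float to avoid int16 overflow while scaling.
    audio = samples.astype(np.float32) / 32768.0
    peak = float(np.abs(audio).max()) if audio.size else 0.0
    if peak == 0.0:
        # Empty or all-silent input: nothing to rescale.
        return samples.copy()
    audio = audio / peak * rescaling_max
    # Convert back to int16, clipping defensively.
    return np.clip(np.round(audio * 32767.0), -32768, 32767).astype(np.int16)
```

With headerless PCM files, the array could be read with `np.fromfile(path, dtype=np.int16)` and written back with `ndarray.tofile(path)` before running feature extraction. Applying the same rescaling to every utterance keeps the volume consistent across the training data for both models.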

lyz04551 commented 5 years ago

test.zip: this is audio I synthesized with Tacotron + LPCNet. The sound quality still does not seem particularly good to me. Can you hear any problems?

carlfm01 commented 5 years ago

Hello @lyz04551, did you fix the issue? I'm facing the same problem.

Here are examples synthesized from the Taco2 output and from the LPCNet-extracted features. The archive also contains my hparams: audios.zip

Which params am I missing? It is the same dataset with the same extracted features. Any idea?

The audio from the real features looks good.

Screenshot (918)

The audio from Taco2 looks like a filter is being applied, but I can't figure out where. Screenshot (919)

Thanks @MlWoo and @alokprasad for the scripts; they saved me a lot of time.