Integrating LPCNET with Fastspeech

MlWoo / LPCNet

Efficient neural speech synthesis

BSD 3-Clause "New" or "Revised" License


Integrating LPCNET with Fastspeech #9

Open alokprasad opened 4 years ago

alokprasad commented 4 years ago

@MlWoo have you tried FastSpeech for mel generation? It is astonishingly fast at generating mel spectrograms. Combined with the LPCNet vocoder, it could work as real-time voice synthesis.

alokprasad commented 4 years ago

@MlWoo I made some changes to FastSpeech to integrate it with LPCNet. Here are my changes:

1. First, I preprocessed the audio (LJSpeech) and converted it to PCM (s16):

mkdir -p dataset/LJSpeech-1.1/pcms
# Sample rate: 16 kHz for LPCNet, or 22050?
for i in dataset/LJSpeech-1.1/wavs/*.wav
do
  sox "$i" -r 16000 -c 1 -t sw - > "dataset/LJSpeech-1.1/pcms/$(basename "$i" .wav).s16"
done
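A quick way to sanity-check the converted files is to derive their duration from the file size, since headerless s16 PCM stores exactly two bytes per mono sample. The following is a minimal sketch under that assumption. The helper names `pcm_duration_seconds` and `file_duration_seconds` are hypothetical and not part of LPCNet or FastSpeech.

```python
import os


def pcm_duration_seconds(num_bytes, sample_rate=16000, channels=1, bytes_per_sample=2):
    """Duration of headerless PCM audio, given its size in bytes.

    Defaults mirror the sox call above: 16 kHz, mono, signed 16-bit.
    """
    frame_bytes = channels * bytes_per_sample
    if num_bytes % frame_bytes != 0:
        raise ValueError("size is not a whole number of sample frames")
    return num_bytes / frame_bytes / sample_rate


def file_duration_seconds(path, sample_rate=16000):
    """Same calculation, reading the size of a .s16 file on disk (hypothetical helper)."""
    return pcm_duration_seconds(os.path.getsize(path), sample_rate)


# 320000 bytes / 2 bytes per sample / 16000 samples per second
print(pcm_duration_seconds(320000))  # 10.0
```

If a file's computed duration differs noticeably from the original WAV's duration, the sample rate passed to sox and the one assumed here probably disagree.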

Then apply the diff below to FastSpeech to train the network with 20 mel channels:

https://github.com/alokprasad/binaries/blob/master/fast_speech_lpcnet.diff

MlWoo commented 4 years ago

@alokprasad thank you for following my repo. I am sorry, but I will not have time to put effort into TTS. I think Tacotron2 is good enough and fast enough on both GPU (13x real time on a 1080 Ti) and CPU, and it can achieve higher throughput than FastSpeech.

alokprasad commented 4 years ago

@MlWoo I measured inference time for FastSpeech, and it is actually faster than Tacotron2 on CPU. For example, for 12 s of audio, mel generation takes about 1.2 s on a single CPU core.
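To put the figures in the comment above on the same scale as the "13x real time" quoted for Tacotron2, they can be expressed as a real-time factor. This is only a sketch of the arithmetic. The function name is hypothetical, and the numbers cover mel generation only, not the vocoder.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the thread: about 1.2 s to generate mels for 12 s of audio.
rtf = real_time_factor(1.2, 12.0)
print(f"RTF: {rtf:.2f}")            # RTF: 0.10
print(f"Speed-up: {1 / rtf:.1f}x")  # Speed-up: 10.0x
```

On these numbers, mel generation alone runs at roughly 10x real time on one core. The end-to-end figure would also depend on the vocoder's cost.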

MlWoo commented 4 years ago

@alokprasad great job! I hope you can share the work with us. It is really fast.

alokprasad commented 4 years ago

@MlWoo I have now integrated FastSpeech and SqueezeWave: https://github.com/alokprasad/fastspeech_squeezewave

MlWoo commented 4 years ago

@alokprasad thank you. I will read it later.

alokprasad commented 4 years ago

@MlWoo I know you are not working on this, but I just wanted to see if you faced any issue similar to the one below while integrating Tacotron2 and LPCNet:

, name: GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "test_lpcnet.py", line 83, in <module>
    cfeat = enc.predict([features[c:c+1, :, :nb_used_features], periods[c:c+1, :, :]])
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1441, in predict
    x, _, _ = self._standardize_user_data(x)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 579, in _standardize_user_data
    exception_prefix='input')
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py", line 145, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected input_3 to have shape (None, 38) but got array with shape (992, 20)
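The ValueError says the encoder expects 38 values per frame, but the array passed in has only 20 (992 frames of 20 features). One possible workaround is to expand each 20-value frame to the 38-wide layout before calling `enc.predict`. The sketch below assumes a layout that is not confirmed anywhere in this thread: 18 cepstral coefficients first, 18 zeroed slots, then the two pitch values in columns 36 and 37. The helper name `expand_features` is hypothetical, and the layout should be checked against the feature extraction code actually used for training.

```python
import numpy as np

NB_USED_FEATURES = 38         # width the encoder expects, per the ValueError above
NB_CEPSTRAL = 18              # ASSUMPTION: first 18 values are cepstral coefficients
PITCH_SLOTS = slice(36, 38)   # ASSUMPTION: the two pitch values belong in columns 36-37


def expand_features(feats20):
    """Pad (frames, 20) features to (frames, 38), zero-filling the unused columns."""
    feats20 = np.asarray(feats20, dtype=np.float32)
    if feats20.ndim != 2 or feats20.shape[1] != NB_CEPSTRAL + 2:
        raise ValueError(f"expected shape (frames, 20), got {feats20.shape}")
    out = np.zeros((feats20.shape[0], NB_USED_FEATURES), dtype=np.float32)
    out[:, :NB_CEPSTRAL] = feats20[:, :NB_CEPSTRAL]   # cepstral part
    out[:, PITCH_SLOTS] = feats20[:, NB_CEPSTRAL:]    # last two values (pitch)
    return out


# Stand-in for the 992 predicted frames from the traceback.
frames = np.random.rand(992, 20).astype(np.float32)
padded = expand_features(frames)
print(padded.shape)  # (992, 38)
```

The result would still need a leading batch axis (for example `padded[np.newaxis, :, :]`) to match the three-dimensional slicing in the traceback. If the script derives the pitch periods from a fixed column of the wide array, padding before that step keeps the indexing consistent.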