Open alokprasad opened 4 years ago
@MlWoo I made some changes to FastSpeech to integrate it with LPCNet. Here are my changes:
1. First I preprocessed the audio (LJSpeech) and converted it to PCM (s16):
# sample rate: 16 kHz for LPCNet, or should it be 22050 Hz?
mkdir -p dataset/LJSpeech-1.1/pcms
for i in dataset/LJSpeech-1.1/wavs/*.wav
do
    sox "$i" -r 16000 -c 1 -t sw - > dataset/LJSpeech-1.1/pcms/$(basename "$i" | cut -d. -f1).s16
done
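A quick sanity check of the converted files (a sketch only, assuming the output paths from the loop above and headerless little-endian s16 mono data; 16 kHz is the rate LPCNet is usually trained at, but confirm it against your LPCNet config):

import glob
import numpy as np

SAMPLE_RATE = 16000  # adjust if your LPCNet model uses a different rate

# read a few .s16 files as raw int16 and report their duration
for path in sorted(glob.glob("dataset/LJSpeech-1.1/pcms/*.s16"))[:5]:
    samples = np.fromfile(path, dtype="<i2")   # headerless signed 16-bit PCM
    print(f"{path}: {len(samples)} samples, {len(samples) / SAMPLE_RATE:.2f} s")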
https://github.com/alokprasad/binaries/blob/master/fast_speech_lpcnet.diff
@alokprasad thank you for keeping an eye on my repo. I am sorry, but I won't have time to put effort into TTS. I think Tacotron2 is good enough and fast enough both on GPU (13x real time on a 1080 Ti) and on CPU, and it can achieve larger throughput than FastSpeech.
@MlWoo I did some inference timing for FastSpeech; it is actually faster than Tacotron2 on CPU. For example, for 12 seconds of audio, mel generation takes about 1.2 s on a single CPU core.
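Roughly how I timed it (a sketch only; `model` and its forward signature are placeholders for whatever FastSpeech checkpoint you load, not the exact API of any particular repo):

import time
import torch

torch.set_num_threads(1)  # measure on a single CPU core

def time_mel_generation(model, phoneme_ids, runs=5):
    """Average wall-clock time to generate one mel spectrogram."""
    model.eval()
    with torch.no_grad():
        model(phoneme_ids)                      # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            mel = model(phoneme_ids)
        return (time.perf_counter() - start) / runs, mel

At about 1.2 s for 12 s of audio, that is roughly 10x faster than real time on one core, before the vocoder.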
@alokprasad great job! I hope you can share the work with us. It is really fast.
@MlWoo I have now integrated FastSpeech and SqueezeWave: https://github.com/alokprasad/fastspeech_squeezewave
@alokprasad thank you. I will read it later.
@MlWoo I know you are not working on this, but I just wanted to see if you faced any issue similar to the one below while integrating Tacotron2 and LPCNet:
, name: GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "test_lpcnet.py", line 83, in <module>
cfeat = enc.predict([features[c:c+1, :, :nb_used_features], periods[c:c+1, :, :]])
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1441, in predict
x, _, _ = self._standardize_user_data(x)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 579, in _standardize_user_data
exception_prefix='input')
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_utils.py", line 145, in standardize_input_data
str(data_shape))
ValueError: Error when checking input: expected input_3 to have shape (None, 38) but got array with shape (992, 20)
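In case it helps, here is the minimal check I use to see which input disagrees before predict() is called (a sketch; `enc`, `features`, `periods`, `c` and `nb_used_features` are the variables from test_lpcnet.py, and my guess that the 38-vs-20 gap comes from mixing feature files and a model built with different nb_used_features / LPCNet versions is only an assumption):

def report_input_shapes(model, arrays):
    # print each Keras input's declared shape next to the array passed for it
    for tensor, array in zip(model.inputs, arrays):
        print(f"{tensor.name}: expects {tensor.shape}, got {array.shape}")

report_input_shapes(enc, [features[c:c+1, :, :nb_used_features],
                          periods[c:c+1, :, :]])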
@MlWoo have you tried FastSpeech for mel generation? It is astonishingly fast at generating mel spectrograms, and combined with the LPCNet vocoder it could work as real-time voice synthesis.