TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

Cannot run mb_melgan inference with C API: You must feed a value for placeholder tensor 'saver_filename' with dtype string #179

Closed ZDisket closed 4 years ago

ZDisket commented 4 years ago

I converted the VCTK multi-band MelGAN model (built from converted weights from kan-bayashi's repo) into a SavedModel as detailed in the multi-band MelGAN inference notebook. It works fine in Python, but when I load it with the C API (the same way I loaded a FastSpeech2 model and ran inference) and try to run inference, I get this:

You must feed a value for placeholder tensor 'saver_filename' with dtype string
         [[{{node saver_filename}}]]

If I feed that placeholder any value, it first saves the weights and then throws an error saying it can't find the weights, or, if they're already there, another error saying it can't find a certain variable. For reference, these are the last operations in the loaded model (the StatefulPartitionedCall_xx ops are the outputs):

NoOp
Const
serving_default_mels
StatefulPartitionedCall
saver_filename
StatefulPartitionedCall_1
StatefulPartitionedCall_2

The FastSpeech2 model did not have this op and ran well, so I doubt it's an issue with my implementation.
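The stray ops are a hint: `saver_filename` and the extra `StatefulPartitionedCall_1`/`_2` belong to the SavedModel's checkpoint save/restore machinery, not to inference. A minimal sketch of the idea below, where a toy module stands in for the real MB-MelGAN SavedModel (an assumption of this example); only the `serving_default` signature's inputs ever need to be fed:

```python
# Toy stand-in for the MB-MelGAN SavedModel (an assumption of this sketch):
# one conv variable, saved and reloaded to show the serving signature.
import tempfile
import tensorflow as tf

class Toy(tf.Module):
    def __init__(self):
        self.kernel = tf.Variable(tf.ones([3, 1, 1]))

    @tf.function(input_signature=[tf.TensorSpec([None, None, 1], tf.float32, name="mels")])
    def __call__(self, mels):
        return tf.nn.conv1d(mels, self.kernel, stride=1, padding="SAME")

export_dir = tempfile.mkdtemp()
tf.saved_model.save(Toy(), export_dir)

loaded = tf.saved_model.load(export_dir)
print(list(loaded.signatures.keys()))
# Feed only this signature's inputs; saver_filename is save/restore plumbing.
infer = loaded.signatures["serving_default"]
out = infer(mels=tf.zeros([1, 8, 1]))
print({k: v.shape for k, v in out.items()})
```

The same idea applies in the C API: drive only the graph inputs/outputs backing `serving_default` and ignore the saver ops.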

dathudeptrai commented 4 years ago

@ZDisket seems I can't help in this case since I didn't use C++ for inference :D. Please refer to this comment (https://github.com/TensorSpeech/TensorFlowTTS/issues/53#issuecomment-661772767). @sujeendran can you help him?

ZDisket commented 4 years ago

@dathudeptrai I'm using the TensorFlow C API; he's using the Lite one. Can multi-band MelGAN be converted to TFLite with only built-in ops (therefore not requiring the Flex delegate, which cannot be built for Windows)? It would be a bit of a mess, but I can use both APIs if absolutely necessary.

dathudeptrai commented 4 years ago

@ZDisket MB-MelGAN is just convolutions, so the TFLite built-in ops are enough :D
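This claim can be checked by restricting the converter to built-in ops, which makes conversion fail loudly if anything would require the Flex delegate. A sketch with a stand-in convolution-only graph (the real vocoder weights are not used here, which is an assumption):

```python
import tensorflow as tf

# Stand-in for a convolution-only vocoder graph (assumption: the real
# MB-MelGAN likewise reduces to conv ops).
@tf.function(input_signature=[tf.TensorSpec([1, None, 80], tf.float32)])
def vocoder_like(mels):
    k1 = tf.fill([7, 80, 64], 0.01)
    k2 = tf.fill([7, 64, 1], 0.01)
    x = tf.nn.conv1d(mels, k1, stride=1, padding="SAME")
    return tf.nn.conv1d(x, k2, stride=1, padding="SAME")

converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [vocoder_like.get_concrete_function()]
)
# Built-in ops only: no Flex delegate needed, so the model runs on Windows TFLite.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
tflite_model = converter.convert()
print(f"converted: {len(tflite_model)} bytes")
```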

ZDisket commented 4 years ago

@dathudeptrai I managed to pair a TensorFlow C API FastSpeech2 with a TFLite MB-MelGAN, but the audio comes out extremely fast. I think it's because I didn't run a pass through the PQMF after synthesis; the same thing happened in my notebook when I removed that step. Does the TFLite MB-MelGAN model include the PQMF?

dathudeptrai commented 4 years ago

@ZDisket yes, PQMF is inside the mb_melgan class, see https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/models/mb_melgan.py#L159-L187. You just need to convert to TFLite again :D.
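For reference, running the re-converted model with the TFLite interpreter might look like the sketch below (the stand-in graph and shapes are assumptions). The key detail is that mel length varies per utterance, so the input tensor is resized before each call; the waveform, with PQMF already applied inside the graph, comes straight out:

```python
import numpy as np
import tensorflow as tf

# Stand-in single-conv "vocoder" (assumption); the real MB-MelGAN .tflite
# would instead be loaded with tf.lite.Interpreter(model_path=...).
@tf.function(input_signature=[tf.TensorSpec([1, None, 80], tf.float32)])
def vocoder_like(mels):
    kernel = tf.fill([7, 80, 1], 0.01)
    return tf.nn.conv1d(mels, kernel, stride=1, padding="SAME")

tflite_model = tf.lite.TFLiteConverter.from_concrete_functions(
    [vocoder_like.get_concrete_function()]
).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
inp = interpreter.get_input_details()[0]
# Mel length differs per utterance: resize the input before allocating tensors.
interpreter.resize_tensor_input(inp["index"], [1, 240, 80])
interpreter.allocate_tensors()
interpreter.set_tensor(inp["index"], np.zeros([1, 240, 80], dtype=np.float32))
interpreter.invoke()
audio = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(audio.shape)
```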

ZDisket commented 4 years ago

@dathudeptrai I see that the synthesis function already does one pass through the PQMF: https://github.com/TensorSpeech/TensorFlowTTS/blob/9804aefab9dd600b0e9b70c9fc83590338988b8e/tensorflow_tts/models/mb_melgan.py#L185-L187 Can I make it do another, or is there a reason why in the notebooks the second pass always happens outside of the class?

dathudeptrai commented 4 years ago

:) the reason is that the notebook is outdated. I will update it later.

ZDisket commented 4 years ago

@dathudeptrai For some reason adding a second pass doesn't fix the problem; it seems the PQMF pass inside the MB-MelGAN class is ineffective, so I had to export it separately to TFLite.

ZDisket commented 4 years ago

Alright, I got inference with PQMF as a separate TFLite model working, so I'll close this issue.
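For anyone hitting the same wall, a post-processing step can be exported as its own TFLite model by wrapping it in a `tf.function` and converting that concrete function. The filter below is a placeholder, not the real PQMF synthesis filterbank (which lives in `tensorflow_tts/models/mb_melgan.py`):

```python
import tensorflow as tf

# Placeholder 4-subband merge standing in for PQMF synthesis (assumption:
# the real filterbank coefficients come from the PQMF layer in the repo).
@tf.function(input_signature=[tf.TensorSpec([1, None, 4], tf.float32)])
def pqmf_like_synthesis(subbands):
    kernel = tf.fill([63, 4, 1], 1.0 / (63.0 * 4.0))
    return tf.nn.conv1d(subbands, kernel, stride=1, padding="SAME")

tflite_pqmf = tf.lite.TFLiteConverter.from_concrete_functions(
    [pqmf_like_synthesis.get_concrete_function()]
).convert()
print(f"standalone post-processing model: {len(tflite_pqmf)} bytes")
```

The vocoder's sub-band output is then fed into this second interpreter after the first model's invoke.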

dathudeptrai commented 4 years ago

> @dathudeptrai For some reason adding a second pass doesn't fix the problem, it seems that PQMF passes inside the MB-melgan class are ineffective, so I had to export it separately to TFLite.

That is my bug, sorry :D. The newest code fixes this problem :D

ZDisket commented 4 years ago

@dathudeptrai I'm almost finished with the implementation, but my MB-MelGAN outputs noisy audio. Do you think it's because the vocoder is TFLite? Sample 1 Sample 2 I can give you the executable if you want.

Edit: I just found out why I couldn't use the regular TensorFlow API for MB-MelGAN: I was using all the StatefulPartitionedCall outputs, when I only have to use the first one. Sample

dathudeptrai commented 4 years ago

@ZDisket seems the last sample is OK, there is no noise. What is the difference?

ZDisket commented 4 years ago

@dathudeptrai The first two were generated with the TFLite API, the last one with regular TensorFlow. I managed to fix the original problem and now I'm only using the TensorFlow C API. See pull request: https://github.com/TensorSpeech/TensorFlowTTS/pull/191

dathudeptrai commented 4 years ago

@ZDisket any comments about inference speed/latency?

ZDisket commented 4 years ago

@dathudeptrai According to my rough estimate (using my phone's timer), this sentence: "Hello, I have cards and a Ford Mustang" takes about 8.79 seconds from input to WAV saving the first time, then slightly under 2 seconds for subsequent runs, on a dual-core A6-7400K CPU. Most people have much better CPUs, so it will probably be much faster.
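A phone timer works, but a monotonic clock with a warm-up run separates the one-time setup cost (model loading, graph tracing, allocation) from steady-state latency, which matches the first-run vs. subsequent-run gap above. A minimal sketch, where the measured callable is a placeholder for the actual synthesis call:

```python
import time

def avg_latency(fn, warmup=1, runs=5):
    """Average wall-clock seconds per call, excluding warm-up runs."""
    for _ in range(warmup):
        fn()                      # first call pays tracing/allocation cost
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Placeholder workload; swap in the FastSpeech2 + MB-MelGAN synthesis call.
avg = avg_latency(lambda: sum(range(100_000)))
print(f"{avg * 1e3:.3f} ms per run")
```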