Closed ZDisket closed 4 years ago
@ZDisket seems I can't help in this case since I didn't use C++ for inference :D. Please refer to this comment (https://github.com/TensorSpeech/TensorFlowTTS/issues/53#issuecomment-661772767). @sujeendran can you help him?
@dathudeptrai I'm using the Tensorflow C API, he's using the Lite one. Can multiband melgan be converted to TFLite with just built-in ops (therefore not requiring the Flex delegate, which cannot be built for Windows)? It would be a bit of a mess, but I can use both if absolutely necessary.
@ZDisket MB-melgan is just convolution, so TFLite built-in ops are enough :D
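Since the model is purely convolutional, a conversion restricted to built-in ops should succeed. A minimal sketch of what that conversion might look like (the tiny Conv1D stack below is a stand-in for the real MB-MelGAN generator, not the actual model):

```python
# Sketch: convert a convolution-only model to TFLite using ONLY built-in ops,
# so the Flex delegate is never needed. The model here is a toy stand-in.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 80)),  # mel-spectrogram frames (toy shape)
    tf.keras.layers.Conv1D(64, 7, padding="same", activation="tanh"),
    tf.keras.layers.Conv1D(4, 7, padding="same"),  # 4 sub-bands, like MB-MelGAN
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Restrict to built-in ops: conversion fails loudly if a Flex op would be needed.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
tflite_model = converter.convert()

with open("mb_melgan_builtin.tflite", "wb") as f:
    f.write(tflite_model)
```

If the real model did contain an unsupported op, `convert()` would raise instead of silently emitting a model that needs Flex at runtime.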
@dathudeptrai I managed to pair the Tensorflow C API FastSpeech2 with the TFLite MB-melgan, but the audio comes out extremely fast. I think it's because I didn't run it through the PQMF after synthesis; the same thing happened in my notebook when I removed that step. Does the TFLite MB-melgan model include the PQMF?
@ZDisket yes, pqmf is inside the mb_melgan class, see https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/models/mb_melgan.py#L159-L187. You just need to convert to tflite again :D.
@dathudeptrai I see that the synthesis function already does one pass through the PQMF: https://github.com/TensorSpeech/TensorFlowTTS/blob/9804aefab9dd600b0e9b70c9fc83590338988b8e/tensorflow_tts/models/mb_melgan.py#L185-L187. Can I make it do another pass, or is there a reason why, in the notebooks, the second pass always happens outside of the class?
:)))) the reason is that the notebook is outdated =))))))))))). I will update it later :v.
@dathudeptrai For some reason adding a second pass doesn't fix the problem; it seems the PQMF passes inside the MB-melgan class are ineffective, so I had to export it separately to TFLite.
Alright, I got inference with the PQMF as a separate TFLite model working, so I'll just close this issue.
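Exporting a post-processing step as its own TFLite model, as described above, can be done by wrapping it in a `tf.function` and converting the concrete function. A hedged sketch (the toy `synthesis` below just interleaves the 4 sub-bands back into a full-rate waveform; the real PQMF filter lives in `tensorflow_tts/models/mb_melgan.py`):

```python
# Sketch: export a standalone post-processing function (stand-in for PQMF
# synthesis) as its own TFLite model with built-in ops only.
import numpy as np
import tensorflow as tf

class ToySynthesis(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([1, None, 4], tf.float32)])
    def synthesis(self, subbands):
        # [1, T, 4] -> [1, T*4, 1]: naive interleaving, NOT the real PQMF filter
        return tf.reshape(subbands, [1, -1, 1])

module = ToySynthesis()
concrete = module.synthesis.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete], module)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
pqmf_tflite = converter.convert()

# Run it: resize the dynamic time axis, feed sub-band audio, read the waveform.
interp = tf.lite.Interpreter(model_content=pqmf_tflite)
in_index = interp.get_input_details()[0]["index"]
interp.resize_tensor_input(in_index, [1, 100, 4])
interp.allocate_tensors()
interp.set_tensor(in_index, np.zeros([1, 100, 4], np.float32))
interp.invoke()
waveform = interp.get_tensor(interp.get_output_details()[0]["index"])
```

The vocoder's sub-band output then feeds this second interpreter, mirroring the two-model pipeline described in this comment.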
> @dathudeptrai For some reason adding a second pass doesn't fix the problem; it seems the PQMF passes inside the MB-melgan class are ineffective, so I had to export it separately to TFLite.
that is my bug, sorry :D. The newest code fixed this problem :D
@dathudeptrai I'm almost finished with the implementation, but my mb-melgan outputs noisy audio. Do you think it's because the vocoder is TFLite? Sample 1 Sample 2 I can give you the executable if you want. Edit: I just found out why I couldn't use the regular Tensorflow API for MB-MelGAN: I was using all of the StatefulPartitionedCall outputs, when I only have to use the first one. Sample
@ZDisket the last sample seems OK, there is no noise. What is the difference?
@dathudeptrai The first 2 were generated with the TFLite API, the last one with regular Tensorflow. I managed to fix the original problem and now I'm only using the Tensorflow C API. See pull request: https://github.com/TensorSpeech/TensorFlowTTS/pull/191
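The fix above, picking only the first of several signature outputs, can be checked from Python before touching the C API: a SavedModel signature may expose multiple outputs, which surface as `StatefulPartitionedCall:0`, `:1`, ... in the C API. A toy sketch (the two-output module here is hypothetical, not the actual MB-MelGAN graph):

```python
# Sketch: inspect a SavedModel signature's outputs to learn which output
# index (":0", ":1", ...) to fetch from the C API.
import tempfile
import tensorflow as tf

class TwoOutputs(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        # Hypothetical model with a main output and an auxiliary one.
        return {"audio": x * 2.0, "debug": x - 1.0}

module = TwoOutputs()
path = tempfile.mkdtemp()
tf.saved_model.save(module, path,
                    signatures=module.__call__.get_concrete_function())

loaded = tf.saved_model.load(path)
sig = loaded.signatures["serving_default"]
print(sig.structured_outputs)  # shows every named output the C API will see

result = sig(tf.constant([1.0, 2.0]))["audio"]  # fetch just the one you need
```

Fetching every output when only one is the waveform is exactly the mistake described in the edit above.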
@ZDisket any comment about inference speed/ latency ?
@dathudeptrai According to my rough estimate (using my phone's timer), this sentence: Hello, I have cards and a Ford Mustang
takes about 8.79 s from input to WAV saving the first time, then the rest take slightly less than 2 seconds, on a dual-core A6-7400K CPU. Most people have much better CPUs, so it's probably going to be much faster.
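The first-run/warm-run gap mentioned above (graph loading and tracing dominate the first call) can be measured more precisely than with a phone timer. A minimal sketch, where `synthesize` is a hypothetical stand-in for the end-to-end text-to-WAV pipeline:

```python
# Sketch: time the cold first call vs a warm call of an inference function.
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0

def synthesize(text):
    # Hypothetical stand-in for the real text -> mel -> waveform pipeline.
    return text.upper()

_, cold = timed(synthesize, "Hello, I have cards and a Ford Mustang")
_, warm = timed(synthesize, "Hello, I have cards and a Ford Mustang")
print(f"cold: {cold:.3f}s, warm: {warm:.3f}s")
```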
I converted the VCTK multiband melgan model (made from converted weights from kan-bayashi's repo) into a SavedModel as detailed in the multiband melgan inference notebook. It works fine in Python, but when loading it into the C API (the same way I loaded a FastSpeech2 model and ran inference) and trying to run inference, I get this:
Whatever value I feed it, it first saves the weights, then throws an error saying that it can't find the weights, or, if they're already there, throws another error saying that it can't find a certain variable. For reference, these are the last operations in the loaded model (the StatefulPartitionedCall_xxs are the outputs):
The FastSpeech2 model did not have this layer and ran well, so I doubt it's an issue with my implementation.
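One workaround sometimes used for "can't find variable" errors when loading a SavedModel outside Python (not what this thread settled on, just a common approach) is freezing the variables into constants before export, so the graph contains no variable-restore ops at all. A hedged sketch, using TensorFlow's semi-private `convert_variables_to_constants_v2` helper and a toy module:

```python
# Sketch: freeze a model's variables into constants so the exported graph
# has no VarHandleOp/ReadVariableOp nodes to restore at load time.
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2,
)

class WithVariable(tf.Module):
    def __init__(self):
        self.w = tf.Variable(3.0)  # toy weight

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        return x * self.w

module = WithVariable()
frozen = convert_variables_to_constants_v2(
    module.__call__.get_concrete_function())

# The frozen graph now carries the weights as Const nodes; it can be written
# out with tf.io.write_graph and loaded as a plain GraphDef.
graph_def = frozen.graph.as_graph_def()
ops = {node.op for node in graph_def.node}
print(sorted(ops))
```

With no restore ops left, the error mode described above (saving weights, then failing to find them) cannot occur, at the cost of losing the ability to retrain the exported copy.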