TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-Art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0
3.82k stars, 812 forks

🌝 C++ inference now available 💃 #216

Closed: dathudeptrai closed this issue 3 years ago

dathudeptrai commented 4 years ago

C++ inference is now supported (thanks to @ZDisket for his dedicated support). It will be improved and will support more models over time to keep up with the main repo :D. Let's check it out :D

Code: https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/cppwin

rgzn-aiyun commented 4 years ago

@dathudeptrai How does C++ inference speed compare to Python's?

ZDisket commented 4 years ago

@rgzn-aiyun I never tested C++ inference outside of my local machine (and a Windows Server 2012 VPS) and neither did I test Python from outside Colab. But my semi-informed estimation is that it'll be roughly equal. The resource demanding parts are in the model inference which already kinda runs on C++ under the hood, just nicely wrapped in Python code.

rgzn-aiyun commented 4 years ago

> @rgzn-aiyun I never tested C++ inference outside of my local machine (and a Windows Server 2012 VPS) and neither did I test Python from outside Colab. But my semi-informed estimation is that it'll be roughly equal. The resource demanding parts are in the model inference which already kinda runs on C++ under the hood, just nicely wrapped in Python code.

C++ places higher demands on the system environment. If the speed is similar, then I'll use Python for inference.

ZDisket commented 4 years ago

@rgzn-aiyun

> C++ places higher demands on the system environment. If the speed is similar, then I'll use Python for inference.

I don't know where you got that from. Python inference carries all the overhead of the interpreter plus every dependency required to run (NumPy, TensorFlow, etc.), which is 2GB of disk space minimum. Native C++ inference, at least on Windows, only requires a single 100MB DLL; on Ubuntu, that's 200MB of shared libraries. It's way more convenient and lightweight. And this is for inference with full TensorFlow, not TFLite; I originally planned for the vocoder inference to be TFLite until I heard how the audio sounded more noisy than a fucked VHS.

Memory consumption is pretty equal. I did local Python ESPNet-TTS & PWGAN inference a few months ago and it was 500MB per loaded model (text2mel + vocoder), the same as C++ inference of this.

aitalk commented 4 years ago

Are there any instructions on how to build the C++ inference on macOS? Thanks

dathudeptrai commented 4 years ago

@ZDisket @candlewill

ZDisket commented 4 years ago

@aitalk I'm afraid you're on your own. Since macOS is Unix-based, you can try the same instructions as for Linux, but I'm not sure they'll work. One thing, though: I do use qmake, which should be fully cross-platform.

ronggong commented 4 years ago

@ZDisket Thanks for your work. I would like to compile cppwin with MSVC 2019 (v142). I suppose I need to recompile the dependencies:

  1. libPhonetisaurus
  2. OpenFST

I compiled OpenFST from this repo: https://github.com/kkm000/openfst. Then I tried to compile libPhonetisaurus from your repo: https://github.com/ZDisket/Phonetisaurus

I opened the Qt project file LibPhonetisaurus.pro and tried to compile it. I got this error:

D:\projects\tts\Phonetisaurus\src\include\LatticePruner.h:33: error: C1083: Cannot open include file: 'fst/fstlib.h': No such file or directory

Line 55 of the .pro file reads:

LIBS += -L/usr/local/lib/fst/ -lfst -lfstfar -lfstngram

I guess this should point to the path of the compiled OpenFST libraries. However, I don't see fstfar and fstngram there; only libfst.lib and libfstscript.lib exist.

Could you provide more precise instructions for compiling LibPhonetisaurus on Windows? Thanks.

ZDisket commented 4 years ago

@ronggong Looks like I forgot to add a config for compiling libPhonetisaurus for Windows. In the .pro file, change that line to these two:

win32: LIBS += -L$$PWD/libwin/ libfst.lib libfstscript.lib

unix:!macx: LIBS += -L/usr/local/lib/fst/ -lfst -lfstfar -lfstngram

Put your two .lib files in the subfolder libwin, in the directory where the .pro file is located.

ronggong commented 4 years ago

@ZDisket Thanks, I successfully rebuilt libPhonetisaurus with MSVC 2019 v142. I had to do three more things to build it: (1) add the OpenFST headers with INCLUDEPATH; (2) add #define M_LN2 0.693147180559945309417 in 3rdparty\rnnlm\rnnlmlib.cpp; (3) comment out lines 128-144 in src\3rdparty\lib\util.cc because CLOCK_REALTIME is undeclared.
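For anyone hitting the same two compiler errors, here is a minimal sketch of more portable versions of workarounds (2) and (3). It assumes the commented-out code in util.cc only needs a wall-clock timestamp, which is a guess; `WallClockSeconds` is a hypothetical helper name, not something from the Phonetisaurus sources.

```cpp
#include <chrono>
#include <cmath>

// MSVC only exposes M_LN2 from <cmath> when _USE_MATH_DEFINES is defined
// before the include, so a guarded fallback avoids redefinition warnings on
// toolchains that already provide the constant.
#ifndef M_LN2
#define M_LN2 0.693147180559945309417
#endif

// Hypothetical replacement for a clock_gettime(CLOCK_REALTIME, ...) call:
// returns the wall-clock time in seconds, measured from the epoch of
// std::chrono::system_clock, using only the standard library.
inline double WallClockSeconds() {
    using namespace std::chrono;
    const auto since_epoch = system_clock::now().time_since_epoch();
    return duration<double>(since_epoch).count();
}
```

Since std::chrono is part of the standard library, this compiles unchanged with MSVC, GCC, and Clang, so the timing code would not have to be commented out on Windows.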

ronggong commented 4 years ago

@ZDisket The g2p model is unable to parse decimal numbers such as 0.3; it returns SIL. I see there is a g2p.fst in the model folder. Do you have any information about this model? Where did you get it? Maybe we can train a better model?

ZDisket commented 4 years ago

@ronggong It's from here: https://github.com/AdolfVonKleist/phonetisaurus-downloads/tree/master/models. It's an old model and it can definitely be retrained (here: https://github.com/AdolfVonKleist/Phonetisaurus), although turning numbers into text isn't the G2P model's job. You're gonna want to import or roll your own number-to-text library so you can then feed the resulting text into the g2p.
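As a rough illustration of that normalization step, the sketch below spells out plain non-negative numbers before they reach the g2p. Everything in it is hypothetical (`BelowThousand`, `NumberToWords`, the word tables) and it is deliberately limited: integer parts up to six digits, decimals read digit by digit, and no support for signs, ordinals, currency, or dates.

```cpp
#include <cctype>
#include <string>

// Word tables for the sketch (English only).
const char* const kOnes[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen"};
const char* const kTens[] = {"", "", "twenty", "thirty", "forty",
                             "fifty", "sixty", "seventy", "eighty", "ninety"};

// Spells out an integer in the range 0..999.
std::string BelowThousand(int n) {
    std::string out;
    if (n >= 100) {
        out += std::string(kOnes[n / 100]) + " hundred";
        n %= 100;
        if (n == 0) return out;
        out += " ";
    }
    if (n < 20) return out + kOnes[n];
    out += kTens[n / 10];
    if (n % 10 != 0) out += std::string(" ") + kOnes[n % 10];
    return out;
}

// Converts tokens of the form digits or digits.digits into words.
// Any other token is returned unchanged.
std::string NumberToWords(const std::string& token) {
    const std::size_t dot = token.find('.');
    const std::string int_part = token.substr(0, dot);
    const std::string frac_part =
        dot == std::string::npos ? "" : token.substr(dot + 1);

    auto all_digits = [](const std::string& s) {
        for (unsigned char c : s) {
            if (!std::isdigit(c)) return false;
        }
        return true;
    };
    const bool dangling_dot = dot != std::string::npos && frac_part.empty();
    if (int_part.empty() || int_part.size() > 6 || !all_digits(int_part) ||
        !all_digits(frac_part) || dangling_dot) {
        return token;
    }

    const int value = std::stoi(int_part);
    std::string out;
    if (value >= 1000) {
        out = BelowThousand(value / 1000) + " thousand";
        if (value % 1000 != 0) out += " " + BelowThousand(value % 1000);
    } else {
        out = BelowThousand(value);
    }
    if (!frac_part.empty()) {
        out += " point";
        for (char c : frac_part) out += std::string(" ") + kOnes[c - '0'];
    }
    return out;
}
```

With this, NumberToWords("0.3") yields "zero point three", which the g2p can handle like any other text. Each whitespace-separated token would be passed through the function before phonemization.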

ronggong commented 4 years ago

@ZDisket I tried to build your code in a new VS project, but I got many errors like these:

E0266 "Path" is ambiguous client_app_App D:\projects\tts\depsTFTTS\include\include\PhonetisaurusRex.h 153
E0266 "BYTE" is ambiguous client_app_App D:\Windows Kits\10\Include\10.0.18362.0\um\OleAuto.h 618

Have you had these errors? How did you resolve them?

ZDisket commented 4 years ago

@ronggong I have had the BYTE is ambiguous error. It comes from FST having its own BYTE type (a name the WinAPI also uses to define its byte) and Phonetisaurus having a using namespace fst; directive, which causes a name collision. (I keep saying I want to refactor all of the Phonetisaurus code, but other things always get in my way.) When I ran into this error (in TensorVox), I renamed the BYTE type to FSBYTE in the included headers. I assume you'll have to do something similar for Path.
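The collision and the rename can be reproduced in miniature. The sketch below uses stand-ins only: `fst_like` is a made-up namespace and its typedef is an assumption, since the real FST declaration may look different. The point is that once a using-directive makes two different declarations of the same unqualified name visible, any use of that name is ambiguous, and renaming one side removes the clash.

```cpp
#include <cstdint>

// Stand-in for the Windows SDK typedef of BYTE.
typedef unsigned char BYTE;

// Stand-in for a library namespace that declares its own byte type.
// It is already renamed to FSBYTE here, mirroring the fix described above.
namespace fst_like {
typedef std::int8_t FSBYTE;
}

// Stand-in for the using-directive in the Phonetisaurus headers.
using namespace fst_like;

// If fst_like still declared the name BYTE, the unqualified BYTE below would
// be ambiguous between ::BYTE and fst_like::BYTE. After the rename it
// resolves to the global typedef.
inline BYTE LowByte(unsigned int value) {
    return static_cast<BYTE>(value & 0xFFu);
}
```

Where editing third-party headers is not an option, writing ::BYTE explicitly in your own code also sidesteps the ambiguity, but that does not help for uses inside the Windows headers themselves, which is presumably why the rename was needed.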

ronggong commented 4 years ago

@ZDisket Thanks, it compiled! I renamed BYTE in the FST include headers and Path in the Phonetisaurus headers. As you mentioned in the build instructions, /FORCE needs to be added to the linker command line, which is not very elegant. Do you think there is a way to resolve it?

ZDisket commented 4 years ago

@ronggong I actually fixed that properly for the Linux implementation, where GCC wasn't able to use its equivalent of /FORCE. Move the LoadClusters function from the Phonetisaurus header into its own .cpp file, although I think my latest fork should have that done already: https://github.com/ZDisket/Phonetisaurus/blob/master/src/loadclusters.cpp. So theoretically, if you used that one, you shouldn't need /FORCE.
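The general pattern behind that fix is sketched below with hypothetical function and file names; this is not the actual LoadClusters code. A non-inline function defined in a header gets compiled into every translation unit that includes it, and the linker then sees several definitions of the same symbol. Keeping only the declaration in the header and a single definition in one .cpp file avoids that, and marking the function inline is the usual alternative when the body has to stay in the header.

```cpp
#include <map>
#include <string>

// --- clusters.h (hypothetical header): declaration only ---
int CountMultiCharSymbols(const std::map<std::string, int>& symbols);

// --- clusters.cpp (hypothetical source file): the one and only definition ---
int CountMultiCharSymbols(const std::map<std::string, int>& symbols) {
    int count = 0;
    for (const auto& entry : symbols) {
        // Illustrative rule only: count keys longer than one character.
        if (entry.first.size() > 1) ++count;
    }
    return count;
}

// Alternative for header-only code: an inline function may be defined in
// every translation unit that includes the header, as long as all the
// definitions are identical.
inline int CountSymbols(const std::map<std::string, int>& symbols) {
    return static_cast<int>(symbols.size());
}
```

Both sections are shown in one block for brevity; in a real project the first part would live in the header and the second in exactly one source file.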

ronggong commented 4 years ago

@ZDisket I cloned and compiled the Phonetisaurus repo last week, so it should be the most recent version. If I remove /FORCE, I get this linker error:

Severity    Code    Description Project File    Line    Suppression State
Error   LNK2005 "int __cdecl LoadClusters(class fst::SymbolTable const *,class std::unordered_map<int,class std::vector<int,class std::allocator<int> >,struct std::hash<int>,struct std::equal_to<int>,class std::allocator<struct std::pair<int const ,class std::vector<int,class std::allocator<int> > > > > *,class std::unordered_map<class std::vector<int,class std::allocator<int> >,int,struct VectorIntHash,struct std::equal_to<class std::vector<int,class std::allocator<int> > >,class std::allocator<struct std::pair<class std::vector<int,class std::allocator<int> > const ,int> > > *)" (?LoadClusters@@YAHPEBVSymbolTable@fst@@PEAV?$unordered_map@HV?$vector@HV?$allocator@H@std@@@std@@U?$hash@H@2@U?$equal_to@H@2@V?$allocator@U?$pair@$$CBHV?$vector@HV?$allocator@H@std@@@std@@@std@@@2@@std@@PEAV?$unordered_map@V?$vector@HV?$allocator@H@std@@@std@@HUVectorIntHash@@U?$equal_to@V?$vector@HV?$allocator@H@std@@@std@@@2@V?$allocator@U?$pair@$$CBV?$vector@HV?$allocator@H@std@@@std@@H@std@@@2@@4@@Z) already defined in EnglishPhoneticProcessor.obj    client_app_App  D:\projects\client_app\client_app_projucer\Builds\VisualStudio2019\Voice.obj    1   

The g2p model also has some problems. Please try to synthesize "This should be like this". The phoneme sequence returned is: DH IH0 S SH UH1 D B L AY1 K DH IH0 S. Note that be -> B: only the consonant is produced.

ZDisket commented 4 years ago

@ronggong That's weird, it compiled well in GCC where its equivalent of /FORCE didn't work. I'll take a deeper look. As to the G2P result, yeah the model isn't very good.

ronggong commented 4 years ago

@ZDisket I checked the Python processor at https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/ljspeech.py. It cleans the text and tokenizes each character, so there is no grapheme-to-phoneme conversion. If your FastSpeech model receives the same input as the Python one does, then we can remove the Phonetisaurus dependency by porting the Python ljspeech.py to C++.
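A rough idea of what such a port could look like is sketched below. The symbol inventory `kSymbols` and the cleaning rules (lowercasing, whitespace collapsing, dropping unknown characters) are placeholders chosen for illustration; a real port would have to reproduce the exact symbol list, ordering, and cleaners of the Python processor, otherwise the IDs will not match what the model was trained on.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical symbol inventory: pad, space, lowercase letters, punctuation.
// The position of each character in this string is its ID.
const std::string kSymbols = "_ abcdefghijklmnopqrstuvwxyz!'(),.:;?-";

// Converts text to a sequence of character IDs.
std::vector<std::int32_t> TextToIds(const std::string& text) {
    // Build the character -> ID lookup once, on first use.
    static const std::unordered_map<char, std::int32_t> lookup = [] {
        std::unordered_map<char, std::int32_t> m;
        for (std::size_t i = 0; i < kSymbols.size(); ++i) {
            m[kSymbols[i]] = static_cast<std::int32_t>(i);
        }
        return m;
    }();

    std::vector<std::int32_t> ids;
    bool last_was_space = true;  // starting as true drops leading whitespace
    for (unsigned char raw : text) {
        // Minimal cleaning: map any whitespace to ' ' and lowercase the rest.
        const char c =
            std::isspace(raw) ? ' ' : static_cast<char>(std::tolower(raw));
        if (c == ' ') {
            if (last_was_space) continue;  // collapse runs of whitespace
            last_was_space = true;
        } else {
            last_was_space = false;
        }
        const auto it = lookup.find(c);
        if (it != lookup.end()) ids.push_back(it->second);  // skip unknowns
    }
    return ids;
}
```

The resulting vector can be copied straight into an int32 input tensor, which would make the FST-based g2p step unnecessary for a character-based model.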

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.