dathudeptrai closed this issue 3 years ago
@dathudeptrai How does C++ inference speed compare to Python's?
@rgzn-aiyun I never tested C++ inference outside of my local machine (and a Windows Server 2012 VPS) and neither did I test Python from outside Colab. But my semi-informed estimation is that it'll be roughly equal. The resource demanding parts are in the model inference which already kinda runs on C++ under the hood, just nicely wrapped in Python code.
C++ places higher demands on the system environment. If the speed is similar, I'll use Python for inference.
@rgzn-aiyun
> C++ has higher requirements for the system environment. If the speed is similar, then Python is used for inference.
I don't know where you got that from. Python inference carries all the overhead of the interpreter plus every dependency it needs to run (NumPy, TensorFlow, etc.), which is at least 2GB of disk space. Native C++ inference, at least on Windows, only requires a single 100MB DLL; on Ubuntu, that's 200MB of shared libraries. It's way more convenient and lightweight. And this is for inference with full TensorFlow, not TFLite; I originally planned for the vocoder inference to be TFLite until I heard how the audio sounded noisier than a fucked VHS.
Memory consumption is pretty equal. I did local Python ESPNet-TTS & PWGAN inference a few months ago and it was 500MB per loaded model (text2mel + vocoder), the same as C++ inference of this.
Any instructions on how to build the C++ inference on macOS? Thanks
@ZDisket @candlewill
@aitalk I'm afraid you're on your own. Since macOS is Unix-based, you can try the same instructions as for Linux, but I'm not sure they'll work. On the plus side, I do use qmake, which should be fully cross-platform.
@ZDisket Thanks for your work. I would like to compile cppwin with MSVC 2019 (v142). I suppose I need to recompile the dependencies.
I compiled OpenFST from this repo: https://github.com/kkm000/openfst Then I wanted to compile libPhonetisaurus from your repo: https://github.com/ZDisket/Phonetisaurus
I opened the Qt project LibPhonetisaurus.pro and tried to compile it. I got this error: D:\projects\tts\Phonetisaurus\src\include\LatticePruner.h:33: error: C1083: Cannot open include file: 'fst/fstlib.h': No such file or directory
Line 55 of the .pro file is LIBS += -L/usr/local/lib/fst/ -lfst -lfstfar -lfstngram and I guess this should point to the path of the compiled OpenFST libraries. However, I don't see fstfar and fstngram there; only libfst.lib and libfstscript.lib exist.
Could you provide more precise instructions for compiling LibPhonetisaurus on Windows? Thanks.
@ronggong Looks like I forgot to add a config for compiling libPhonetisaurus for Windows. In the .pro file, change that line to these two:
win32: LIBS += -L$$PWD/libwin/ libfst.lib libfstscript.lib
unix:!macx: LIBS += -L/usr/local/lib/fst/ -lfst -lfstfar -lfstngram
Put your two .lib files in the subfolder libwin where the .pro file is located.
@ZDisket Thanks, I successfully rebuilt libPhonetisaurus with MSVC 2019 v142. I had to do three more things to build it:
(1) add the OpenFST headers with INCLUDEPATH
(2) add #define M_LN2 0.693147180559945309417 in 3rdparty\rnnlm\rnnlmlib.cpp
(3) comment out lines 128-144 in src\3rdparty\lib\util.cc because CLOCK_REALTIME is undeclared.
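Steps (2) and (3) above can be sketched in portable C++ as follows. This is a minimal sketch under assumptions: the guarded fallback mirrors the M_LN2 define from the post, while swapping the CLOCK_REALTIME code for std::chrono is a substitute technique, not what was actually done (the lines were simply commented out). The names log2FromLn and wallSeconds are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// MSVC's <cmath> only exposes M_LN2 when _USE_MATH_DEFINES is defined
// before the include, so provide a guarded fallback instead.
#ifndef M_LN2
#define M_LN2 0.693147180559945309417
#endif

// Hypothetical helper that uses the constant: log base 2 via natural log.
double log2FromLn(double x) {
    assert(x > 0.0);  // logarithm is undefined for non-positive input
    return std::log(x) / M_LN2;
}

// Hypothetical stand-in for clock_gettime(CLOCK_REALTIME, ...):
// seconds since the system clock's epoch, using only the standard library.
double wallSeconds() {
    using namespace std::chrono;
    return duration<double>(system_clock::now().time_since_epoch()).count();
}
```

With a guard like this, the same source should build on both MSVC and GCC without platform-specific edits.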
@ZDisket The g2p model is unable to parse decimal numbers such as 0.3; it returns SIL. I see there is a g2p.fst in the model folder. Do you know anything about this model? Where did you get it? Maybe we can train a better model?
@ronggong It's from here: https://github.com/AdolfVonKleist/phonetisaurus-downloads/tree/master/models An old model, it can definitely be retrained (here: https://github.com/AdolfVonKleist/Phonetisaurus), although turning numbers into text isn't the G2P model's job. You're gonna want to import or roll your own advanced number to text library so you can then feed the text into the g2p.
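A very rough idea of what "number to text before G2P" could look like is sketched below. It is an assumption-laden toy, not an existing library: spellDecimal is a hypothetical helper that reads a plain decimal digit by digit, which is far cruder than a real number normalizer (it would read 12 as "one two").

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: spells out a plain decimal token such as "0.3"
// digit by digit. Tokens containing anything else are returned unchanged.
std::string spellDecimal(const std::string& token) {
    static const char* kDigits[] = {"zero", "one", "two", "three", "four",
                                    "five", "six", "seven", "eight", "nine"};
    std::string out;
    for (char c : token) {
        std::string word;
        if (std::isdigit(static_cast<unsigned char>(c))) {
            word = kDigits[c - '0'];
        } else if (c == '.') {
            word = "point";
        } else {
            return token;  // not a plain number: leave it for the G2P model
        }
        if (!out.empty()) out += ' ';
        out += word;
    }
    return out.empty() ? token : out;
}
```

The idea is to run each whitespace-separated token through such a function first and pass only the resulting words to the g2p model.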
@ZDisket I tried to build your code in a new VS project; however, I got many errors like these:
E0266 "Path" is ambiguous client_app_App D:\projects\tts\depsTFTTS\include\include\PhonetisaurusRex.h 153
E0266 "BYTE" is ambiguous client_app_App D:\Windows Kits\10\Include\10.0.18362.0\um\OleAuto.h 618
Have you had these errors? How did you resolve them?
@ronggong I have had the "BYTE is ambiguous" error. It comes from FST having its own BYTE type (a name the WinAPI also uses to define its byte) and Phonetisaurus declaring using namespace fst; which causes a name collision. (I keep saying I want to refactor all of the Phonetisaurus code, but other things always get in my way.) When I ran into this error (in TensorVox), I renamed the BYTE type to FSBYTE in the included headers. I assume you'll have to do something similar for Path.
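The collision described above can be reproduced in miniature. In this sketch the namespaces fstlike and applike and the function qualifiedCost are hypothetical stand-ins, not the real OpenFST or Phonetisaurus declarations; it only shows why an unqualified name becomes ambiguous after a using-directive and how qualifying (or renaming) resolves it.

```cpp
// Two unrelated libraries that happen to pick the same type name.
namespace fstlike { struct Path { int cost = 1; }; }
namespace applike { struct Path { int cost = 2; }; }

// Using-directives like these make both names visible unqualified.
using namespace fstlike;
using namespace applike;

// Path p;  // would not compile: the name "Path" is ambiguous here

// Fully qualified names still work, which is the alternative to renaming
// one of the types in the headers.
int qualifiedCost() {
    fstlike::Path a;
    applike::Path b;
    return a.cost * 10 + b.cost;
}
```

Renaming, as done for FSBYTE, has the same effect as qualifying but does not require touching every use site in third-party headers that rely on the using-directive.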
@ZDisket Thanks, it compiled! I renamed BYTE in the fst include headers and Path in the Phonetisaurus headers. As you mentioned in the compile instructions, /FORCE needs to be added to the linker command line, which is not very elegant. Do you think there is a way to resolve it?
@ronggong I actually fixed that properly for the Linux implementation where GCC wasn't able to use its equivalent of /FORCE. Move the LoadClusters function from the Phonetisaurus header into its own .cpp file, although I think my latest fork should have that done already: https://github.com/ZDisket/Phonetisaurus/blob/master/src/loadclusters.cpp So theoretically if you used that one then you shouldn't need /FORCE.
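The header/source split mentioned above follows the usual pattern for avoiding duplicate definitions at link time. The sketch below is illustrative only: LoadClustersDemo and isCluster are hypothetical, the signature is simplified, and the body is a placeholder rather than the real LoadClusters logic. Everything is shown in one listing, with comments marking which file each part would live in.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// --- clusters.h (hypothetical header): declaration only ---
int LoadClustersDemo(const std::vector<std::string>& symbols,
                     std::unordered_map<int, std::vector<std::string>>* clusters);

// Small helpers may stay in a header if they are marked inline,
// which permits one identical definition per translation unit.
inline bool isCluster(const std::string& symbol) {
    return symbol.find('|') != std::string::npos;
}

// --- clusters.cpp (hypothetical source file): the single definition ---
int LoadClustersDemo(const std::vector<std::string>& symbols,
                     std::unordered_map<int, std::vector<std::string>>* clusters) {
    int found = 0;
    for (std::size_t i = 0; i < symbols.size(); ++i) {
        if (!isCluster(symbols[i])) continue;
        // Placeholder logic: split the symbol on '|' and store the parts.
        std::vector<std::string> parts;
        std::size_t start = 0, bar = 0;
        while ((bar = symbols[i].find('|', start)) != std::string::npos) {
            parts.push_back(symbols[i].substr(start, bar - start));
            start = bar + 1;
        }
        parts.push_back(symbols[i].substr(start));
        (*clusters)[static_cast<int>(i)] = parts;
        ++found;
    }
    return found;
}
```

A non-inline function defined in a header and included from two .cpp files produces exactly the kind of "already defined" error that /FORCE papers over.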
@ZDisket I cloned and compiled the Phonetisaurus repo last week, so it should be the most recent version. If I remove /FORCE, I get this linker error:
Severity Code Description Project File Line Suppression State
Error LNK2005 "int __cdecl LoadClusters(class fst::SymbolTable const *,class std::unordered_map<int,class std::vector<int,class std::allocator<int> >,struct std::hash<int>,struct std::equal_to<int>,class std::allocator<struct std::pair<int const ,class std::vector<int,class std::allocator<int> > > > > *,class std::unordered_map<class std::vector<int,class std::allocator<int> >,int,struct VectorIntHash,struct std::equal_to<class std::vector<int,class std::allocator<int> > >,class std::allocator<struct std::pair<class std::vector<int,class std::allocator<int> > const ,int> > > *)" (?LoadClusters@@YAHPEBVSymbolTable@fst@@PEAV?$unordered_map@HV?$vector@HV?$allocator@H@std@@@std@@U?$hash@H@2@U?$equal_to@H@2@V?$allocator@U?$pair@$$CBHV?$vector@HV?$allocator@H@std@@@std@@@std@@@2@@std@@PEAV?$unordered_map@V?$vector@HV?$allocator@H@std@@@std@@HUVectorIntHash@@U?$equal_to@V?$vector@HV?$allocator@H@std@@@std@@@2@V?$allocator@U?$pair@$$CBV?$vector@HV?$allocator@H@std@@@std@@H@std@@@2@@4@@Z) already defined in EnglishPhoneticProcessor.obj client_app_App D:\projects\client_app\client_app_projucer\Builds\VisualStudio2019\Voice.obj 1
The g2p model also has some problems. Please try to synthesize "This should be like this". The phoneme sequence returned is: DH IH0 S SH UH1 D B L AY1 K DH IH0 S. Note "be" -> B: only the consonant is produced.
@ronggong That's weird, it compiled well in GCC where its equivalent of /FORCE didn't work. I'll take a deeper look. As to the G2P result, yeah the model isn't very good.
@ZDisket I checked the Python processor at https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/processor/ljspeech.py and it does text cleaning and tokenizes each character, so there is no grapheme-to-phoneme conversion. If your FastSpeech model receives the same input as the Python one does, then we can remove the dependency on Phonetisaurus by porting the Python ljspeech.py to C++.
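A character-level tokenizer of the kind described above could be sketched like this. The symbol table kSymbols and the function textToIds are hypothetical: the IDs only match a trained model if the table is copied exactly from the Python processor, and the cleaning here (lowercasing, dropping unknown characters) is a simplification of whatever the Python cleaners do.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical symbol table: index in this string = token ID.
// Replace with the exact symbol list used to train the model.
const std::string kSymbols = "_ abcdefghijklmnopqrstuvwxyz!'(),.:;?";

// Lowercases the text and maps each known character to its index in
// kSymbols; characters not in the table are dropped.
std::vector<int> textToIds(const std::string& text) {
    std::vector<int> ids;
    for (char c : text) {
        char lower = static_cast<char>(
            std::tolower(static_cast<unsigned char>(c)));
        std::size_t pos = kSymbols.find(lower);
        if (pos != std::string::npos) {
            ids.push_back(static_cast<int>(pos));
        }
    }
    return ids;
}
```

Whether this is enough depends on the model having been trained on characters rather than phonemes, which is the open question in the comment above.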
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
C++ inference is now supported (thanks to @ZDisket for his dedicated support). It will be improved and will support more models to keep up with the main repo over time :D. Check it out :D
Code: https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/cppwin