TensorVox is an application designed to enable user-friendly and lightweight neural speech synthesis in the desktop, aimed at increasing accessibility to such technology.
Powered mainly by TensorFlowTTS and also by Coqui-TTS and VITS, it is written in pure C++/Qt, using the Tensorflow C API for interacting with Tensorflow models (first two), and LibTorch for PyTorch ones. This way, we can perform inference without having to install gigabytes worth of Python libraries, just a few DLLs.
Grab a copy from the releases, extract the .zip and check the Google Drive folder for models and installation instructions
If you're interested in using your own model, first you need to train then export it.
TensorVox supports models from three repos:
Those two examples should provide you with enough guidance to understand what is needed. If you're looking to train a model specifically for this purpose then I recommend TensorFlowTTS, as it is the one with the best support, and VITS, as it's the closest thing to perfect As for languages, out-of-the-box support is provided for English (Coqui and TFTTS, VITS), German and Spanish (only TensorFlowTTS); that is, you won't have to do anything. You can add languages without modifying code, as long as the phoneme set are IPA (stressed or nonstressed), ARPA, or GlobalPhone, (open an issue and I'll explain it to you)
Currently, only Windows 10 x64 (although I've heard reports of it running on 8.1) is supported.
Requirements:
Primed build (with all provided libraries):
deps
folder is in the same place as the .pro and main source files.Note that to try your shiny new executable you'll need to download a release of program as described above and replace the executable in that release with your new one, so you have all the DLLs in place.
TODO: Add instructions for compile from scratch.
Tensorflow C API: https://www.tensorflow.org/install/lang_c
CppFlow (TF C API -> C++ wrapper): https://github.com/serizba/cppflow
AudioFile (for WAV export): https://github.com/adamstark/AudioFile
Frameless Dark Style Window: https://github.com/Jorgen-VikingGod/Qt-Frameless-Window-DarkStyle
JSON for modern C++: https://github.com/nlohmann/json
r8brain-free-src (Resampling): https://github.com/avaneev/r8brain-free-src
rnnoise (CMake version, denoising output): https://github.com/almogh52/rnnoise-cmake
Logitech LED Illumination SDK (Mouse RGB integration): https://www.logitechg.com/en-us/innovation/developer-lab.html
QCustomPlot : https://www.qcustomplot.com/index.php/introduction
libnumbertext : https://github.com/Numbertext/libnumbertext
You can open an issue here or join the Discord server and discuss/ask anything there
For media/licensing/any other formal stuff inquiries, send to this email: 9yba9c1y@anonaddy.me
This program itself is MIT licensed, but for the models you use, their license terms apply. For example, if you're in Vietnam and using TensorFlowTTS models, you'll have to check here for some details