apinge / MeloTTS.cpp

A lightweight pure C++ Text-to-Speech (TTS) pipeline with OpenVINO, supporting mixed English and Chinese languages.
Apache License 2.0

MeloTTS.cpp

This repository offers a C++ implementation of MeloTTS, a high-quality, multilingual Text-to-Speech (TTS) library released by MyShell.ai that supports English, Chinese (mixed with English), and various other languages. The implementation is fully integrated with OpenVINO. Currently, it supports only Chinese mixed with English; support for the English model is planned next.

Setup and Execution Guide

1. Download OpenVINO C++ Package

To download the OpenVINO C++ package for Windows, please refer to the following link: Install OpenVINO for Windows. For OpenVINO 2024.4 on Windows, run the following commands in Command Prompt (cmd):

curl -O https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.4/windows/w_openvino_toolkit_windows_2024.4.0.16579.c3152d32c9c_x86_64.zip --ssl-no-revoke
tar -xvf w_openvino_toolkit_windows_2024.4.0.16579.c3152d32c9c_x86_64.zip

For Linux, you can download the C++ package from this link: Install OpenVINO for Linux. For OpenVINO 2024.4 on Linux, download the archive from https://storage.openvinotoolkit.org/repositories/openvino/packages/2024.4/linux and extract it.

For additional versions and more information about OpenVINO, visit the official OpenVINO Toolkit page: OpenVINO Toolkit Overview.

2. Clone the Repository

git lfs install
git clone https://github.com/apinge/MeloTTS.cpp.git

3. Build and Run

3.1 Windows Build and Run

<OpenVINO_DIR>\setupvars.bat
cd MeloTTS.cpp 
cmake -S . -B build && cmake --build build --config Release
.\build\Release\meloTTS_ov.exe --model_dir ov_models --input_file inputs.txt --output_file audio.wav

3.2 Linux Build and Run

source <OpenVINO_DIR>/setupvars.sh
cd MeloTTS.cpp
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release
./build/meloTTS_ov --model_dir ov_models --input_file inputs.txt --output_file audio.wav
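
After a run, it can be useful to confirm that the output file really is a WAV container before trying to play it. The sketch below is not part of this repository; `looks_like_wav` is a hypothetical helper that only checks the 12-byte RIFF/WAVE header and does not validate the audio data itself.

```cpp
#include <cassert>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file starts with a RIFF/WAVE header.
// Only the first 12 bytes are inspected; the audio payload is not validated.
bool looks_like_wav(const std::string& path) {
    assert(!path.empty() && "an empty path is treated as a caller error");

    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // file is missing or unreadable

    char header[12] = {};
    in.read(header, sizeof(header));
    if (in.gcount() != static_cast<std::streamsize>(sizeof(header))) {
        return false;  // too short to hold a WAV header
    }
    // Bytes 0-3 hold "RIFF", bytes 4-7 the chunk size, bytes 8-11 hold "WAVE".
    return std::memcmp(header, "RIFF", 4) == 0 &&
           std::memcmp(header + 8, "WAVE", 4) == 0;
}
```

Calling `looks_like_wav("audio.wav")` after the run command above should return true if the pipeline wrote a standard WAV file.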

4. Arguments Description

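You can use run_tts.bat or run_tts.sh as sample scripts to run the models. The run commands above use the following arguments: --model_dir is the directory containing the OpenVINO models, --input_file is the text file to synthesize, and --output_file is the WAV file to write.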
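
As a rough illustration of how the three flags used in the run commands in section 3 could be mapped to values, the sketch below parses `--model_dir`, `--input_file`, and `--output_file` from `argc`/`argv`. The `TtsArgs` struct and `parse_tts_args` function are hypothetical names, not this repository's actual parser, and the defaults are an assumption that simply mirrors the example commands.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical container for the three flags shown in the run commands.
// The default values are assumptions copied from the example command lines.
struct TtsArgs {
    std::string model_dir = "ov_models";
    std::string input_file = "inputs.txt";
    std::string output_file = "audio.wav";
};

// Parses "--flag value" pairs; throws std::invalid_argument on an unknown
// flag or on a flag that has no value after it.
TtsArgs parse_tts_args(int argc, const char* const argv[]) {
    assert(argc >= 1 && argv != nullptr);  // argv[0] is the program name
    TtsArgs args;
    for (int i = 1; i < argc; i += 2) {
        const std::string flag = argv[i];
        if (i + 1 >= argc) {
            throw std::invalid_argument("missing value for " + flag);
        }
        const std::string value = argv[i + 1];
        if (flag == "--model_dir") args.model_dir = value;
        else if (flag == "--input_file") args.input_file = value;
        else if (flag == "--output_file") args.output_file = value;
        else throw std::invalid_argument("unknown flag " + flag);
    }
    return args;
}
```

In a real program, `main(int argc, char* argv[])` could pass its arguments straight to `parse_tts_args(argc, argv)` and report the exception message to the user.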

Supported Versions

If you specify GPU as the device, please refer to Configurations for Intel® Processor Graphics (GPU) with OpenVINO™ to install the GPU driver.
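
Before requesting GPU, an application may want to check that the device is actually visible and fall back to CPU otherwise. The sketch below shows that decision as a plain function. `pick_device` is a hypothetical helper, and the CPU fallback is an assumption rather than this repository's documented behavior. In a real program the `available` list would typically come from OpenVINO's `ov::Core::get_available_devices()`, which can report indexed names such as `GPU.0` when more than one GPU is present.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns `requested` if the list contains it exactly
// (for example "GPU") or as an indexed variant (for example "GPU.0").
// Otherwise returns "CPU" as an assumed fallback.
std::string pick_device(const std::string& requested,
                        const std::vector<std::string>& available) {
    assert(!requested.empty() && "device name must not be empty");
    const std::string indexed_prefix = requested + ".";
    for (const std::string& name : available) {
        const bool exact = (name == requested);
        const bool indexed =
            name.compare(0, indexed_prefix.size(), indexed_prefix) == 0;
        if (exact || indexed) {
            return requested;
        }
    }
    return "CPU";  // assumption: a real application might report an error instead
}
```

With this sketch, asking for `GPU` on a machine that only lists `CPU` yields `CPU`, which is usually a sign that the GPU driver still needs to be installed.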

Future Development Plan

Here are some features and improvements planned for future releases:

  1. Add English language TTS support:

    • Enable English text-to-speech (TTS) functionality. Tokenization for English input is not yet implemented.
  2. Support for NPU device:

    • Implement NPU device compatibility for the BERT model within the pipeline.

Python Version

The Python version of this repository (MeloTTS integrated with OpenVINO) is provided in MeloTTS-OV. The Python version includes methods to convert the model into OpenVINO IR.

Third-Party Code

This repository includes third-party code and libraries for Chinese word segmentation and pinyin processing.