This is an ncnn implementation of the VITS library that supports cross-platform GPU-accelerated speech synthesis.
The project is forked from weirdseed/Vits-Android-ncnn. Thanks to the original author for their contribution.
🔍 Prepare dependencies
Download the ncnn static library that matches your runtime environment from the ncnn wiki or from ncnn/releases.
Place it in an ncnn folder in the repository root, laid out like this:
ncnn
├─bin
├─include
└─lib
🛠️ Compile the project
a. libvits-ncnn
Execute in the repo root directory:
mkdir build && cd build
cmake .. -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang
make
After compilation, you can find libvits-ncnn.so in the build directory.
b. vits-cli
Enter the demo directory and execute:
mkdir build && cd build
cmake .. -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang
make
After compilation, you can find vits-cli in the build directory.
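The two builds above use the same configure and build steps, so they can be expressed once. The following is a minimal sketch, not part of the project: `build_commands`, `CXX_BIN`, and `CC_BIN` are hypothetical names, and it assumes CMake 3.13 or newer (for `-S`/`-B`), the default Makefile generator, and paths without spaces. It prints the commands rather than running them so you can inspect them first.

```shell
# Compiler paths taken from the commands above; override if clang lives elsewhere.
CXX_BIN="${CXX_BIN:-/usr/bin/clang++}"
CC_BIN="${CC_BIN:-/usr/bin/clang}"

# Hypothetical helper: print the configure and build commands for one source directory.
# "cmake -S <src> -B <src>/build" replaces "mkdir build && cd build && cmake ..",
# and "cmake --build" replaces "make".
build_commands() {
    src="$1"
    echo "cmake -S $src -B $src/build -DCMAKE_CXX_COMPILER=$CXX_BIN -DCMAKE_C_COMPILER=$CC_BIN"
    echo "cmake --build $src/build"
}
```

From the repo root, `build_commands . | sh` would build libvits-ncnn and `build_commands demo | sh` would then build vits-cli, in the same order as steps a and b.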
🚀 Run the demo
In the directory that contains vits-cli, prepare the files needed at run time:
a. Download the openjtalk dictionary file and unzip it.
b. Download the VITS ncnn model. Unzip the Atri part into the atri/ directory (for testing the single-speaker model) and extract 365_epochs into the 365_epochs/ directory (for testing the multi-speaker model).
c. Download the VITS ncnn params (the single/ and multi/ directories).
The directory should now contain:
build
├─vits-cli
├─365_epochs
├─atri
├─multi
├─open_jtalk_dic_utf_8-1.11
└─single
Now execute ./vits-cli.
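If the demo fails to start, a missing model or dictionary directory is a likely cause. The sketch below checks the layout listed above before launching. `check_demo_assets` is a hypothetical helper, not something the project ships, and it only verifies that the entries exist, not that their contents are valid.

```shell
# Hypothetical helper: confirm the files and directories listed above are present.
# Takes the directory to check as its first argument (defaults to the current one).
check_demo_assets() {
    dir="${1:-.}"
    missing=0
    # The demo binary itself
    [ -f "$dir/vits-cli" ] || { echo "missing: vits-cli"; missing=1; }
    # Model, params, and dictionary directories
    for d in 365_epochs atri multi open_jtalk_dic_utf_8-1.11 single; do
        [ -d "$dir/$d" ] || { echo "missing: $d"; missing=1; }
    done
    # Returns 0 only when nothing is missing
    return "$missing"
}
```

Run from the build directory, `check_demo_assets . && ./vits-cli` starts the demo only when every entry is found.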