bark.cpp

Roadmap / encodec.cpp / ggml

Inference of Suno AI's Bark model in pure C/C++.

Description

With bark.cpp, our goal is to bring real-time realistic multilingual text-to-speech generation to the community.

Models supported

Models we want to implement! Please open a PR :)

Demo on Google Colab (#95)


Here is a typical run using bark.cpp:

./main -p "This is an audio generated by bark.cpp"

   __               __
   / /_  ____ ______/ /__        _________  ____
  / __ \/ __ `/ ___/ //_/       / ___/ __ \/ __ \
 / /_/ / /_/ / /  / ,<    _    / /__/ /_/ / /_/ /
/_.___/\__,_/_/  /_/|_|  (_)   \___/ .___/ .___/
                                  /_/   /_/

bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222

Generating semantic tokens: 17%

bark_print_statistics:   sample time =    10.98 ms / 138 tokens
bark_print_statistics:  predict time =   614.96 ms / 4.46 ms per token
bark_print_statistics:    total time =   633.54 ms

Generating coarse tokens: 100%

bark_print_statistics:   sample time =     3.75 ms / 410 tokens
bark_print_statistics:  predict time =  3263.17 ms / 7.96 ms per token
bark_print_statistics:    total time =  3274.00 ms

Generating fine tokens: 100%

bark_print_statistics:   sample time =    38.82 ms / 6144 tokens
bark_print_statistics:  predict time =  4729.86 ms / 0.77 ms per token
bark_print_statistics:    total time =  4772.92 ms

write_wav_on_disk: Number of frames written = 65600.

main:     load time =   324.14 ms
main:     eval time =  8806.57 ms
main:    total time =  9131.68 ms

Here is a video of Bark running on the iPhone:

https://github.com/PABannier/bark.cpp/assets/12958149/bc807c0b-adfa-4c47-a05b-a2d8ba157dd8

Usage

Here are the steps to use bark.cpp:

Get the code

git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive

Build

To build bark.cpp, use CMake:

mkdir build
cd build
cmake ..
cmake --build . --config Release

Prepare data & Run

# Install Python dependencies
python3 -m pip install -r requirements.txt

# Download the Bark checkpoints and vocabulary
python3 download_weights.py --out-dir ./models --models bark-small bark

# Convert the model to ggml format
python3 convert.py --dir-model ./models/bark-small --use-f16

# Run the inference
./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4
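
Beyond the main example, the library can also be driven directly from C/C++ through its bark.h header. The snippet below is a minimal sketch only: the function names and signatures shown (bark_context_default_params, bark_load_model, bark_generate_audio, bark_get_audio_data, bark_free) are assumptions based on recent versions of the API and may differ in your checkout, so verify them against bark.h before relying on them.

// Minimal programmatic sketch -- names/signatures are illustrative; check bark.h
// in your checkout for the exact API, which has changed across versions.
#include "bark.h"
#include <cstdio>

int main() {
    // Load the converted ggml weights (same file passed to ./main with -m).
    bark_context_params params = bark_context_default_params();
    bark_context * bctx = bark_load_model("./models/bark-small/ggml_weights.bin", params, /*seed*/ 0);
    if (!bctx) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // Run the full pipeline (semantic -> coarse -> fine tokens, then codec decode) on 4 threads.
    if (!bark_generate_audio(bctx, "this is an audio generated by bark.cpp", /*n_threads*/ 4)) {
        fprintf(stderr, "generation failed\n");
        bark_free(bctx);
        return 1;
    }

    // Retrieve the generated PCM samples; write_wav_on_disk in the examples shows
    // how to save them as a WAV file.
    int n_samples = 0;
    const float * audio = bark_get_audio_data(bctx, &n_samples);
    fprintf(stdout, "generated %d samples\n", n_samples);

    bark_free(bctx);
    return 0;
}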

(Optional) Quantize weights

Weights can be quantized using any of the following strategies: q4_0, q4_1, q5_0, q5_1, q8_0.

Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.

./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
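
The quantized weights (here ggml_weights_q4.bin) can then be passed to main through the -m flag in place of the full-precision ggml_weights.bin.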

Seminal papers

Contributing

bark.cpp is an ongoing endeavour that relies on community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be:

Coding guidelines