PABannier / bark.cpp

Suno AI's Bark model in C/C++ for fast text-to-speech
MIT License

Unable to build. #131

Closed arthurwolf closed 2 months ago

arthurwolf commented 4 months ago

Cloning the submodules recursively fails, but I was able to work around that manually.

$ git submodule update --init --recursive
fatal: remote error: upload-pack: not our ref e50cd96d28c89f6c1343c291042b14bab6f3b83b
fatal: Fetched in submodule path 'encodec.cpp', but it did not contain e50cd96d28c89f6c1343c291042b14bab6f3b83b. Direct fetching of that commit failed.

But then, when I run cmake --build . --config Release, I get:

$ cmake --build . --config Release
[  4%] Building C object encodec.cpp/ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[  8%] Building C object encodec.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 12%] Building C object encodec.cpp/ggml/src/CMakeFiles/ggml.dir/ggml-backend.c.o
[ 16%] Linking C shared library libggml.so
[ 16%] Built target ggml
[ 20%] Building CXX object encodec.cpp/CMakeFiles/encodec.dir/encodec.cpp.o
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:319:22: warning: multi-character character constant [-Wmultichar]
  319 |         if (magic != ENCODEC_FILE_MAGIC) {
      |                      ^~~~~~~~~~~~~~~~~~
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp: In function ‘void print_tensor(ggml_tensor*)’:
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:79:27: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   79 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                        ~~~^                        ~~~~~~~~
      |                           |                               |
      |                           long long int                   int64_t {aka long int}
      |                        %ld
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:79:33: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   79 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                              ~~~^                            ~~~~~~~~
      |                                 |                                   |
      |                                 long long int                       int64_t {aka long int}
      |                              %ld
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:79:39: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   79 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                                    ~~~^                                ~~~~~~~~
      |                                       |                                       |
      |                                       long long int                           int64_t {aka long int}
      |                                    %ld
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:79:45: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   79 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                                          ~~~^                                    ~~~~~~~~
      |                                             |                                           |
      |                                             long long int                               int64_t {aka long int}
      |                                          %ld
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp: In function ‘bool encodec_load_model_weights(const std::string&, encodec_model&, int)’:
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:714:89: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
  714 |                 fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld, %lld], expected [%d, %d, %d]\n",
      |                                                                                      ~~~^
      |                                                                                         |
      |                                                                                         long long int
      |                                                                                      %ld
  715 |                         __func__, name.data(), tensor->ne[0], tensor->ne[1], tensor->ne[2], ne[0], ne[1], ne[2]);
      |                                                ~~~~~~~~~~~~~                             
      |                                                            |
      |                                                            int64_t {aka long int}
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:714:95: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 6 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
  714 |                 fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld, %lld], expected [%d, %d, %d]\n",
      |                                                                                            ~~~^
      |                                                                                               |
      |                                                                                               long long int
      |                                                                                            %ld
  715 |                         __func__, name.data(), tensor->ne[0], tensor->ne[1], tensor->ne[2], ne[0], ne[1], ne[2]);
      |                                                               ~~~~~~~~~~~~~                    
      |                                                                           |
      |                                                                           int64_t {aka long int}
/home/arthur/dev/ai/bark.cpp/encodec.cpp/encodec.cpp:714:101: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 7 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
  714 |                 fprintf(stderr, "%s: tensor '%s' has wrong shape in model file: got [%lld, %lld, %lld], expected [%d, %d, %d]\n",
      |                                                                                                  ~~~^
      |                                                                                                     |
      |                                                                                                     long long int
      |                                                                                                  %ld
  715 |                         __func__, name.data(), tensor->ne[0], tensor->ne[1], tensor->ne[2], ne[0], ne[1], ne[2]);
      |                                                                              ~~~~~~~~~~~~~           
      |                                                                                          |
      |                                                                                          int64_t {aka long int}
[ 25%] Linking CXX static library libencodec.a
[ 25%] Built target encodec
[ 29%] Building CXX object CMakeFiles/bark.dir/bark.cpp.o
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp: In function ‘void print_tensor(ggml_tensor*)’:
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:74:27: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   74 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                        ~~~^                        ~~~~~~~~
      |                           |                               |
      |                           long long int                   int64_t {aka long int}
      |                        %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:74:33: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   74 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                              ~~~^                            ~~~~~~~~
      |                                 |                                   |
      |                                 long long int                       int64_t {aka long int}
      |                              %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:74:39: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   74 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                                    ~~~^                                ~~~~~~~~
      |                                       |                                       |
      |                                       long long int                           int64_t {aka long int}
      |                                    %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:74:45: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 5 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
   74 |         printf("shape=[%lld, %lld, %lld, %lld]\n", a->ne[0], a->ne[1], a->ne[2], a->ne[3]);
      |                                          ~~~^                                    ~~~~~~~~
      |                                             |                                           |
      |                                             long long int                               int64_t {aka long int}
      |                                          %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp: In function ‘void bark_print_statistics(gpt_model*)’:
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:123:47: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘int64_t’ {aka ‘long int’} [-Wformat=]
  123 |     printf("%s:   sample time = %8.2f ms / %lld tokens\n", __func__, model->t_sample_us/1000.0f, model->n_sample);
      |                                            ~~~^                                                  ~~~~~~~~~~~~~~~
      |                                               |                                                         |
      |                                               long long int                                             int64_t {aka long int}
      |                                            %ld
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp: In function ‘bool bark_generate_audio(bark_context*, std::string&, std::string&, int, bark_verbosity_level)’:
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:2048:43: error: ‘encodec_verbosity_level’ has not been declared
 2048 |         encodec_model_path, n_gpu_layers, encodec_verbosity_level::LOW);
      |                                           ^~~~~~~~~~~~~~~~~~~~~~~
/home/arthur/dev/ai/bark.cpp/bark/bark.cpp:2060:5: error: ‘encodec_set_sample_rate’ was not declared in this scope
 2060 |     encodec_set_sample_rate(ectx, sample_rate);
      |     ^~~~~~~~~~~~~~~~~~~~~~~
gmake[2]: *** [CMakeFiles/bark.dir/build.make:76: CMakeFiles/bark.dir/bark.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:256: CMakeFiles/bark.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2

Any ideas? Thanks!
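A side note on the `-Wformat` warnings in the log: they do not stop the build; only the two `error:` lines for `encodec_verbosity_level` and `encodec_set_sample_rate` do. The warnings appear because `%lld` expects `long long int`, while `int64_t` is `long int` on this platform, as the compiler output states. One portable way to print an `int64_t` is the `PRId64` macro from `<cinttypes>`. The following is a minimal sketch, assuming a hypothetical helper named `format_shape` that is not part of bark.cpp or encodec.cpp:

```cpp
#include <cassert>
#include <cinttypes>  // PRId64
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of bark.cpp or encodec.cpp): formats four
// int64_t dimensions in the same layout as the printf call quoted in the log.
std::string format_shape(const int64_t ne[4]) {
    char buf[128];
    // PRId64 expands to the correct length modifier for int64_t on the
    // current platform (for example "ld" where int64_t is long int).
    const int n = std::snprintf(
        buf, sizeof(buf),
        "shape=[%" PRId64 ", %" PRId64 ", %" PRId64 ", %" PRId64 "]",
        ne[0], ne[1], ne[2], ne[3]);
    // Four 64-bit values plus punctuation always fit in 128 bytes.
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
    return std::string(buf);
}
```

Calling `format_shape` on an array holding 1, 2, 3, 4 yields `shape=[1, 2, 3, 4]` without a format warning, whether `int64_t` is `long int` or `long long int`. Casting each argument to `long long` and keeping `%lld` would be an alternative with the same effect.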

sefaalper commented 3 months ago

Running into the same error. @PABannier any thoughts?

CoruNethron commented 3 months ago

Indeed, this commit doesn't exist; perhaps it just wasn't pushed upstream? https://github.com/PABannier/encodec.cpp/tree/e50cd96d28c89f6c1343c291042b14bab6f3b83b

There is a branch named debug_bark_and_encodec. I tried using its latest commit instead and got a bark.cpp build that runs, but it generates barely recognisable speech.

@PABannier, once you have some time, please check whether a commit still needs to be pushed to encodec.cpp or, more probably, whether the reference within bark.cpp should be updated to point to an existing commit.

Thank you for both projects, by the way.

engineer1109 commented 2 months ago

The commit that adds encodec_set_sample_rate is missing.

PABannier commented 2 months ago

@engineer1109 @CoruNethron @sefaalper @arthurwolf This should be fixed by #139. Could you give it another try?

arthurwolf commented 2 months ago

@PABannier thanks!

Going through the process now. First (minor) issue:

At the step:

python3 -m pip install -r bark/requirements.txt

At that point I'm inside build/, and there is no bark/ directory there.

I'm guessing you meant the requirements.txt in the project root, so I did this:

cd ..
python3 -m pip install -r requirements.txt

Next issue, at:

python3 convert.py --dir-model ./models --out-dir ./ggml_weights/

I get:

$ python3 convert.py --dir-model ./models --out-dir ./ggml_weights/
usage: convert.py [-h] --dir-model DIR_MODEL --vocab-path VOCAB_PATH --out-dir
                  OUT_DIR [--use-f16]
convert.py: error: the following arguments are required: --vocab-path

I tried:

find . -name '*vocab*'

and it found nothing.

CoruNethron commented 2 months ago

@PABannier

Thank you, the missing encodec.cpp commit issue is indeed fixed. Very impressive!

@arthurwolf

Regarding the steps to make it work:

git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
python3 -m pip install -r requirements.txt
python3 download_weights.py --download-dir ./models
wget https://huggingface.co/suno/bark/raw/main/vocab.txt
mv ./vocab.txt ./models/
python3 convert.py --dir-model ./models --vocab-path ./models --out-dir ./ggml_weights/
cd ./encodec.cpp/
python3 convert.py --dir-model ./../models --out-dir ./../ggml_weights/ --use-f16
cd ../
mv ./ggml_weights/ggml-model.bin ./encodec_weights.bin
./build/examples/main/main -m ./ggml_weights/ -p "this is an audio"
play ./output.wav

P.S. Not sure if this is appropriate here, but my son was born while I was writing this post. I'm happy. )

PABannier commented 2 months ago

@CoruNethron I'll add these to the instructions! Congratulations on your newborn!

lin72h commented 2 months ago

@CoruNethron Congratulations!

arthurwolf commented 2 months ago

Regarding steps to make it work:

Thank you so much. I really wish more people would post their working steps like this in these situations; that's very nice of you. I'll try them soon.

(Also, congratulations! Good luck not sleeping for a while.)

PABannier commented 2 months ago

Closing this as the instructions have been updated.