PABannier / bark.cpp

Suno AI's Bark model in C/C++ for fast text-to-speech
MIT License

First attempts #67

Closed: Green-Sky closed this issue 3 months ago

Green-Sky commented 11 months ago

So I have been following this project with anticipation, and finally decided to give it a go.

  1. Simple but obvious: the CMake build is missing the main target :) (a sketch of a fix follows the log below)
  2. vocab.bin ships with the repo, so why require it for the conversion? (I commented that out.)
  3. Running main results in an allocation error, trying to allocate 47 GiB :rofl:
$ ./main -m models/bark_v0/
bark_model_load: loading model from 'models/bark_v0/'
bark_model_load: reading bark text model
gpt_model_load: n_in_vocab  = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1701.69 MB
bark_model_load: reading bark vocab

bark_model_load: reading bark coarse model
gpt_model_load: n_in_vocab  = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1250.69 MB

bark_model_load: reading bark fine model
gpt_model_load: n_in_vocab  = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 7
gpt_model_load: n_wtes      = 8
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1218.26 MB

bark_model_load: reading bark codec model
encodec_model_load: model size    =   44.32 MB

bark_model_load: total model size  =  4170.64 MB

bark_generate_audio: prompt: 'this is an audio'
bark_generate_audio: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595
bark_forward_text_encoder: ...........................................................................................................

bark_forward_text_encoder: mem per token =     4.80 MB
bark_forward_text_encoder:   sample time =    17.30 ms
bark_forward_text_encoder:  predict time =  6746.21 ms / 18.48 ms per token
bark_forward_text_encoder:    total time =  6825.61 ms

bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................

bark_forward_coarse_encoder: mem per token =     8.51 MB
bark_forward_coarse_encoder:   sample time =     4.79 ms
bark_forward_coarse_encoder:  predict time = 30730.57 ms / 94.85 ms per token
bark_forward_coarse_encoder:    total time = 30784.73 ms

fine_gpt_eval: failed to allocate 50200313856 bytes
bark_forward_fine_encoder: ggml_aligned_malloc: insufficient memory (attempted to allocate 47874.75 MB)
GGML_ASSERT: ggml.c:4408: ctx->mem_buffer != NULL
Aborted (core dumped)
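
Re item 1 above, a minimal sketch of what the missing executable target could look like. This assumes a main.cpp driver at the repo root and library targets named bark and ggml; those names are assumptions, not the repo's actual CMake:

# Hypothetical CMakeLists.txt addition; target and file names are assumptions.
add_executable(main main.cpp)
target_link_libraries(main PRIVATE bark ggml)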
PABannier commented 11 months ago

Hello @Green-Sky ! Thanks for your interest in bark.cpp and for reporting these bugs. This is very valuable.

  1. Thanks for catching that. Could you submit a PR to fix this?
  2. You're right. vocab.bin is in the repo so that the test-tokenizer script can run.
  3. Ouch... I have never faced such a bug in any of the experiments I ran. In any case, it's interesting that this excessive memory allocation occurs in the fine encoder, as it is the one that does not pass the unit test I wrote to compare the output against the original Bark implementation. Are you able to track down which node in the computational graph is responsible for this excessive allocation?
Green-Sky commented 11 months ago
  1. Sure, will do; I'll model it on ggml/llama.cpp.
  2. Actually, we might need to be careful here in case the vocabs/tokenizers differ. llama.cpp optionally lets you supply a vocab, but uses its default otherwise (see llama.cpp's convert.py).
  3. Ehh, maybe? I haven't worked with ggml-specific code yet, so some pointers would be nice (one way to start is sketched below).
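
One way to get those pointers: once a graph has been built, dump every node's allocation size and walk back from the largest one. A minimal sketch in C, assuming the vendored ggml of that era (struct ggml_cgraph exposing n_nodes and nodes[], plus ggml_nbytes()); dump_graph_nodes is a hypothetical helper, not an existing function:

#include <stdio.h>
#include "ggml.h"

// Hypothetical debugging helper: print the size of every node in a ggml
// graph so the tensor behind an oversized allocation can be identified.
static void dump_graph_nodes(const struct ggml_cgraph * gf) {
    for (int i = 0; i < gf->n_nodes; i++) {
        const struct ggml_tensor * t = gf->nodes[i];
        fprintf(stderr, "node %4d: op=%d ne=[%lld, %lld, %lld, %lld] bytes=%zu\n",
                i, (int) t->op,
                (long long) t->ne[0], (long long) t->ne[1],
                (long long) t->ne[2], (long long) t->ne[3],
                ggml_nbytes(t));
    }
}

Note that in this particular crash the abort fires inside ggml_init itself, before any node exists, so the context buffer sizing is the first thing to check.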
Green-Sky commented 11 months ago

I ran it in the debugger:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7a52859 in __GI_abort () at abort.c:79
#2  0x00005555555a3f4c in ggml_init (params=...) at /home/green/workspace/bark.cpp/ggml.c:4410
#3  0x0000555555565422 in fine_gpt_eval (model=..., n_threads=4, codebook_ix=3,
    embd_inp=std::vector of length 8, capacity 8 = {...},
    logits=std::vector of length 2, capacity 2 = {...}, mem_per_token=@0x7fffffffba28: 5106640)
    at /home/green/workspace/bark.cpp/bark.cpp:555
#4  0x000055555556fbb9 in bark_forward_fine_encoder (
    tokens=std::vector of length 162, capacity 256 = {...}, model=..., rng=..., n_threads=4, temp=0.5)
    at /home/green/workspace/bark.cpp/bark.cpp:1494
#5  0x0000555555571f8e in bark_generate_audio (model=..., vocab=..., text=<optimized out>, n_threads=4)
    at /usr/include/c++/9/bits/stl_tree.h:129
#6  0x000055555555c6e3 in main (argc=<optimized out>, argv=<optimized out>)
    at /usr/include/c++/9/bits/basic_string.h:2316
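
The backtrace localizes the abort to ggml_init inside fine_gpt_eval, i.e. the context buffer allocation, with mem_per_token already around 5 MB. In the usual ggml eval pattern of that era, the buffer is sized from a per-token estimate taken on a first dummy run, so a bad estimate or an inflated element count blows up the ggml_init allocation directly. A sketch of that pattern, for orientation only; bark.cpp's actual fine_gpt_eval sizing may differ:

#include "ggml.h"

static size_t mem_per_token = 0; // estimated on the first eval pass

// Illustrative sketch of the common ggml eval pattern, not bark.cpp's code.
void eval_sketch(int N /* input length for this pass */) {
    size_t buf_size = 256u*1024*1024; // fallback before an estimate exists
    if (mem_per_token > 0) {
        // A ~5 MB/token estimate times a large enough N lands near the
        // 47 GiB seen in the log, so either factor being wrong explains it.
        buf_size = mem_per_token * (size_t) N;
    }
    struct ggml_init_params params = {
        /*.mem_size   =*/ buf_size,
        /*.mem_buffer =*/ NULL,   // ggml_init mallocs the whole buffer here
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx0 = ggml_init(params); // the abort site in the trace
    // ... build the graph, compute it, read the logits ...
    if (mem_per_token == 0) {
        mem_per_token = ggml_used_mem(ctx0) / (size_t) N;
    }
    ggml_free(ctx0);
}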
PABannier commented 3 months ago

Closing, as this should be fixed by #139. Feel free to share your feedback in another issue, @Green-Sky :)