PABannier / bark.cpp

Suno AI's Bark model in C/C++ for fast text-to-speech
MIT License

First attempts #67

Closed: Green-Sky closed this issue 3 months ago

Green-Sky commented 11 months ago

So I have been following this project with anticipation, and finally decided to give it a go.

  1. Simple but obvious: the CMake build is missing the main target :) (a sketch of a fix follows the log below)
  2. vocab.bin ships with the repo, so why require it for the conversion? (I commented that out.)
  3. Running main results in an allocation error, trying to allocate 47 GiB :rofl:
$ ./main -m models/bark_v0/
bark_model_load: loading model from 'models/bark_v0/'
bark_model_load: reading bark text model
gpt_model_load: n_in_vocab  = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1701.69 MB
bark_model_load: reading bark vocab

bark_model_load: reading bark coarse model
gpt_model_load: n_in_vocab  = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1250.69 MB

bark_model_load: reading bark fine model
gpt_model_load: n_in_vocab  = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 7
gpt_model_load: n_wtes      = 8
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1218.26 MB

bark_model_load: reading bark codec model
encodec_model_load: model size    =   44.32 MB

bark_model_load: total model size  =  4170.64 MB

bark_generate_audio: prompt: 'this is an audio'
bark_generate_audio: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595
bark_forward_text_encoder: ...........................................................................................................

bark_forward_text_encoder: mem per token =     4.80 MB
bark_forward_text_encoder:   sample time =    17.30 ms
bark_forward_text_encoder:  predict time =  6746.21 ms / 18.48 ms per token
bark_forward_text_encoder:    total time =  6825.61 ms

bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................

bark_forward_coarse_encoder: mem per token =     8.51 MB
bark_forward_coarse_encoder:   sample time =     4.79 ms
bark_forward_coarse_encoder:  predict time = 30730.57 ms / 94.85 ms per token
bark_forward_coarse_encoder:    total time = 30784.73 ms

fine_gpt_eval: failed to allocate 50200313856 bytes
bark_forward_fine_encoder: ggml_aligned_malloc: insufficient memory (attempted to allocate 47874.75 MB)
GGML_ASSERT: ggml.c:4408: ctx->mem_buffer != NULL
Aborted (core dumped)
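
Re item 1 above, a minimal sketch of what the missing executable target could look like. This assumes a main.cpp driver at the repo root and library targets named bark and ggml; those names are assumptions, not the repo's actual CMake:

# Hypothetical CMakeLists.txt addition; target and file names are assumptions.
add_executable(main main.cpp)
target_link_libraries(main PRIVATE bark ggml)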
PABannier commented 11 months ago

Hello @Green-Sky ! Thanks for your interest in bark.cpp and for reporting these bugs. This is very valuable.

  1. Thanks for catching that. Could you submit a PR to fix this?
  2. You're right. vocab.bin is in the repo so that the test-tokenizer script can run.
  3. Ouch... I have never faced such a bug in any of the experiments I ran. In any case, it's interesting that this excessive memory allocation occurs in the fine encoder, as it is the one that does not pass the unit test I wrote to compare the output against the original Bark implementation. Are you able to track down which node in the computational graph is responsible for this excessive allocation?
Green-Sky commented 11 months ago
  1. Sure, will do; I'll model it on ggml/llama.cpp.
  2. Actually, we might need to be careful here in case the vocabs/tokenizers differ. llama.cpp optionally lets you supply a vocab, but uses its default otherwise (see llama.cpp's convert.py).
  3. Ehh, maybe? I haven't worked with ggml-specific code yet, so some pointers would be nice (one way to start is sketched below).
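
One way to get those pointers: once a graph has been built, dump every node's allocation size and walk back from the largest one. A minimal sketch in C, assuming the vendored ggml of that era (struct ggml_cgraph exposing n_nodes and nodes[], plus ggml_nbytes()); dump_graph_nodes is a hypothetical helper, not an existing function:

#include <stdio.h>
#include "ggml.h"

// Hypothetical debugging helper: print the size of every node in a ggml
// graph so the tensor behind an oversized allocation can be identified.
static void dump_graph_nodes(const struct ggml_cgraph * gf) {
    for (int i = 0; i < gf->n_nodes; i++) {
        const struct ggml_tensor * t = gf->nodes[i];
        fprintf(stderr, "node %4d: op=%d ne=[%lld, %lld, %lld, %lld] bytes=%zu\n",
                i, (int) t->op,
                (long long) t->ne[0], (long long) t->ne[1],
                (long long) t->ne[2], (long long) t->ne[3],
                ggml_nbytes(t));
    }
}

Note that in this particular crash the abort fires inside ggml_init itself, before any node exists, so the context buffer sizing is the first thing to check.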
Green-Sky commented 11 months ago

I ran it in the debugger:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7a52859 in __GI_abort () at abort.c:79
#2  0x00005555555a3f4c in ggml_init (params=...) at /home/green/workspace/bark.cpp/ggml.c:4410
#3  0x0000555555565422 in fine_gpt_eval (model=..., n_threads=4, codebook_ix=3,
    embd_inp=std::vector of length 8, capacity 8 = {...},
    logits=std::vector of length 2, capacity 2 = {...}, mem_per_token=@0x7fffffffba28: 5106640)
    at /home/green/workspace/bark.cpp/bark.cpp:555
#4  0x000055555556fbb9 in bark_forward_fine_encoder (
    tokens=std::vector of length 162, capacity 256 = {...}, model=..., rng=..., n_threads=4, temp=0.5)
    at /home/green/workspace/bark.cpp/bark.cpp:1494
#5  0x0000555555571f8e in bark_generate_audio (model=..., vocab=..., text=<optimized out>, n_threads=4)
    at /usr/include/c++/9/bits/stl_tree.h:129
#6  0x000055555555c6e3 in main (argc=<optimized out>, argv=<optimized out>)
    at /usr/include/c++/9/bits/basic_string.h:2316
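
The backtrace localizes the abort to ggml_init inside fine_gpt_eval, i.e. the context buffer allocation, with mem_per_token already around 5 MB. In the usual ggml eval pattern of that era, the buffer is sized from a per-token estimate taken on a first dummy run, so a bad estimate or an inflated element count blows up the ggml_init allocation directly. A sketch of that pattern, for orientation only; bark.cpp's actual fine_gpt_eval sizing may differ:

#include "ggml.h"

static size_t mem_per_token = 0; // estimated on the first eval pass

// Illustrative sketch of the common ggml eval pattern, not bark.cpp's code.
void eval_sketch(int N /* input length for this pass */) {
    size_t buf_size = 256u*1024*1024; // fallback before an estimate exists
    if (mem_per_token > 0) {
        // A ~5 MB/token estimate times a large enough N lands near the
        // 47 GiB seen in the log, so either factor being wrong explains it.
        buf_size = mem_per_token * (size_t) N;
    }
    struct ggml_init_params params = {
        /*.mem_size   =*/ buf_size,
        /*.mem_buffer =*/ NULL,   // ggml_init mallocs the whole buffer here
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx0 = ggml_init(params); // the abort site in the trace
    // ... build the graph, compute it, read the logits ...
    if (mem_per_token == 0) {
        mem_per_token = ggml_used_mem(ctx0) / (size_t) N;
    }
    ggml_free(ctx0);
}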
PABannier commented 3 months ago

Closing, as this should be fixed by #139. Feel free to share your feedback in another issue, @Green-Sky :)