PABannier / bark.cpp

Suno AI's Bark model in C/C++ for fast text-to-speech
MIT License

Not enough space in the context's memory pool #122

Closed infojunkie closed 2 months ago

infojunkie commented 8 months ago

Following your instructions, I get the following:

$ ./build/bin/main -m ./ggml_weights/ -p "this is an audio"
bark_load_model_from_file: loading model from './ggml_weights/'
bark_load_model_from_file: reading bark text model
gpt_model_load: n_in_vocab  = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 304 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1701.69 MB
bark_load_model_from_file: reading bark vocab

bark_load_model_from_file: reading bark coarse model
gpt_model_load: n_in_vocab  = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 304 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1250.69 MB

bark_load_model_from_file: reading bark fine model
gpt_model_load: n_in_vocab  = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 7
gpt_model_load: n_wtes      = 8
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 304 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1218.26 MB

bark_load_model_from_file: reading bark codec model
encodec_model_load: model size    =   44.32 MB

bark_load_model_from_file: total model size  =  4170.64 MB

bark_tokenize_input: prompt: 'this is an audio'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595 
bark_forward_text_encoder: ...........................................................................................................

bark_print_statistics: mem per token =     4.81 MB
bark_print_statistics:   sample time =    23.58 ms / 109 tokens
bark_print_statistics:  predict time =  9675.77 ms / 87.96 ms per token
bark_print_statistics:    total time =  9702.40 ms

bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................

bark_print_statistics: mem per token =     8.53 MB
bark_print_statistics:   sample time =     6.76 ms / 324 tokens
bark_print_statistics:  predict time = 50832.34 ms / 156.41 ms per token
bark_print_statistics:    total time = 50843.50 ms

ggml_new_object: not enough space in the context's memory pool (needed 4115076720, available 4112941056)
Segmentation fault (core dumped)

Is this related to my machine memory?

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            39Gi       6.3Gi       8.2Gi       1.1Gi        24Gi        28Gi
Swap:           19Gi       0.0Ki        19Gi
PABannier commented 8 months ago

Hi @infojunkie ! Thanks for raising this issue. I don't think it has anything to do with your memory; you should normally have more than enough.

I have yet to migrate to ggml's new memory allocation API. I will do it in the next few days, which should fix your issue. I'll be sure to ping you on this issue once it's done.

infojunkie commented 8 months ago

I edited the line https://github.com/PABannier/bark.cpp/blob/main/bark.cpp#L1269 to change the scaling factor from 1.2 to 1.5, and this arbitrary number "fixed" the fine encoder:

[...]
bark_forward_fine_encoder: .....

bark_print_statistics: mem per token =     0.42 MB
bark_print_statistics:   sample time =   108.56 ms / 6144 tokens
bark_print_statistics:  predict time = 61210.73 ms / 8744.39 ms per token
bark_print_statistics:    total time = 61324.55 ms

GGML_ASSERT: /home/kratib/src/misc/bark.cpp/ggml/src/ggml.c:13701: false
Aborted (core dumped)

Now the app crashes at:

static void ggml_compute_forward_conv_1d_stage_0(
        const struct ggml_compute_params * params,
        const struct ggml_tensor * src0,
        const struct ggml_tensor * src1,
              struct ggml_tensor * dst) {
    switch(src0->type) {
        case GGML_TYPE_F16:
            {
                ggml_compute_forward_conv_1d_stage_0_f32(params, src0, src1, dst);
            } break;
        default:
            {
                GGML_ASSERT(false);
            } break;
    }
}

It would seem the function is called with GGML_TYPE_F32 tensors. Not sure how to proceed.

keldenl commented 7 months ago

@PABannier is there a workaround for this? I'm running into the same issue as well.

lin72h commented 7 months ago

Same problem here

groovybits commented 7 months ago

Also having this issue.

groovybits commented 7 months ago

I'm able to build and run when I switch to this commit:

git checkout 07e651618b3a8a27de3bfa7f733cdb0aa8f46b8a

I also switched the ggml/ checkout to the master branch, though I'm not sure whether that matters.

Posting in the hope it helps figure out what is wrong. I will look closer at it, but I'm not familiar with the code, so I'm not sure how much I can explain about why this works.

PABannier commented 7 months ago

Hi @groovybits @lin72h ! Thanks for taking the time to find a partial fix. With the upcoming #124 this should be fixed, as the memory estimation is delegated to ggml-alloc instead of my hacky estimate. I'll try to have it merged ASAP.

bachittle commented 6 months ago

@groovybits I tried switching to that commit and it does not run; the build fails. It looks like ggml/ is using a custom fork by @PABannier with some changes to get it to work (REFLEC pieces added), so I'm unsure how you got that commit to work.

I wouldn't mind if a hacky solution was implemented in the meantime, as long as the master branch or some branch has a working interface I can get started with. If anything needs to be implemented I also am willing to make a PR.

Best of luck to the refactor @PABannier, this is a massive undertaking and will be very impressive if everything starts working!

PABannier commented 2 months ago

Hello All,

This should now be fixed with #139 . Could you give it a try?

infojunkie commented 2 months ago

Thanks, I was able to generate audio with the latest version. The instructions in this repo are not up to date, so here's what I did:

PABannier commented 2 months ago

Thanks for the detailed instructions @infojunkie !

In #151 , I greatly simplified the instructions. They should be more straightforward now. We also support bark-small, a smaller version of Bark developed by Suno, which should not cause memory issues.