qnixsynapse closed this issue 1 year ago
The console logs with q4_0 weights:
./main -t 1 -m quantized_weights/ -p "this is an audio"
bark_model_load: loading model from 'quantized_weights/'
bark_model_load: reading bark text model
gpt_model_load: n_in_vocab = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 1
gpt_model_load: n_wtes = 1
gpt_model_load: ftype = 2002
gpt_model_load: qntvr = 2
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 436.08 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 242.90 MB
bark_model_load: reading bark vocab
bark_model_load: reading bark coarse model
gpt_model_load: n_in_vocab = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 1
gpt_model_load: n_wtes = 1
gpt_model_load: ftype = 2002
gpt_model_load: qntvr = 2
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 372.66 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 179.48 MB
bark_model_load: reading bark fine model
gpt_model_load: n_in_vocab = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 7
gpt_model_load: n_wtes = 8
gpt_model_load: ftype = 2002
gpt_model_load: qntvr = 2
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 368.07 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 175.08 MB
bark_model_load: reading bark codec model
encodec_model_load: model size = 44.32 MB
bark_model_load: total model size = 597.47 MB
bark_generate_audio: prompt: 'this is an audio'
bark_generate_audio: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595
bark_forward_text_encoder: .............................................................................................................................................
bark_forward_text_encoder: mem per token = 4.80 MB
bark_forward_text_encoder: sample time = 10.56 ms
bark_forward_text_encoder: predict time = 6200.19 ms / 15.54 ms per token
bark_forward_text_encoder: total time = 6278.56 ms
bark_forward_coarse_encoder: .........................................................................................................................................................................................................................................................................................................................................................................................................................................
bark_forward_coarse_encoder: mem per token = 10.31 MB
bark_forward_coarse_encoder: sample time = 4.15 ms
bark_forward_coarse_encoder: predict time = 46163.62 ms / 108.37 ms per token
bark_forward_coarse_encoder: total time = 46232.75 ms
fine_gpt_eval: failed to allocate 32030667571 bytes
bark_forward_fine_encoder: ggml_aligned_malloc: insufficient memory (attempted to allocate 30546.83 MB)
GGML_ASSERT: ggml.c:4408: ctx->mem_buffer != NULL
zsh: killed ./main -t 1 -m quantized_weights/ -p "this is an audio"
I observed that even with quantized weights the coarse encoder takes considerable time, and then the fine encoder tries to allocate an extremely large amount of memory (~30 GB).
Thanks for spotting this bug @akarshanbiswas !
I think I found my mistake: here I should divide by n_codes, which would reduce the amount of required memory by a factor of 8!
Opening a PR to fix this issue.
Awesome. You are very much welcome!
Closed with https://github.com/PABannier/bark.cpp/pull/99 .
Tested and it works now!!! Thank you so much!
Also, a question: why does the amount of memory allocated by the fine encoder depend on the sequence length of the input?
I just got an OOM on my 16 GB PC simply by increasing the length of the text input.
@PABannier
As explained before in one of the issues, during a forward pass bark_forward_fine_encoder tries to allocate ~30 GB of memory.
The console log looks something like this:
So far I have been unable to track down the cause, but I will keep trying.