ggerganov / ggml

Tensor library for machine learning
MIT License

How to free cgraph? #989

Closed EeyoreLee closed 1 month ago

EeyoreLee commented 1 month ago

The more often I call bert_build_dynamic, the more memory the graph (gf) uses.

How to reproduce: call predict_logits in a loop, like this:

    for (int i = 0; i < 30; ++i)
    {
        py_bert_batch_predict_logits(ctx, sentences, n_sentences, n_threads, logits);
    }

The reported used_mem then looks like this:

start ggml_used_mem: 0
end ggml_used_mem: 47871728
start ggml_used_mem: 0
end ggml_used_mem: 76783184
start ggml_used_mem: 0
end ggml_used_mem: 109081520
start ggml_used_mem: 0
end ggml_used_mem: 144766736
start ggml_used_mem: 0
end ggml_used_mem: 183838832
start ggml_used_mem: 0
end ggml_used_mem: 226297808
start ggml_used_mem: 0
end ggml_used_mem: 272143664
start ggml_used_mem: 0
end ggml_used_mem: 321376400

EeyoreLee commented 1 month ago

The cause was that my batch_tokenizer uses a static vector that I did not clear on every call. As for the graph, freeing the context with ggml_free works fine. :)
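
For anyone hitting the same symptom, here is a minimal sketch of that kind of bug. It is not the actual batch_tokenizer code: `tokenize_leaky` and `tokenize_fixed` are hypothetical names, and the "tokenizer" just emits one fake token per byte. The point is only that a static vector keeps its contents between calls, so the input grows on every call, and presumably the graph built from it grows too.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a batch tokenizer; not the code from this issue.
// Buggy variant: the static buffer keeps every token from earlier calls.
std::size_t tokenize_leaky(const std::vector<std::string>& sentences) {
    static std::vector<int> tokens;  // persists across calls
    for (const auto& s : sentences) {
        for (unsigned char c : s) {
            tokens.push_back(c);  // one fake token per byte
        }
    }
    return tokens.size();  // grows with every call
}

// Fixed variant: clear the buffer at the start of every call.
std::size_t tokenize_fixed(const std::vector<std::string>& sentences) {
    static std::vector<int> tokens;
    tokens.clear();  // drops stale contents, keeps the allocated capacity
    for (const auto& s : sentences) {
        for (unsigned char c : s) {
            tokens.push_back(c);
        }
    }
    return tokens.size();  // same size for the same input
}
```

Calling `tokenize_leaky` twice with the same two sentences doubles the token count, while `tokenize_fixed` returns the same count each time. Using a non-static local vector would avoid the problem as well, at the cost of reallocating on every call.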

JohannesGaessler commented 1 month ago

The memory for ggml_cgraph is allocated in the corresponding ggml_context. It is not possible to free this memory without freeing the memory for the entire context.
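
To illustrate the idea, here is a toy bump allocator as a simplified model of a context-owned memory pool. This is not ggml's implementation: the `Arena` class and its methods are hypothetical, and alignment is ignored for brevity. It only shows why an object carved out of such a pool cannot be freed on its own.

```cpp
#include <cstddef>
#include <vector>

// Toy bump allocator: a simplified model of a context-owned memory pool.
// NOT ggml's implementation; all names here are hypothetical.
class Arena {
public:
    explicit Arena(std::size_t capacity) : buf_(capacity), used_(0) {}

    // Hand out the next n bytes, or nullptr if the pool is exhausted.
    void* alloc(std::size_t n) {
        if (n > buf_.size() - used_) {
            return nullptr;
        }
        void* p = buf_.data() + used_;
        used_ += n;
        return p;
    }

    std::size_t used() const { return used_; }

    // There is no per-object free: the only way to reclaim memory
    // is to drop everything that was allocated from the pool.
    void reset() { used_ = 0; }

private:
    std::vector<unsigned char> buf_;  // the pool itself
    std::size_t used_;                // offset of the next free byte
};
```

In this model, a graph would be just another `alloc` from the pool, so releasing it means resetting (or destroying) the whole pool, which corresponds to freeing the entire context in the thread above.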