Closed EeyoreLee closed 1 month ago
The cause was that my batch_tokenizer uses a static vector that was not cleared on every call. As for the graph, simply calling ggml_free on the context works fine. :)
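For anyone hitting the same symptom, here is a minimal sketch of that kind of bug. The function names `tokenize_leaky` and `tokenize_cleared` are hypothetical stand-ins, not the actual batch_tokenizer code; they only illustrate how a static vector grows across calls unless it is cleared.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a batch tokenizer (not the real batch_tokenizer).
// Buggy variant: the static buffer keeps the entries of every previous call,
// so its size (and memory use) grows each time the function runs.
std::size_t tokenize_leaky(const std::vector<std::string>& batch) {
    static std::vector<std::string> buffer;  // lives for the whole program
    for (const auto& text : batch) {
        buffer.push_back(text);
    }
    return buffer.size();
}

// Fixed variant: clear the static buffer at the start of every call.
std::size_t tokenize_cleared(const std::vector<std::string>& batch) {
    static std::vector<std::string> buffer;
    buffer.clear();  // drops old elements; the capacity is kept for reuse
    for (const auto& text : batch) {
        buffer.push_back(text);
    }
    return buffer.size();
}
```

Calling the leaky variant in a loop with the same two-element batch returns 2, 4, 6, and so on, while the cleared variant keeps returning 2.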
The memory for ggml_cgraph is allocated in the corresponding ggml_context. It is not possible to free this memory without freeing the memory for the entire context.
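To illustrate why individual objects cannot be released, here is a toy bump allocator. `ToyContext` is a hypothetical, simplified model and not ggml's actual implementation (it ignores alignment, for instance); it only shows the general pattern where every allocation advances an offset in one pool and the only way to reclaim memory is to reset the whole pool.

```cpp
#include <cstddef>
#include <vector>

// Toy bump allocator: a simplified model of a context-owned memory pool.
// This is an illustration only, not ggml code.
class ToyContext {
public:
    explicit ToyContext(std::size_t capacity) : pool_(capacity), used_(0) {}

    // Hands out the next `size` bytes, or nullptr if the pool is exhausted.
    void* alloc(std::size_t size) {
        if (size > pool_.size() - used_) {
            return nullptr;
        }
        void* ptr = pool_.data() + used_;
        used_ += size;
        return ptr;
    }

    // No per-object free exists: memory is reclaimed only all at once.
    void reset() { used_ = 0; }

    std::size_t used_mem() const { return used_; }

private:
    std::vector<unsigned char> pool_;
    std::size_t used_;
};
```

In this model, building a new graph on every loop iteration without resetting makes `used_mem()` grow steadily, which matches the behavior reported in this issue. Freeing the whole context once per iteration (what ggml_free does for a real context) is the counterpart of `reset()` here.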
The more often I hit bert_build_dynamic, the more memory the gf graph costs.
How to reproduce? Modify predict_logits to run in a loop, like
the used_mem like