Open · Shyryp opened this issue 1 month ago
Working on that right now. I managed to do it for transformers models, and I will make it possible for llama.cpp models as well. There will soon be an option to keep the model in memory or unload it.
There are now two new nodes for llama.cpp:
- LLava Optional Memory Free Simple
- LLava Optional Memory Free Advanced

You can use several of these nodes in your workflow.
Perfect! Thank you! Works great!
The only thing is that I would like a node option (or an optional image input parameter) that does not require submitting an image, for the case where the user only wants to generate text from a prompt.
In the current version of the node I am forced, first, to supply an image, which can distort the generation result, and second, to set a much larger max_ctx: for my tasks I now need a value twice as large (4096) as before, when no image was required, which increases load and generation time.
Is it possible to make the "image" input parameter optional? Or could you tell me how to modify the code of the LLava Optional Memory Free Advanced node to achieve this behavior?
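For what it's worth, here is a minimal sketch of how an image input can be made optional in a ComfyUI node, assuming the node follows the standard `INPUT_TYPES` convention. The class name, parameter defaults, and the `run_text_only`/`run_with_image` helpers are illustrative, not the node's actual code:

```python
class LLavaOptionalMemoryFreeAdvanced:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True}),
                "max_ctx": ("INT", {"default": 2048, "min": 128, "max": 8192}),
            },
            # inputs listed under "optional" may be left unconnected in the UI
            "optional": {
                "image": ("IMAGE",),
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "generate"
    CATEGORY = "VLM Nodes"

    def generate(self, prompt, max_ctx, image=None):
        # with no image connected, skip the vision path (and its extra context cost)
        if image is None:
            return (self.run_text_only(prompt, max_ctx),)
        return (self.run_with_image(prompt, image, max_ctx),)
```

Moving the input from `required` to `optional` lets the frontend leave the slot unconnected; an unconnected optional input is simply not passed to the function, so the `image=None` default applies.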
Thanks for your hard work and help!
You mean you want to use it as an LLM. I can add an LLM memory-free version of this node as well.
Yes, like an LLM, that's right. It would be very cool to have such a node. Thanks again!
There are already LLM Loader and LLM Sampler nodes, and you can use LLava models as LLMs with them. They do not currently support unloading, but I will add LLM versions with that option as well.
BUG: Memory leak when using the LLava Optional Memory Free node.
LLava Optional Memory Free does not unload the CLIP model from memory. Worse, each run of the queue loads another copy of the CLIP model into VRAM instead of reusing the one already loaded. After several runs, video memory overflows and performance drops accordingly.
You can track this by watching VRAM usage during generation.
I hope there is a solution for this!
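In case it helps debugging: a minimal sketch of the usual way to release such a model in PyTorch, assuming the node keeps the CLIP handler in an attribute (the name `clip_model` here is illustrative). The key point is that `torch.cuda.empty_cache()` can only reclaim memory once the last Python reference to the model is gone:

```python
import gc
import torch

def unload_clip(node):
    # drop the node's (presumably only) strong reference to the CLIP model;
    # without this, empty_cache() has nothing to reclaim
    node.clip_model = None
    gc.collect()                  # collect the now-unreferenced model object
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand the cached CUDA blocks back to the driver
```

If the symptom is one new copy per queue run, the likelier fix is for the node to cache the loaded CLIP model and reuse it across runs, or to run something like the above before each reload.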
LLM Optional Memory Free would be extremely useful for many tasks, I’ll be waiting, thanks for your work! 💪
Problem: Currently, a model loaded into memory (VRAM) stays there until the whole generation finishes, even if later nodes generate other content (images, for example) after the text.
Question: Is it possible to unload a model that is no longer used by the remaining nodes of the workflow? Is there some node that lets you explicitly unload a previously loaded model from memory?
If this is not possible now, could you create such a node, or suggest how to remove unneeded models from memory in code?
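For anyone searching later: a rough sketch of what such a "free memory" step can do from inside a custom node, assuming a recent ComfyUI where `comfy.model_management` exposes `unload_all_models()` and `soft_empty_cache()` (check your version; this is an assumption, not the node's actual code):

```python
import gc
import comfy.model_management as mm  # available when running inside ComfyUI

def free_vram():
    # ask ComfyUI to evict every model it is currently tracking...
    mm.unload_all_models()
    gc.collect()           # ...drop any lingering Python references...
    mm.soft_empty_cache()  # ...and release the cached VRAM blocks
```

Wired into a pass-through node placed between the text step and the image step, this would release the LLM's VRAM before the diffusion model loads.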
Thanks for info!