Open · Shyryp opened this issue 1 month ago
Working on that right now. I managed to do it for transformers models, and I will make it possible for llama.cpp models as well. There will soon be an option to keep the model in memory or unload it.
There are now two new nodes for llama.cpp:
- LLava Optional Memory Free Simple
- LLava Optional Memory Free Advanced

You can use several of these nodes in your workflow.
Perfect! Thank you! Works great!
The only thing is that I would like a node option (or an optional image input parameter) that does not require submitting an image, for the case where the user only wants to generate text from a prompt.
In the current version of the node I am forced, first, to supply an image, which can distort the generation result, and second, to set a much larger max_ctx: for my tasks I now need a value twice as large (4096) as before, when no image was required, which increases load and generation time.
Is it possible to make the "image" input parameter optional? Or could you tell me how to modify the code of the LLava Optional Memory Free Advanced node to achieve this behavior?
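For what it's worth, here is a minimal sketch of how an image input can be made optional in a ComfyUI node, assuming the node follows the standard `INPUT_TYPES` convention. The class name, parameter defaults, and the `run_text_only`/`run_with_image` helpers are illustrative, not the node's actual code:

```python
class LLavaOptionalMemoryFreeAdvanced:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True}),
                "max_ctx": ("INT", {"default": 2048, "min": 128, "max": 8192}),
            },
            # inputs listed under "optional" may be left unconnected in the UI
            "optional": {
                "image": ("IMAGE",),
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "generate"
    CATEGORY = "VLM Nodes"

    def generate(self, prompt, max_ctx, image=None):
        # with no image connected, skip the vision path (and its extra context cost)
        if image is None:
            return (self.run_text_only(prompt, max_ctx),)
        return (self.run_with_image(prompt, image, max_ctx),)
```

Moving the input from `required` to `optional` lets the frontend leave the slot unconnected; an unconnected optional input is simply not passed to the function, so the `image=None` default applies.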
Thanks for your hard work and help!
You mean you want to use it as an LLM. I can add an LLM memory-free version of this node as well.
Yes, like an LLM, that's right. It would be very cool to have such a node. Thanks again!
There are already LLM Loader and LLM Sampler nodes, and you can use LLava models as LLMs with them. They do not currently support unloading, but I will add LLM versions with that option as well.
BUG: Memory leak when using the LLava Optional Memory Free node.
LLava Optional Memory Free does not unload the CLIP model from memory. Worse, each run of the queue loads another copy of the CLIP model into VRAM instead of reusing the one already loaded. After several runs, video memory overflows and performance drops accordingly.
You can track this by watching VRAM usage during generation.
I hope there is a solution for this!
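In case it helps debugging: a minimal sketch of the usual way to release such a model in PyTorch, assuming the node keeps the CLIP handler in an attribute (the name `clip_model` here is illustrative). The key point is that `torch.cuda.empty_cache()` can only reclaim memory once the last Python reference to the model is gone:

```python
import gc
import torch

def unload_clip(node):
    # drop the node's (presumably only) strong reference to the CLIP model;
    # without this, empty_cache() has nothing to reclaim
    node.clip_model = None
    gc.collect()                  # collect the now-unreferenced model object
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand the cached CUDA blocks back to the driver
```

If the symptom is one new copy per queue run, the likelier fix is for the node to cache the loaded CLIP model and reuse it across runs, or to run something like the above before each reload.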
LLM Optional Memory Free would be extremely useful for many tasks, I’ll be waiting, thanks for your work! 💪
Problem: Currently, a model loaded into memory (VRAM) stays there until the whole generation finishes, even if later nodes generate other content (images, for example) after the text.
Question: Is it possible to unload a model that is no longer used by the remaining nodes of the workflow? Is there some node that lets you explicitly unload a previously loaded model from memory?
If this is not possible now, could you create such a node, or suggest how to remove unneeded models from memory in code?
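For anyone searching later: a rough sketch of what such a "free memory" step can do from inside a custom node, assuming a recent ComfyUI where `comfy.model_management` exposes `unload_all_models()` and `soft_empty_cache()` (check your version; this is an assumption, not the node's actual code):

```python
import gc
import comfy.model_management as mm  # available when running inside ComfyUI

def free_vram():
    # ask ComfyUI to evict every model it is currently tracking...
    mm.unload_all_models()
    gc.collect()           # ...drop any lingering Python references...
    mm.soft_empty_cache()  # ...and release the cached VRAM blocks
```

Wired into a pass-through node placed between the text step and the image step, this would release the LLM's VRAM before the diffusion model loads.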
Thanks for info!