Baquara opened this issue 5 months ago
From what I can see, `llama_free_model` is expected to take a lower-level object rather than the `Llama` object. In Python, it is not straightforward to determine when the garbage collector actually deletes an object. Here is a workaround that forces the release of the loaded model:
```python
from llama_cpp import Llama

llama_model = Llama(…)

# Explicitly delete the model's internal object
llama_model._model.__del__()
```
This approach has worked for me so far.
In my experience, @jkawamoto's approach is a good one, because it frees RAM/CUDA/other memory even if the `Llama` object is stuck. I've tried calling `del llama_model`, but this is not guaranteed to actually call `__del__` if there are other references to the object (and this can happen in several cases, for example from uncaught exceptions in interactive environments like JupyterLab - see here).
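To illustrate why `del` alone is not enough: `__del__` runs only when the *last* reference disappears, which a shell's exception traceback can silently prevent. A minimal sketch, using a hypothetical `Resource` class (not part of llama-cpp-python) that stands in for an object freeing memory in `__del__`:

```python
freed = []

class Resource:
    """Stand-in for an object that releases memory in __del__."""
    def __del__(self):
        freed.append("released")

r = Resource()
alias = r       # a second reference, e.g. kept alive by a traceback in JupyterLab

del r           # __del__ is NOT called here: 'alias' still references the object
assert freed == []

del alias       # last reference gone: CPython's refcounting runs __del__ now
assert freed == ["released"]
```

This is why explicitly closing the model is more reliable than relying on `del`.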
Since calling a special method (`__del__`) of a private field is too ad hoc, I opened PR #1513, which adds a `close` method to explicitly free the model.
I am running `llama_model._model.__del__()` per the above comment, and I am still seeing the process use CUDA RAM.
Has there been any movement on creating a proper close method?
The `Llama` class has a `close` method now, and the following code should free up RAM:
```python
from llama_cpp import Llama

llama_model = Llama(…)
...
llama_model.close()
```
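Because `close()` is an ordinary method, the standard library's `contextlib.closing` can make the cleanup automatic, even if an exception is raised mid-inference. A sketch using a stand-in class (`FakeModel` is illustrative only; in real code you would pass `Llama(...)` to `closing`):

```python
from contextlib import closing

class FakeModel:
    """Stand-in for llama_cpp.Llama, which exposes close()."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True  # in the real class this frees RAM/VRAM

with closing(FakeModel()) as model:
    pass  # run inference here; close() is guaranteed on exit

assert model.closed  # close() was called without an explicit call
```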
Thank you!!!
Expected Behavior
From issue #302, I expected the model to be unloaded with the following function:
However, there are two problems here:
1 - Using `llama_free_model` with the object `llm` (loaded in the conventional way) results in this:
`llm` is generated with this:
2 - Even after deleting the object, assigning it as `None`, and invoking garbage collection, the VRAM is still not freed. The VRAM only gets cleared after I kill the app along with all of its processes and threads.
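For reference, the cleanup sequence described above amounts to the following sketch (`llm` stands in for the loaded model; a plain placeholder object is used here so the snippet is self-contained):

```python
import gc

llm = object()  # placeholder standing in for the loaded Llama model

# the sequence described above: delete the name, rebind to None, force GC
del llm
llm = None
gc.collect()    # per the report, VRAM is still not released at this point
```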
Current Behavior
1 - `llama_free_model` does not work.
2 - Garbage collection does not free up VRAM.
Environment and Context
I tried this on an Arch Linux setup with an RTX 3090 and on a Windows laptop with an eGPU. The problem was consistent across both operating systems and hardware setups.
- CPU: AMD Ryzen 7 2700 Eight-Core Processor
- GPU: NVIDIA GeForce RTX 3090
- OS: Arch Linux 6.8.9-arch1-1 / Windows 11
Failure Information (for bugs)
Steps to Reproduce