yanjunting1983 opened 4 months ago
Same issue when we load Llama-2 for inference using llama-cpp-python. Do you have any update on this?
I had this issue too. Here is what in the source code causes it: when the model object is deleted normally, say via `del model`, `llama_cpp.llama_free_model(self.model)` is called without error. However, during Python interpreter shutdown the `llama_cpp` module is destroyed before the model object, so `llama_cpp` can no longer be reached when the interpreter calls the `_LlamaModel.__del__` method.
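For anyone curious about the mechanism in isolation, here is a minimal sketch (the `Resource` class and `free_native_handle` are hypothetical, not llama-cpp-python code) of why a `__del__` that reaches back into its own module can fail at interpreter shutdown:

```python
# Hypothetical demo of the shutdown pitfall, not llama-cpp-python code.
# During interpreter shutdown, CPython may clear module globals (rebinding
# them to None) before the remaining objects are finalized. A __del__ that
# calls a module-level function can then hit None where a function used to
# be, producing "TypeError: 'NoneType' object is not callable", reported
# as "Exception ignored in: <function ... __del__>".

def free_native_handle(handle):
    print(f"freeing native handle {handle}")

class Resource:
    def __init__(self):
        self.handle = 42

    def __del__(self):
        # Unsafe at shutdown: free_native_handle may already be None here.
        free_native_handle(self.handle)

# A module-level reference keeps the object alive until shutdown, where the
# destruction order relative to module teardown is not guaranteed.
leaked = Resource()
```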
To avoid this issue, the model has to be closed explicitly before the Python interpreter shuts down. Here are my solutions:
Using langchain:

```python
import atexit
from langchain_community.llms import LlamaCpp

model = LlamaCpp(...)

@atexit.register
def free_model():
    model.client.close()
```
Using llama-cpp-python:

```python
import atexit
from llama_cpp import Llama

model = Llama(...)

@atexit.register
def free_model():
    model.close()
```
I call `model.close()` in an exit callback registered with the built-in `atexit.register`, which runs before the `llama_cpp` module is deleted, because atexit callbacks are executed before module teardown at interpreter shutdown.
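If registering a global atexit hook is awkward, the same guarantee can come from scoping the model's lifetime instead; a sketch using `contextlib.closing`, which relies only on the `close()` method used above (the model path and prompt are placeholders):

```python
from contextlib import closing
from llama_cpp import Llama

# closing() calls model.close() when the block exits, long before
# interpreter shutdown, so __del__ never has to reach into a
# torn-down module.
with closing(Llama(model_path="model.gguf")) as model:  # placeholder path
    out = model("Q: What is 2 + 2? A:", max_tokens=16)
    print(out["choices"][0]["text"])
```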
+1
+1
My solution that works for `llama-cpp-python>=0.3.0`:
```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_name = ""
model_file = ""
model_path = hf_hub_download(model_name, filename=model_file)

model = Llama(model_path=model_path, verbose=False)

generation_kwargs = {"max_tokens": 100}
messages = [
    {"role": "system", "content": "Be a helpful assistant"},
    {"role": "user", "content": "Hi."},
]
res = model.create_chat_completion(messages=messages, **generation_kwargs)
print(res["choices"][0])

model._sampler.close()
model.close()
```
We need to explicitly destroy the sampler as well as the model.
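To keep that teardown robust even when generation raises, the cleanup can also go in a `finally` block; a sketch reusing `model`, `messages`, and `generation_kwargs` from the snippet above, and noting that `_sampler` is a private attribute that may change between releases:

```python
try:
    res = model.create_chat_completion(messages=messages, **generation_kwargs)
    print(res["choices"][0])
finally:
    # _sampler is private API; guard in case a future release removes it.
    sampler = getattr(model, "_sampler", None)
    if sampler is not None:
        sampler.close()
    model.close()
```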
Expected Behavior
The llama model destructor should run normally at interpreter exit.
Current Behavior
source code:
```python
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.llms import LlamaCpp
from langchain.embeddings import HuggingFaceBgeEmbeddings

n_gpu_layers = 0
n_batch = 512
_model_path = "/Users/zhouquanquan/software/llama.cpp/models/ggml-meta-llama-3-8b-Q4_K_M.gguf"

llm = LlamaCpp(
    model_path=_model_path,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,
    temperature=0,
    top_p=1,
    n_ctx=8192,
)

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

model_name = "BAAI/bge-small-en-v1.5"
encode_kwargs = {"normalize_embeddings": True}  # set True to compute cosine similarity

bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs={"device": "cpu"},
    encode_kwargs=encode_kwargs,
)

text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=0)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=bge_embeddings)

retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt-llama3")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is Task Decomposition in the article?"))
print(rag_chain.invoke("Show me all of the memory types in the article."))
```
error:
```
Exception ignored in: <function Llama.__del__ at 0x7f8e1a856ee0>
Traceback (most recent call last):
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/site-packages/llama_cpp/llama.py", line 2091, in __del__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/site-packages/llama_cpp/llama.py", line 2086, in close
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 540, in close
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 532, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 517, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 322, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/site-packages/llama_cpp/_internals.py", line 66, in close
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 540, in close
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 532, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 517, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 405, in _exit_wrapper
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/site-packages/llama_cpp/_internals.py", line 60, in free_model
TypeError: 'NoneType' object is not callable
```
Environment and Context
- 1.4 GHz quad-core Intel Core i5
- Darwin 23.3.0 (Darwin Kernel Version 23.3.0: Wed Dec 20 21:28:58 PST 2023; root:xnu-10002.81.5~7/RELEASE_X86_64 x86_64)
- Python 3.9.18
- GNU Make 3.81
- Apple clang version 15.0.0 (clang-1500.3.9.4)
- llama_cpp_python 0.2.82