abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

destructor llama error: TypeError: 'NoneType' object is not callable #1610

Open yanjunting1983 opened 4 months ago

yanjunting1983 commented 4 months ago

Expected Behavior

The Llama object should be destroyed cleanly, with no error raised from its destructor at interpreter exit.

Current Behavior

source code:

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.llms import LlamaCpp
from langchain.embeddings import HuggingFaceBgeEmbeddings

n_gpu_layers = 0
n_batch = 512
_model_path = "/Users/zhouquanquan/software/llama.cpp/models/ggml-meta-llama-3-8b-Q4_K_M.gguf"

llm = LlamaCpp(
    model_path=_model_path, n_gpu_layers=n_gpu_layers, n_batch=n_batch,
    f16_kv=True, temperature=0, top_p=1, n_ctx=8192,
)

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))),
)
docs = loader.load()

model_name = "BAAI/bge-small-en-v1.5"
encode_kwargs = {'normalize_embeddings': True}  # set True to compute cosine similarity
bge_embeddings = HuggingFaceBgeEmbeddings(model_name=model_name, model_kwargs={'device': 'cpu'}, encode_kwargs=encode_kwargs)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=0)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=bge_embeddings)
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt-llama3")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)

print(rag_chain.invoke("What is Task Decomposition in the article?"))
print(rag_chain.invoke("Show me all of the memory types in the article."))

error:

Exception ignored in: <function Llama.__del__ at 0x7f8e1a856ee0>
Traceback (most recent call last):
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/site-packages/llama_cpp/llama.py", line 2091, in __del__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/site-packages/llama_cpp/llama.py", line 2086, in close
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 540, in close
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 532, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 517, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 322, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/site-packages/llama_cpp/_internals.py", line 66, in close
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 540, in close
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 532, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 517, in __exit__
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/contextlib.py", line 405, in _exit_wrapper
  File "/Users/zhouquanquan/anaconda3/envs/playllm/lib/python3.9/site-packages/llama_cpp/_internals.py", line 60, in free_model
TypeError: 'NoneType' object is not callable

Environment and Context

1.4 GHz quad-core Intel Core i5
Darwin 23.3.0 (Darwin Kernel Version 23.3.0: Wed Dec 20 21:28:58 PST 2023; root:xnu-10002.81.5~7/RELEASE_X86_64 x86_64)
Python 3.9.18
GNU Make 3.81
Apple clang version 15.0.0 (clang-1500.3.9.4)
llama_cpp_python 0.2.82

vimalkum commented 3 months ago

Same issue here when loading Llama-2 for inference with llama-cpp-python. Is there any update on this?

kaoyuching commented 3 months ago

I had this issue too. Here is the source code that causes it:

https://github.com/abetlen/llama-cpp-python/blob/f7b9e6d42981bb75106699e0712a6378c39ef92c/llama_cpp/llama.py#L2090-L2094

https://github.com/abetlen/llama-cpp-python/blob/f7b9e6d42981bb75106699e0712a6378c39ef92c/llama_cpp/_internals.py#L57-L69

When the model object is deleted normally, say with `del model`, `llama_cpp.llama_free_model(self.model)` is called without error. However, during Python interpreter shutdown the `llama_cpp` module can be torn down before the model object, so `llama_cpp` is no longer reachable (its globals have already been cleared) by the time the interpreter calls the `_LlamaModel.__del__` method.
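
As a self-contained toy illustration (not the actual llama-cpp-python code), the sketch below reproduces this shutdown-ordering problem and shows one common defensive idiom: binding the free function as a default argument of `__del__`, so it stays reachable even after the module globals have been cleared.

def free_model(ptr):
    # Stand-in for llama_cpp.llama_free_model (in the real library this is a
    # ctypes function living in the llama_cpp module namespace).
    print(f"freed model at {ptr:#x}")

class ModelHandle:
    def __init__(self, ptr):
        self.ptr = ptr

    # The default argument captures a reference to free_model when the class
    # is defined, so __del__ does not depend on a module global that may
    # already have been set to None during interpreter shutdown.
    def __del__(self, _free=free_model):
        if self.ptr is not None and callable(_free):
            _free(self.ptr)
            self.ptr = None

handle = ModelHandle(0xDEADBEEF)  # freed safely even at interpreter exit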

To avoid this issue, the model has to be closed explicitly before the Python interpreter shuts down. Here are my solutions:

Using langchain:

import atexit
from langchain_community.llms import LlamaCpp

model = LlamaCpp(...)

@atexit.register
def free_model():
    model.client.close()

Using llama-cpp-python:

import atexit
from llama_cpp import Llama

model = Llama(...)

@atexit.register
def free_model():
    model.close()

I call model.close() in an exit callback registered with the built-in atexit.register. The callback runs before the llama_cpp module is torn down, because atexit handlers fire earlier in interpreter shutdown than the destructor and the clearing of module globals, so close() can still reach llama_free_model.
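
A related alternative (just a sketch, relying only on the close() method used above; the model path is a placeholder) is to scope the model's lifetime with contextlib.closing, so the cleanup runs during normal execution rather than at interpreter shutdown:

import contextlib
from llama_cpp import Llama

# contextlib.closing() only requires a close() method, which Llama provides,
# so llama_free_model runs while the llama_cpp module is still fully loaded.
with contextlib.closing(Llama(model_path="model.gguf")) as model:  # placeholder path
    out = model("Q: What is 1 + 1? A:", max_tokens=8)
    print(out["choices"][0]["text"])
# model.close() has already been called here, well before interpreter shutdown.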

PierreCarceller commented 2 months ago

+1

cpfiffer commented 2 months ago

+1

kavsarr commented 4 weeks ago

My solution that works for llama-cpp-python>=0.3.0:

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_name = ""
model_file = ""
model_path = hf_hub_download(model_name, filename=model_file)

model = Llama(model_path=model_path, verbose=False)

generation_kwargs = {"max_tokens": 100}

messages = [{"role": "system", "content": "Be a helpful assistant"},
            {"role": "user", "content": "Hi."}]

res = model.create_chat_completion(messages=messages,
                                   **generation_kwargs)

print(res["choices"][0])

model._sampler.close()
model.close()

We need to explicitly close the sampler and the model before the interpreter shuts down.
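
If the cleanup should still happen automatically at program exit, the same two close() calls can be registered with atexit, as in the earlier workaround. This is a sketch: _sampler is a private attribute and may change between versions, and the model path is a placeholder.

import atexit
from llama_cpp import Llama

model = Llama(model_path="model.gguf", verbose=False)  # placeholder path

@atexit.register
def _cleanup():
    # Close the private sampler first (if it was created), then the model,
    # while the llama_cpp module is still importable.
    sampler = getattr(model, "_sampler", None)
    if sampler is not None:
        sampler.close()
    model.close()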