abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

llama.py", line 1508, in __del__ TypeError: 'NoneType' object is not callable #580

Closed phamkhactu closed 1 year ago

phamkhactu commented 1 year ago

I've followed the tutorial for using a GPU with Llama 2:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

After that, I wrote the following code from the tutorial to run on the GPU:

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""

prompt = PromptTemplate(template=template, input_variables=["question"])

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

n_gpu_layers = 40  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="llama-2-7b-chat/7B/ggml-model-q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

llm_chain.run(question)

I get this error:

/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/deeplake/util/check_latest_version.py:32: UserWarning: A newer version of deeplake (3.6.17) is available. It's recommended that you update to the latest version using `pip install -U deeplake`.
  warnings.warn(
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
  Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6
llama.cpp: loading model from llama-2-7b-chat/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 1.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3090) as main device
llama_model_load_internal: mem required  =  372.40 MB (+  256.00 MB per state)
llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 288 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 repeating layers to GPU
llama_model_load_internal: offloading non-repeating layers to GPU
llama_model_load_internal: offloading v cache to GPU
llama_model_load_internal: offloading k cache to GPU
llama_model_load_internal: offloaded 35/35 layers to GPU
llama_model_load_internal: total VRAM used: 4122 MB
llama_new_context_with_model: kv self size  =  256.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
 Here are the Super Bowl winners for each year since 1967 (the year Justin Bieber was born):
• 1980 - The Pittsburgh Steelers won Super Bowl XIV.
• 1985 - The San Francisco 49ers won Super Bowl XIX.
• 1988 - The Washington Redskins won Super Bowl XXII.
• 1990 - The Pittsburgh Steelers won Super Bowl XXV.
• 1993 - The Dallas Cowboys won Super Bowl XXVIII.
• 1995 - The Dallas Cowboys won Super Bowl XXX.
• 2000 - The Baltimore Ravens won Super Bowl XXXIV.
• 2004 - The New England Patriots won Super Bowl XXXVIII.
• 2007 - The Indianapolis Colts won Super Bowl XLI.
• 2011 - The Green Bay Packers won Super Bowl XLV.
So, based on this information, the NFL team that won the Super Bowl in the year Justin Bieber was born (1980) is the Pittsburgh Steel
llama_print_timings:        load time =   375.95 ms
llama_print_timings:      sample time =    79.32 ms /   256 runs   (    0.31 ms per token,  3227.43 tokens per second)
llama_print_timings: prompt eval time =   375.90 ms /    45 tokens (    8.35 ms per token,   119.71 tokens per second)
llama_print_timings:        eval time =  5212.53 ms /   255 runs   (   20.44 ms per token,    48.92 tokens per second)
llama_print_timings:       total time =  6069.13 ms
Exception ignored in: <function Llama.__del__ at 0x7f0f10c60310>
Traceback (most recent call last):
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/llama_cpp/llama.py", line 1508, in __del__
TypeError: 'NoneType' object is not callable
c0sogi commented 1 year ago

I see nothing wrong in the log; it just looks like an ignored error triggered by another library while the instance is being deleted.
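
For illustration (this is a minimal sketch, not llama-cpp-python's actual code), this class of "Exception ignored in __del__" message typically appears when the finalizer runs during interpreter shutdown, after module-level globals have already been cleared to None, so the cleanup call inside __del__ fails with TypeError: 'NoneType' object is not callable:

import gc

def free_handle(handle):
    # Stand-in for a module-level cleanup function (e.g. a ctypes binding).
    print(f"freeing handle {handle}")

class Wrapper:
    def __init__(self):
        self.handle = 42

    def __del__(self):
        # free_handle is looked up in module globals at call time. If those
        # globals have already been torn down, this raises
        # TypeError: 'NoneType' object is not callable, which Python reports
        # only as "Exception ignored in: <function Wrapper.__del__ ...>".
        free_handle(self.handle)

w = Wrapper()
free_handle = None   # simulate interpreter shutdown clearing the module global
del w                # finalizer runs now and hits the TypeError
gc.collect()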

logan-markewich commented 1 year ago

Yeah, I see this a lot. It doesn't stop execution or anything; it usually just gets printed on shutdown. A little annoying, though.
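
One possible workaround (my own suggestion, not an official fix) is to drop the references explicitly before the interpreter starts shutting down, so the finalizer runs while module globals are still intact:

import gc

llm_chain.run(question)

# Release the model explicitly so Llama.__del__ runs now,
# not during interpreter shutdown.
del llm_chain, llm
gc.collect()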

phamkhactu commented 1 year ago

@c0sogi @logan-markewich

Oh, thank you. I thought it was a real error.