ChengjieLi28 opened 4 months ago
Update: when I use the prebuilt v0.2.82, I still get the null pointer access error...
Update: @abetlen Hi, I have figured out the cause of the null pointer access issue. If you load a model with the embedding=True option:
llm = Llama(model_path=<model_path>, verbose=True, embedding=True)
and then run a generation task, the null pointer access error happens.
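For reference, a minimal script that reproduces it; this is a sketch, and ./model.gguf is a placeholder for any local GGUF model:

```python
from llama_cpp import Llama

# Placeholder path; any local GGUF chat model reproduces this
# on the affected versions.
llm = Llama(model_path="./model.gguf", verbose=True, embedding=True)

# Any completion call on a context created with embedding=True fails
# with "ValueError: NULL pointer access", because no logits buffer
# was allocated for this context.
out = llm("Q: What is 2 + 2? A:", max_tokens=8)
print(out["choices"][0]["text"])
```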
Is this parameter still supported? If it is no longer supported, please remove it from the documentation: https://llama-cpp-python.readthedocs.io/en/latest/#embeddings. Nowadays, few people use LLMs for embedding tasks.
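For context, the documented usage at that link is along these lines (a sketch; the model path is a placeholder, and the embedding calls themselves appear to work, it is the combination with generation that crashes):

```python
import llama_cpp

# Embedding usage as documented at the readthedocs link above.
llm = llama_cpp.Llama(model_path="./model.gguf", embedding=True)

emb = llm.create_embedding("Hello, world!")
print(len(emb["data"][0]["embedding"]))  # embedding dimension
```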
I believe I am running into this error as well on v0.2.82:
2024-07-12 19:13:01.785 | ERROR | __main__:main:175 - NULL pointer access
Traceback (most recent call last):
File "/app/main.py", line 547, in <module>
main()
└ <function main at 0xfffd086cb910>
> File "/app/main.py", line 168, in main
inf = infer_queue(qargs)
│ └ <__main__.QueueArgs object at 0xfffc2fc1f640>
└ <function infer_queue at 0xfffc2fc37d90>
File "/app/main.py", line 344, in infer_queue
llm_output = llm(
└ <llama_cpp.llama.Llama object at 0xfffc2fc1f340>
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 1674, in __call__
return self.create_completion(
│ └ <function Llama.create_completion at 0xfffc305c0430>
└ <llama_cpp.llama.Llama object at 0xfffc2fc1f340>
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 1607, in create_completion
completion: Completion = next(completion_or_chunks) # type: ignore
└ <generator object Llama._create_completion at 0xfffc2fc12570>
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 1132, in _create_completion
for token in self.generate(
│ └ <function Llama.generate at 0xfffc305c01f0>
└ <llama_cpp.llama.Llama object at 0xfffc2fc1f340>
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 740, in generate
self.eval(tokens)
│ │ └ [1, 28705, 13, 28789, 28766, 6574, 28766, 28767, 13, 2707, 1504, 6031, 395, 1656, 28725, 3116, 28725, 304, 5307, 28723, 1992,...
│ └ <function Llama.eval at 0xfffc305c00d0>
└ <llama_cpp.llama.Llama object at 0xfffc2fc1f340>
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 590, in eval
logits = np.ctypeslib.as_array(self._ctx.get_logits(), shape=(rows * cols, ))
│ │ │ │ │ │ │ └ 32000
│ │ │ │ │ │ └ 1
│ │ │ │ │ └ <function _LlamaContext.get_logits at 0xfffc305a9cf0>
│ │ │ │ └ <llama_cpp._internals._LlamaContext object at 0xfffc2fc1ecb0>
│ │ │ └ <llama_cpp.llama.Llama object at 0xfffc2fc1f340>
│ │ └ <function as_array at 0xfffc2fc4d750>
│ └ <module 'numpy.ctypeslib' from '/usr/local/lib/python3.10/dist-packages/numpy/ctypeslib.py'>
└ <module 'numpy' from '/usr/local/lib/python3.10/dist-packages/numpy/__init__.py'>
File "/usr/local/lib/python3.10/dist-packages/numpy/ctypeslib.py", line 522, in as_array
obj = ctypes.cast(obj, p_arr_type).contents
│ │ │ └ <class 'numpy.ctypeslib.LP_c_float_Array_32000'>
│ │ └ <llama_cpp.llama_cpp.LP_c_float object at 0xfffc2fc30fc0>
│ └ <function cast at 0xfffd07f90790>
└ <module 'ctypes' from '/usr/lib/python3.10/ctypes/__init__.py'>
ValueError: NULL pointer access
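The ValueError here comes from numpy/ctypes dereferencing a NULL logits pointer. A small sketch of the mechanism (logits_as_array is a hypothetical helper, not library code):

```python
import ctypes
import numpy as np

def logits_as_array(logits_ptr, n_rows: int, n_vocab: int) -> np.ndarray:
    # np.ctypeslib.as_array ends in ctypes.cast(ptr, ...).contents, and
    # ctypes raises ValueError("NULL pointer access") when dereferencing
    # a NULL pointer. That is what llama_get_logits returns for a context
    # created in embedding mode, since no logits buffer was allocated.
    if not logits_ptr:  # a NULL ctypes pointer is falsy
        raise RuntimeError("llama_get_logits returned NULL; the context "
                           "was created without a logits buffer")
    return np.ctypeslib.as_array(logits_ptr, shape=(n_rows * n_vocab,))
```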
Hey, I was able to pinpoint where this bug was introduced with git bisect.
First commit that introduced it was: https://github.com/abetlen/llama-cpp-python/commit/5beec1a1fd28acf6e32d45a6b990ef6389d288ed
Conveniently, the only changes were to llama.cpp, so I then went on to bisect llama.cpp and was able to find the culprit: https://github.com/ggerganov/llama.cpp/commit/80ea089d771f0c2d97afa8bead80ded412f600d7
I'm a little out of my depth with this stuff, but luckily the change is again only a couple hundred lines.
The parts that look relevant to me are:
Changes in logit handling:
- const bool has_logits = cparams.causal_attn;
- const bool has_embd = cparams.embeddings && (hparams.causal_attn || cparams.pooling_type == LLAMA_POOLING_TYPE_NONE);
+ const bool has_logits = !cparams.embeddings;
+ const bool has_embd = cparams.embeddings && (cparams.pooling_type == LLAMA_POOLING_TYPE_NONE);
New pooling behavior:
+ // add on pooling layer
+ if (lctx.cparams.embeddings) {
+ result = llm.append_pooling(result);
+ }
Changes in result tensor selection:
- res = nullptr; // do not extract logits for embedding models such as BERT
-
- // token or sequence embeddings
- embd = gf->nodes[gf->n_nodes - 1];
-
- GGML_ASSERT(strcmp(embd->name, "result_embd") == 0 || strcmp(embd->name, "result_embd_pooled") == 0);
+ res = nullptr; // do not extract logits for embedding case
+ embd = gf->nodes[gf->n_nodes - 1];
+ if (strcmp(embd->name, "result_embd_pooled") != 0) {
+ embd = gf->nodes[gf->n_nodes - 2];
+ }
+ GGML_ASSERT(strcmp(embd->name, "result_embd_pooled") == 0 && "missing embeddings tensor");
New function to set embedding mode:
+void llama_set_embeddings(struct llama_context * ctx, bool embeddings) {
+ ctx->cparams.embeddings = embeddings;
+}
I don't see llama_set_embeddings being called anywhere in this repo, so I suspect the fix will involve calling it, as sketched below. Beyond that, checking whether logits are available before trying to access them in embedding mode, and handling the new pooling behavior, might be needed as well.
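A rough, untested sketch of that workaround, assuming the llama_cpp.llama_cpp bindings in your version expose llama_set_embeddings (they mirror llama.h) and using the internal _ctx.ctx handle, which is not public API:

```python
import llama_cpp

# Placeholder model path; _ctx.ctx is an internal attribute, and the
# llama_set_embeddings binding may not exist in every version.
llm = llama_cpp.Llama(model_path="./model.gguf", embedding=True)

# Flip the context out of embedding mode before generating...
llama_cpp.llama_set_embeddings(llm._ctx.ctx, False)
out = llm("Hello", max_tokens=8)

# ...and back into embedding mode before embedding calls.
llama_cpp.llama_set_embeddings(llm._ctx.ctx, True)
vec = llm.embed("Hello")
```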
@abetlen -- I think this is enough for you to likely know exactly what needs to change! Thanks for your effort in maintaining the library!
This bug appears to still be in v0.2.83 as well.
I would love to use the embedding feature but I am encountering the same issue :(
+1, I have the same issue while generating embeddings for vector databases. It was working fine until v0.2.77.
Tested on v0.2.90, same error:
Traceback (most recent call last):
File "/app/main.py", line 547, in <module>
main()
File "/app/main.py", line 141, in main
inf = infer(args)
File "/app/main.py", line 453, in infer
for i in llm_output:
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 1216, in _create_completion
for token in self.generate(
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 808, in generate
self.eval(tokens)
File "/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py", line 655, in eval
logits = np.ctypeslib.as_array(
File "/usr/local/lib/python3.10/dist-packages/numpy/ctypeslib.py", line 538, in as_array
obj = ctypes.cast(obj, p_arr_type).contents
ValueError: NULL pointer access
This error is thrown for the zephyr-7b q5 model. It has been thrown on every version from v0.2.79 to v0.2.90. @abetlen
For the latest version v0.2.81, an import error happens:
Error:
File ~/miniconda3/lib/python3.9/ctypes/__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    373 if handle is None:
--> 374     self._handle = _dlopen(self._name, mode)
    375 else:

OSError: libc.musl-x86_64.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[1], line 1
----> 1 from llama_cpp import Llama

File ~/miniconda3/lib/python3.9/site-packages/llama_cpp/__init__.py:1
----> 1 from .llama_cpp import *
      2 from .llama import *
      4 __version__ = "0.2.81"

File ~/miniconda3/lib/python3.9/site-packages/llama_cpp/llama_cpp.py:88
     85 _lib_base_name = "llama"
     87 # Load the library
---> 88 _lib = _load_shared_library(_lib_base_name)
     91 # ctypes sane type hint helpers
     92 #
     93 # - Generic Pointer and Array types
   (...)
     96 # NOTE: Only use these for static type checking not for runtime checks
     97 # no good will come of that
     99 if TYPE_CHECKING:

File ~/miniconda3/lib/python3.9/site-packages/llama_cpp/llama_cpp.py:77, in _load_shared_library(lib_base_name)
     75     return ctypes.CDLL(str(_lib_path), **cdll_args)  # type: ignore
     76 except Exception as e:
---> 77     raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
     79 raise FileNotFoundError(
     80     f"Shared library with base name '{lib_base_name}' not found"
     81 )

RuntimeError: Failed to load shared library '/home/lichengjie/miniconda3/lib/python3.9/site-packages/llama_cpp/lib/libllama.so': libc.musl-x86_64.so.1: cannot open shared object file: No such file or directory
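That load failure means the installed wheel's libllama.so was linked against musl libc (as used by Alpine) but is being loaded on a glibc system. A quick diagnostic sketch:

```python
import platform

# On a glibc system this prints something like ('glibc', '2.31');
# a musl-linked prebuilt libllama.so cannot be loaded there.
print(platform.libc_ver())

# If the libcs do not match, one common remedy is rebuilding from source:
#   pip install --force-reinstall --no-cache-dir llama-cpp-python
```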
File "/opt/conda/lib/python3.11/site-packages/llama_cpp/llama.py", line 1132, in _create_completion for token in self.generate( ^^^^^^^^^^^^^^^^^ File "/opt/conda/lib/python3.11/site-packages/llama_cpp/llama.py", line 740, in generate self.eval(tokens) File "/opt/conda/lib/python3.11/site-packages/llama_cpp/llama.py", line 590, in eval logits = np.ctypeslib.as_array(self._ctx.get_logits(), shape=(rows * cols, )) ^^^^^^^^^^^^^^^^^ File "/opt/conda/lib/python3.11/site-packages/numpy/ctypeslib.py", line 522, in as_array obj = ctypes.cast(obj, p_arr_type).contents ^^^^^^^^^^^^^^^^^ ValueError: [address=0.0.0.0:37293, pid=88] NULL pointer access