abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF #666

Open · KeksMember opened 1 year ago

KeksMember commented 1 year ago

I'm trying to use the low-level API in my own program. Loading the model (I am using Pygmalion-13B.ggmlv3.Q6_K.gguf) works fine and I get no errors. But when I try to evaluate the model via llama_cpp.llama_eval, I get:

    llama_cpp.llama_eval(self.context, (llama_cpp.c_int * len(embd))(*embd), llama_cpp.c_int(len(embd)), llama_cpp.c_int(0), self.NTHREADS)
File "C:\Users\name\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_cpp\llama_cpp.py", line 788, in llama_eval
    return _lib.llama_eval(ctx, tokens, n_tokens, n_past, n_threads)
OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF

I've tried type-casting the parameters to C integers, as seen in the error log snippet above, and passing plain Python integers too. self.context is of type llama_cpp.llama_context_p, and self.NTHREADS is retrieved via multiprocessing, as in the low-level API example in this repository.

My only guess is that it's a problem with the model itself; unfortunately I don't have another GGUF model at hand to test this theory.
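For reference, the ctypes array construction in my call follows the usual pattern. Here is a standalone sketch of just that part (plain ctypes, no llama.cpp involved — the token values are made up for illustration):

```python
import ctypes

# Token ids as produced by tokenization (made-up values for illustration)
embd = [1, 15043, 3186]

# (c_int * len(embd))(*embd) builds a C array of ints from a Python list,
# which is the shape llama_eval expects for its `tokens` argument
tokens = (ctypes.c_int * len(embd))(*embd)

print(len(tokens), list(tokens))  # the array round-trips the original values
```

So I don't think the array itself is the problem; the values survive the conversion intact.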

KeksMember commented 1 year ago

Update: I tried a different model, yarn-llama-2-13b-128k.Q6_K.gguf, and the error persists.
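One thing worth ruling out (a hedged guess on my part): if model or context creation silently returned a null pointer, passing it on to llama_eval would read garbage memory and could surface exactly as an access violation like this. A ctypes null pointer is falsy, so a guard along these lines would catch a failed load before the eval call (the check_ctx name is mine, not part of the library):

```python
import ctypes

def check_ctx(ctx: ctypes.c_void_p) -> None:
    # A NULL pointer coming back from a loader is falsy in ctypes;
    # fail loudly here instead of handing it to llama_eval
    if not ctx:
        raise RuntimeError("context is NULL - model/context creation failed")

check_ctx(ctypes.c_void_p(1234))    # a non-null pointer passes the guard
# check_ctx(ctypes.c_void_p(None))  # would raise RuntimeError
```

If the context passes a check like this and the crash still happens, the problem is more likely a version mismatch between the bindings and the model format.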